Research Intern - AI Network Observability

Microsoft Corporation

Location: United States, Mountain View

Contract Type: Not provided

Salary: 6710.00 - 13270.00 USD / Month

Job Description:

Research Internships at Microsoft provide a dynamic environment for research careers, with a network of world-class research labs led by globally recognized scientists and engineers who pursue innovation across a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

As a Research Intern in the Strategic Planning and Architecture (SPARC) group, you will contribute to the research, design, and development of tools that provide insight into multi-path network transports for large-scale Artificial Intelligence (AI) datacenter environments. Your work will focus on building high-performance tracing and analysis systems capable of capturing packet-level behavior at very high line rates (up to 800 Gbps). These tools will enhance observability for next-generation transport protocols that support AI workloads. The role offers opportunities to prototype solutions on real hardware and to collaborate with engineers to improve the reliability and explainability of intra-datacenter networking for AI.

Job Responsibility:

  • Engage early with your mentors to formulate a clear plan of work for the 12-week Research Internship
  • Document and communicate your progress clearly and frequently, adjusting the plan as the project evolves
  • Show initiative and think unconventionally to develop creative, innovative solutions

Requirements:

  • Currently enrolled in a PhD program in Computer Science or a related STEM field
  • Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship
  • Applicants should demonstrate depth of knowledge in datacenter networking and systems research
  • Experience programming high-performance network data paths (e.g., in C++)
  • Experience with RDMA and/or DPDK
  • Experience with RoCE; knowledge of TCP, UDP, IP, and Ethernet

Additional Information:

Job Posted: April 20, 2026

Employment Type: Full-time
Work Type: On-site

Similar Jobs for Research Intern - AI Network Observability

Staff Software Engineer, GPU Infrastructure (HPC)

The internal infrastructure team is responsible for building world-class infrast...
Cohere
Salary: Not provided
Expiration Date: Until further notice
Requirements
  • Deep expertise in ML/HPC infrastructure: Experience with GPU/TPU clusters, distributed training frameworks (JAX, PyTorch, TensorFlow), and high-performance computing (HPC) environments
  • Kubernetes at scale: Proven ability to deploy, manage, and troubleshoot cloud-native Kubernetes clusters for AI workloads
  • Strong programming skills: Proficiency in Python (for ML tooling) and Go (for systems engineering), with a preference for open-source contributions over reinventing solutions
  • Low-level systems knowledge: Familiarity with Linux internals, RDMA networking, and performance optimization for ML workloads
  • Research collaboration experience: A track record of working closely with AI researchers or ML engineers to solve infrastructure challenges
  • Self-directed problem-solving: The ability to identify bottlenecks, propose solutions, and drive impact in a fast-paced environment
Job Responsibility
  • Build and scale ML-optimized HPC infrastructure: Deploy and manage Kubernetes-based GPU/TPU superclusters across multiple clouds, ensuring high throughput and low-latency performance for AI workloads
  • Optimize for AI/ML training: Collaborate with cloud providers to fine-tune infrastructure for cost efficiency, reliability, and performance, leveraging technologies like RDMA, NCCL, and high-speed interconnects
  • Troubleshoot and resolve complex issues: Proactively identify and resolve infrastructure bottlenecks, performance degradation, and system failures to ensure minimal disruption to AI/ML workflows
  • Enable researchers with self-service tools: Design intuitive interfaces and workflows that allow researchers to monitor, debug, and optimize their training jobs independently
  • Drive innovation in ML infrastructure: Work closely with AI researchers to understand emerging needs (e.g., JAX, PyTorch, distributed training) and translate them into robust, scalable infrastructure solutions
  • Champion best practices: Advocate for observability, automation, and infrastructure-as-code (IaC) across the organization, ensuring systems are maintainable and resilient
  • Mentorship and collaboration: Share expertise through code reviews, documentation, and cross-team collaboration, fostering a culture of knowledge transfer and engineering excellence
What we offer
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Employment Type: Full-time

AI Engineering Intern (LLM)

Student Exploration and Experience Development (SEED) is a 12-week internship op...
Veolia
Location: United States, Paramus
Salary: 21.00 - 25.00 USD / Hour
Expiration Date: Until further notice
Requirements
  • Working towards a PhD degree in AI/ML/Computer Science
  • 3.8 cumulative GPA required
  • Strong communication skills, including written, verbal, listening, presentation and facilitation skills
  • Demonstrated ability to build collaborative relationships
  • Understanding of, and experience working with, commercial/proprietary LLMs such as Gemini (Google), GPT (OpenAI), and Claude Sonnet (Anthropic) for high-performance, large-context, and multimodal tasks
  • Familiarity with open-source/self-hosted LLMs such as Llama (Meta) and Mixtral (Mistral AI)
  • Requirements Gathering: Using Confluence for documentation and collaboration
  • Architecture Design: Creating system diagrams and workflows with Lucidchart
  • Prototyping: Designing UI/UX prototypes in Figma
  • Project Management: Tracking tasks and progress in Jira
Job Responsibility
  • Support the development and implementation of an AI-powered deep research agent
  • Gain hands-on experience with cutting-edge large language models, cloud infrastructure, and enterprise software development
  • Work on real-world projects
  • Receive mentorship from industry professionals
  • Participate in workshops and networking events
Employment Type: Full-time

Senior Full Stack Engineer - Go / React.js

Rapid7’s Metasploit team is building the future of the world’s best-known softwa...
Rapid7
Location: Czechia, Prague
Salary: Not provided
Expiration Date: Until further notice
Requirements
  • 6+ years of experience in software development using Go, JavaScript, TypeScript, and React (Next.js) or equivalent programming languages
  • Experience with modern cloud infrastructure (AWS, GCP, or Azure)
  • Experience with design patterns
  • Experience with message queues (RabbitMQ, SQS)
  • Understanding of APIs, interprocess communication, and modern networking and deployment tooling (AWS, Docker)
  • High level of accountability and ownership
  • Leading with empathy and strong user focus
  • Ability to learn and evaluate new technologies quickly
  • Interest in or experience with offensive security, penetration testing, or SOC analysis
  • Product driven mindset
Job Responsibility
  • Develop and enhance AI-powered applications within the Metasploit ecosystem
  • Architect and implement performant, scalable, and reliable solutions that support AI-driven interactions in web development
  • Collaborate cross-functionally with researchers, engineers and product teams to push the boundaries of AI in cybersecurity
  • Ensure an exceptional user experience through user-friendly UI/UX
  • Diagnose and resolve complex issues, ensuring the reliability and performance of AI-powered products
  • Build tooling and automation to enhance incident response, developer experience, observability, and internal debugging workflows
  • Champion your teammates' successes, and support each other when needed
Employment Type: Full-time

Enterprise Account Executive

We are looking for a fast-paced, client-obsessed Account Executive with an entre...
Arize
Location: United States, New York City
Salary: 250000.00 - 300000.00 USD / Year
Expiration Date: Until further notice
Requirements
  • 5+ years enterprise SaaS sales experience
  • Hungry, aggressive and motivated
  • Familiarity or willingness to learn sales technologies to find and attract prospects
  • Self-starter and comfortable working in limited process environments
  • Full-cycle sales experience and ability to navigate the complexities of enterprise deals
  • Fast-paced and focused on helping prospects / customers
  • Team player: Collaboration with peers and other organizations within Arize is critical to success
  • Strong communication skills: Clearly and objectively communicate observations from the field
Job Responsibility
  • Be a networker, seller and closer
  • Build relationships with AI/ML stakeholders and be an active member of the community
  • Conduct discovery with prospects and share the Arize vision
  • Run a sophisticated prospecting strategy to “get the word out” and find deals
  • Create sales plays, write talk tracks and strategically identify new business opportunities
  • Deeply research accounts, stakeholders and competitors
  • Manage proofs of concept, drive adoption, and grow accounts
  • Manage and navigate internal / external stakeholders to ensure success
  • Understand use cases, scope licensing and find more workloads
What we offer
  • medical
  • dental
  • vision
  • 401(k) plan
  • unlimited paid time off
  • generous parental leave plan
  • other mental health and wellness support
  • competitive equity package
Employment Type: Full-time

Enterprise Account Executive

We are looking for a fast-paced, client-obsessed Account Executive with an entre...
Arize
Location: United States, San Francisco
Salary: 250000.00 - 300000.00 USD / Year
Expiration Date: Until further notice
Requirements
  • 5+ years enterprise SaaS sales experience
  • Hungry, aggressive and motivated
  • Familiarity or willingness to learn sales technologies to find and attract prospects
  • Self-starter and comfortable working in limited process environments
  • Full-cycle sales experience and ability to navigate the complexities of enterprise deals
  • Fast-paced and focused on helping prospects / customers
  • Team player: Collaboration with peers and other organizations within Arize is critical to success; we deeply value the success of the collective team over individual gains
  • Strong communication skills: Clearly and objectively communicate observations from the field
Job Responsibility
  • Be a networker, seller and closer
  • Build relationships with AI/ML stakeholders and be an active member of the community
  • Conduct discovery with prospects and share the Arize vision
  • Run a sophisticated prospecting strategy to “get the word out” and find deals
  • Create sales plays, write talk tracks and strategically identify new business opportunities
  • Deeply research accounts, stakeholders and competitors
  • Manage proofs of concept, drive adoption, and grow accounts
  • Manage and navigate internal / external stakeholders to ensure success
  • Understand use cases, scope licensing and find more workloads
  • BANT or MEDDIC methodology preferred
What we offer
  • competitive equity package
  • comprehensive benefit package, including: medical, dental, vision, 401(k) plan, unlimited paid time off, generous parental leave plan, and others for mental and wellness support
Employment Type: Full-time

Senior Software Engineer - Together Cloud Infrastructure

Together AI is building the AI Acceleration Cloud, an end-to-end platform for th...
Together AI
Location: United States, San Francisco
Salary: 160000.00 - 230000.00 USD / Year
Expiration Date: Until further notice
Requirements
  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • 5+ years of experience writing high-performance, well-tested, production-quality code
  • Demonstrated experience building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
  • Excellent communication skills: able to write clear design docs and work effectively with both technical and non-technical team members
  • Deep experience with Kubernetes internals is a big plus, such as implementing non-trivial Kubernetes operators, device/storage/network plugins, or custom schedulers, or patching those components or Kubernetes itself
  • Deep experience with VMs/hypervisors is a big plus, such as QEMU/KVM, cloud-hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, and SR-IOV
  • Deep experience with datacenter networking technologies and solutions is a big plus, such as VLAN, VXLAN, VPN, VPC, and OVS/OVN
  • Experience with Cluster API or similar is a big plus
  • Experience working on high-performance compute, networking, and/or storage is a big plus
  • Experience virtualizing GPUs and/or InfiniBand is a big plus
Job Responsibility
  • Design, build, and maintain performant, secure, and highly available backend services/operators that run in our data centers and automate hardware management, such as InfiniBand partitioning, in-DC parallel storage provisioning, and VM provisioning
  • Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs
  • Work on a global multi-exabyte high-performance object store, serving massive datasets for pretraining
  • Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining
  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance
What we offer
  • competitive compensation
  • startup equity
  • health insurance
  • other benefits
  • flexibility in terms of remote work
Employment Type: Full-time

Enterprise Account Executive

We are looking for a fast-paced, client-obsessed Account Executive with an entre...
Arize
Location: United States, New York City
Salary: 250000.00 - 300000.00 USD / Year
Expiration Date: Until further notice
Requirements
  • 5+ years enterprise SaaS sales experience
  • Hungry, aggressive and motivated
  • Familiarity or willingness to learn sales technologies to find and attract prospects
  • Self-starter and comfortable working in limited process environments
  • Full-cycle sales experience and ability to navigate the complexities of enterprise deals
  • Fast-paced and focused on helping prospects / customers
  • Team player: Collaboration with peers and other organizations within Arize is critical to success; we deeply value the success of the collective team over individual gains
  • Strong communication skills: Clearly and objectively communicate observations from the field
Job Responsibility
  • Be a networker, seller and closer
  • Build relationships with AI/ML stakeholders and be an active member of the community
  • Conduct discovery with prospects and share the Arize vision
  • Run a sophisticated prospecting strategy to “get the word out” and find deals
  • Create sales plays, write talk tracks and strategically identify new business opportunities
  • Deeply research accounts, stakeholders and competitors
  • Manage proofs of concept, drive adoption, and grow accounts
  • Manage and navigate internal / external stakeholders to ensure success
  • Understand use cases, scope licensing and find more workloads
What we offer
  • competitive equity package
  • comprehensive benefit package, including: medical, dental, vision, 401(k) plan, unlimited paid time off, generous parental leave plan, and others for mental and wellness support
Employment Type: Full-time

Enterprise Account Executive

We are looking for a fast-paced, client-obsessed Account Executive with an entre...
Arize
Location: United States, New York City
Salary: 250000.00 - 300000.00 USD / Year
Expiration Date: Until further notice
Requirements
  • 5+ years enterprise SaaS sales experience
  • Hungry, aggressive and motivated
  • Familiarity or willingness to learn sales technologies to find and attract prospects
  • Self-starter and comfortable working in limited process environments
  • Full-cycle sales experience and ability to navigate the complexities of enterprise deals
  • Fast-paced and focused on helping prospects / customers
  • Team player: Collaboration with peers and other organizations within Arize is critical to success; we deeply value the success of the collective team over individual gains
  • Strong communication skills: Clearly and objectively communicate observations from the field
Job Responsibility
  • Be a networker, seller and closer
  • Build relationships with AI/ML stakeholders and be an active member of the community
  • Conduct discovery with prospects and share the Arize vision
  • Run a sophisticated prospecting strategy to “get the word out” and find deals
  • Create sales plays, write talk tracks and strategically identify new business opportunities
  • Deeply research accounts, stakeholders and competitors
  • Manage proofs of concept, drive adoption, and grow accounts
  • Manage and navigate internal / external stakeholders to ensure success
  • Understand use cases, scope licensing and find more workloads
  • BANT or MEDDIC methodology preferred
What we offer
  • competitive equity package
  • comprehensive benefit package, including: medical, dental, vision, 401(k) plan, unlimited paid time off, generous parental leave plan, and others for mental and wellness support
Employment Type: Full-time