Member of Technical Staff, AI Networking

Microsoft Corporation

Location:
United States, Mountain View

Contract Type:
Not provided

Salary:

139900.00 - 274800.00 USD / Year

Job Description:

Microsoft AI is hiring a Member of Technical Staff, AI Networking to design and scale the world’s most advanced high-performance networks powering Copilot and next-generation AI systems. Join the team building the fabric that connects frontier-class datacenters, enables multi-gigawatt AI supercomputers, and supports the training of the most sophisticated AI models on the planet.

Job Responsibility:

  • Advanced RoCE transport design, congestion control, ECN/WRED/DCTCP tuning
  • Fabric architecture, topology planning, network modeling, and scaling strategy
  • Telemetry, observability, reliability engineering, and automated troubleshooting
  • Develop and tune the deployment of novel routing techniques to achieve reliability in large networks
  • Work with world-class network designers such as NVIDIA, Broadcom, and in-house silicon/network co-design teams
  • AI training and inference cluster bring-up, performance benchmarking, and root-cause analysis
  • Gather data and insights to develop the pretraining compute roadmap
  • Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven, product development cycle
  • Embody our Culture and Values

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience

Additional Information:

Job Posted:
April 01, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Member of Technical Staff, AI Networking

Member of Technical Staff, Cloud Infrastructure

As a Software Engineer on our Cloud Infrastructure team, you'll be at the forefr...
Location:
United States, New York, NY; San Mateo, CA; Redwood City, CA
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 5+ years of experience designing and building backend infrastructure in cloud environments (e.g., AWS, GCP, Azure)
  • Proven experience in ML infrastructure and tooling (e.g., PyTorch, TensorFlow, Vertex AI, SageMaker, Kubernetes, etc.)
  • Strong software development skills in languages like Python or C++
  • Deep understanding of distributed systems fundamentals: scheduling, orchestration, storage, networking, and compute optimization
Job Responsibility:
  • Architect and build scalable, resilient, and high-performance backend infrastructure to support distributed training, inference, and data processing pipelines
  • Lead technical design discussions, mentor other engineers, and establish best practices for building and operating large-scale ML infrastructure
  • Design and implement core backend services (e.g., job schedulers, resource managers, autoscalers, model serving layers) with a focus on efficiency and low latency
  • Drive infrastructure optimization initiatives, including compute cost reduction, storage lifecycle management, and network performance tuning
  • Collaborate cross-functionally with ML, DevOps, and product teams to translate research and product needs into robust infrastructure solutions
  • Continuously evaluate and integrate cloud-native and open-source technologies (e.g., Kubernetes, Ray, Kubeflow, MLFlow) to enhance our platform’s capabilities and reliability
  • Own end-to-end systems from design to deployment and observability, with a strong emphasis on reliability, fault tolerance, and operational excellence
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment Type:
Full-time

Member of Technical Staff, Capacity & Efficiency Infrastructure

Microsoft AI is looking for a Member of Technical Staff – Capacity & Efficiency ...
Location:
United States, Mountain View
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Deep understanding of the fundamentals of GPU architectures and DL/LLM architectures
  • Deep experience in profiling and analyzing performance in large-scale distributed computing systems
  • Deep experience in profiling and analyzing performance in ML models especially GenAI models
  • Experience with low-level GPU programming (CUDA, Triton, NCCL) and frameworks such as PyTorch or JAX
  • Experience in leading technical projects and supporting architectural decisions with data
  • Experience building infrastructure for large-scale machine learning or generative AI workloads
  • Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • Track record of contributing to high-performance computing or large-scale AI infrastructure projects
Job Responsibility:
  • Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters
  • Build and evolve telemetry systems to provide visibility into infrastructure and ML model performance, utilization, and cost-related metrics
  • Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems
  • Drive architectural improvements across various ML services which deliver measurable efficiency improvements
  • Build and evolve tools to automatically provide insights and recommendations to improve fleet-wide efficiency
  • Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies
  • Partner with ML researchers and infrastructure engineers to understand their plans and future needs and develop plans to balance growth with efficiency
  • Collaborate with hardware teams to optimize for next-generation accelerators (NVIDIA, MAIA, and beyond)
  • Embody our Culture and Values
Employment Type:
Full-time

Member of Technical Staff - GPU Infrastructure

Prime Intellect is building the open superintelligence stack - from frontier age...
Location:
United States, San Francisco
Salary:
Not provided
Prime Intellect
Expiration Date:
Until further notice
Requirements:
  • 3+ years hands-on experience with GPU clusters and HPC environments
  • Deep expertise with SLURM and Kubernetes in production GPU settings
  • Proven experience with InfiniBand configuration and troubleshooting
  • Strong understanding of NVIDIA GPU architecture, CUDA ecosystem, and driver stack
  • Experience with infrastructure automation tools (Ansible, Terraform)
  • Proficiency in Python, Bash, and systems programming
  • Track record of customer-facing technical leadership
  • NVIDIA driver installation and troubleshooting (CUDA, Fabric Manager, DCGM)
  • Container runtime configuration for GPUs (Docker, Containerd, Enroot)
  • Linux kernel tuning and performance optimization
Job Responsibility:
  • Partner with clients to understand workload requirements and design optimal GPU cluster architectures
  • Create technical proposals and capacity planning for clusters ranging from 100 to 10,000+ GPUs
  • Develop deployment strategies for LLM training, inference, and HPC workloads
  • Present architectural recommendations to technical and executive stakeholders
  • Deploy and configure orchestration systems including SLURM and Kubernetes for distributed workloads
  • Implement high-performance networking with InfiniBand, RoCE, and NVLink interconnects
  • Optimize GPU utilization, memory management, and inter-node communication
  • Configure parallel filesystems (Lustre, BeeGFS, GPFS) for optimal I/O performance
  • Tune system performance from kernel parameters to CUDA configurations
  • Serve as primary technical escalation point for customer infrastructure issues
Employment Type:
Full-time

Technical Specialist (Teaching Technologist)

Recognised as a technical specialist in Immersive technology, this post will con...
Location:
India, Mumbai
Salary:
Not provided
Emeritus
Expiration Date:
Until further notice
Requirements:
  • Current specialist knowledge and refined skills in immersive technologies, interactive and immersive media production
  • Significant experience in tools for creating virtual reality and augmented reality, building virtual environments and interactions
  • Knowledge of using games engines (Unity or Unreal) to develop for immersive projects
  • Knowledge of 360 video and 3D graphics for VR
  • 3D animation/modelling and design (Maya, Blender)
  • Thorough knowledge of digital/interactive media software (e.g. DaVinci Resolve, FCP, Premiere, Touch Designer, Stornoway), and AI tools for XR creation
  • Knowledge of analogue and digital sound and spatial audio for VR and Immersive projects
  • Knowledge of Motion Capture, 3D scanning and photogrammetry
  • Detailed, current knowledge, skills and experience of XR hardware, such as Meta Quest VR headsets and mobile handsets, including system setup, troubleshooting, and maintenance, and of platforms like SteamVR and Oculus SDK for managing VR hardware
  • Expertise with specialist equipment like 360 cameras and ambisonic microphones
Job Responsibility:
  • Apply professional expertise in the design of stimulating learning solutions
  • Provide specialist guidance and support to students, staff or other technical staff in the utilisation of virtual reality, augmented reality, mixed reality, and motion capture environments, development tools, and related technologies, to enable excellent teaching and learning for all types of students
  • Use specialist experience to provide services, guidance, troubleshooting, support and training to students conducting creative immersive projects or research studies
  • Provide workshops and project support across a range of units, with particular guidance as students develop blended or virtual performance and immersive pieces or need to submit sketches and show the process of design work
  • Proactively engage in academic and pastoral student support procedures, providing direct support and/or making referrals to the wider University support network as appropriate
  • Collaborate with Unit and Programme Directors in the design and delivery of teaching within labs or other teaching space(s)
  • Work with key stakeholders (e.g. Unit and Programme Directors) to provide specialist solutions in the planning, design and delivery of practical teaching
  • Deliver teaching and develop physical and digital teaching resources
  • Respond independently using initiative and judgement to find solutions and take action
  • Maintain awareness of developments and changes in immersive technologies and media
Employment Type:
Full-time

Member of Technical Staff - Data Engineer

As Microsoft continues to push the boundaries of AI, we are on the lookout for i...
Location:
United States, New York
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years experience in business analytics, data science, software development, data modeling or data engineering work
  • OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years experience in business analytics, data science, software development, or data engineering work
  • OR equivalent experience
  • 4+ years technical engineering experience building data processing applications (batch and streaming) with coding in languages including, but not limited to, Python, Java, Spark, SQL
  • Experience working with the Apache Hadoop ecosystem, Kafka, NoSQL, etc.
  • 3+ years experience with data governance, data compliance and/or data security
  • 2+ years' experience building scalable services on top of public cloud infrastructure like Azure, AWS, or GCP. Extensive use of datastores like RDBMS, key-value stores, etc.
  • 2+ years' experience building distributed systems at scale and extensive systems knowledge that spans bare-metal hosts to containers to networking
  • Ability to identify, analyze, and resolve complex technical issues, ensuring optimal performance, scalability, and user experience
  • Dedication to writing clean, maintainable, and well-documented code with a focus on application quality, performance, and security
Job Responsibility:
  • Build scalable data pipelines for sourcing, transforming and publishing data assets for AI use cases
  • Work collaboratively with other platform, infrastructure, and application engineers as well as AI researchers to build next-generation data platform products and services
  • Ship high-quality, well-tested, secure, and maintainable code
  • Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven, product development cycle
  • Embody our Culture and Values
Employment Type:
Full-time

Member of Technical Staff - Backend Engineer

Microsoft AI is looking for a talented Backend engineer to help build the next w...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years' experience building backend APIs for mobile apps, such as GraphQL/REST APIs/Protobuf/Thrift, and streaming protocols such as WebSocket/SSE/WebRTC, with familiarity in backend and mobile data schema code generation or consistency, version control for mobile releases, analytics, feature flags, and A/B testing frameworks
  • 4+ years' experience building scalable services on top of public cloud infrastructure like Azure, AWS, or GCP. Extensive use of datastores like RDBMS, key-value stores, etc.
  • 4+ years' experience building distributed systems at scale and extensive systems knowledge that spans bare-metal hosts to containers to networking
Job Responsibility:
  • Build secure and performant APIs that power Copilot apps
  • Work collaboratively with other product engineers, Product Managers, and platform engineers to take ambiguous projects and mold them into amazing experiences
  • Ship high-quality, well-tested, secure, and maintainable code
  • Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven, product development cycle
  • Embody our Culture and Values
Employment Type:
Full-time

Member of Technical Staff, High Performance Computing Engineer

Microsoft AI is looking for experienced Member of Technical Staff, High Performa...
Location:
United Kingdom, London
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in computer science, or related technical field AND 4+ years technical engineering experience with deploying or operating on-premise or cloud high-performance clusters
  • 4+ years experience working with high-scale training clusters (e.g., working with frameworks/tools such as NVIDIA InfiniBand clusters, SLURM, Kubernetes, Ray, etc.)
  • 4+ years experience building scalable services on top of public cloud infrastructure like Azure, AWS, or GCP
  • OR equivalent experience
Job Responsibility:
  • Design, operate, and maintain large-scale HPC environments
  • Own the deployment, configuration, and day-to-day operation of HPC schedulers (e.g., SLURM, Kubernetes)
  • Serve as a technical owner for at least one core HPC domain (GPU compute, high-performance storage, networking, or similar)
  • Develop and maintain automation and tooling using Bash and/or Python
  • Partner closely with researchers and engineers to support their workloads, troubleshoot cluster usage issues, and triage failed or underperforming jobs
  • Drive work forward independently by navigating ambiguity and technical roadblocks
  • Enjoy working in a fast-paced, design-driven product development environment
  • Embody our Culture and Values
Employment Type:
Full-time

Member of Technical Staff, Applied Research

The Applied Researcher role is designed for engineers who love working across ML...
Location:
United States, San Mateo
Salary:
Not provided
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • BS/MS in Computer Science, Electrical Engineering, Machine Learning, or a related field, or equivalent practical experience; open to all levels of experience
  • Strong experience with PyTorch and modern Transformer architectures
  • Solid computer science fundamentals: data structures, algorithms, concurrency, distributed systems, networking
  • Hands-on experience training, fine-tuning, or evaluating machine learning models, preferably LLMs
  • Familiarity with recent developments in the LLM research domain, including model architectures, training methods, and evaluation strategies
  • Passion for partnering with customers: understanding their constraints, co-designing solutions, and iterating based on real-world feedback
  • Curiosity and enthusiasm for exploring a wide range of problem domains and project types - from quick experiments to long-running, complex engagements
  • Ability to operate in a fast-paced, ambiguous environment and drive projects independently
Job Responsibility:
  • Sit at the intersection of ML research, systems engineering, and customer-facing problem solving
  • Work hands-on with customers and customer data to tune, evaluate and deploy models using various techniques such as SFT / DPO / RL
  • Help customers build competitive models using their unique data tailored to their unique products
  • Be the technical bridge between customer needs, customer data, and our tuning and serving infrastructure
What we offer:
  • Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure
  • Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally
  • Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results
  • Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation
Employment Type:
Full-time