Senior Researcher - GPU Performance

Microsoft Corporation

Location:
United States, Redmond

Contract Type:
Not provided

Salary:

119800.00 - 234700.00 USD / Year

Job Description:

Generative AI is transforming how people create, collaborate, and communicate, redefining productivity across Microsoft 365 and for our customers globally. At Microsoft, we run the biggest platform for collaboration and productivity in the world, with hundreds of millions of consumer and enterprise users. Tackling AI efficiency challenges is crucial for delivering these experiences at scale. Within our Microsoft-wide Systems Innovation initiative, we are working to advance efficiency across AI systems, exploring novel designs and optimizations across the AI stack: models, AI frameworks, cloud infrastructure, and hardware.

We are an Applied Research team driving mid- and long-term product innovations. We collaborate closely with multiple research teams and product groups across the globe, who bring a multitude of technical knowledge in cloud systems, machine learning, and software engineering. We communicate our research both internally and externally through academic publications, open-source releases, blog posts, patents, and industry conferences. We also collaborate with academic and industry partners to advance the state of the art and target material product impact that will affect hundreds of millions of customers.

We are looking for a Senior Researcher - GPU Performance (Hardware/Software Codesign) to explore hardware- and kernel-level optimizations that deliver significant efficiency gains for Large Language Models and Generative AI experiences.

Job Responsibility:

  • Design, implement, and optimize GPU kernels for complex computational workloads such as AI inferencing
  • Research and develop novel optimization techniques for generation of GPU kernels
  • Profile and analyze kernel performance using advanced diagnostic tools
  • Generate automated solutions for kernel optimization and tuning
  • Collaborate with other researchers to improve model performance
  • Document optimization strategies and maintain performance benchmarks
  • Contribute to the development of internal GPU computing frameworks

Requirements:

  • Doctorate in relevant field OR equivalent experience
  • 2+ years of experience in GPU architecture, memory hierarchies, parallel computing and algorithm optimization
  • 2+ years of experience in GPU programming, including performance profiling and optimization tools
  • Reliable C++ programming skills
  • Ability to meet Microsoft, customer and/or government security screening requirements

Nice to have:

  • 5+ years of experience in GPU programming and optimization, expert knowledge of CUDA, ROCm, Triton, PTX, CUTLASS, or similar GPU programming frameworks
  • Experience with machine learning frameworks (PyTorch, TensorFlow)
  • Familiarity with compiler optimization techniques and background in auto-tuning and automated code generation
  • Publication record in relevant conferences or journals (MLSys, NeurIPS, ICML, ICLR, AISTATS, ACL, EMNLP, NAACL, ISCA, MICRO, ASPLOS, HPCA, SOSP, OSDI, NSDI, etc.)

Additional Information:

Job Posted:
January 29, 2026

Employment Type:
Full-time
Work Type:
Hybrid work
Similar Jobs for Senior Researcher - GPU Performance

Senior Research Engineer

We are seeking a highly skilled Senior Research Engineer to collaborate closely ...
Location:
United States
Salary:
210000.00 - 309000.00 USD / Year
Assembly
Expiration Date:
Until further notice
Requirements:
  • Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
  • Experience with lower-level programming (C++ or Rust preferred)
  • Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
  • TPU experience is a strong plus
  • Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
  • Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
  • Strong debugging, profiling, and optimization skills in large-scale distributed environments
  • Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions
Job Responsibility:
  • Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
  • Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
  • Translate research models and prototypes into highly optimized, production-ready inference systems
  • Explore and integrate inference compilers such as TensorRT, ONNX Runtime, AWS Neuron and Inferentia, or similar technologies
  • Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
  • Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
What we offer:
  • competitive equity grants
  • 100% employer-paid benefits
  • flexibility of being fully remote
Employment Type:
Full-time

Senior Research Engineer/Scientist - Edge, Consumer Products

As a Research Engineer/Scientist on the Consumer Products Research team, you wil...
Location:
United States, San Francisco
Salary:
380000.00 - 445000.00 USD / Year
OpenAI
Expiration Date:
Until further notice
Requirements:
  • Research background in adapting transformers to run in environments with significantly less compute than traditional GPUs and datacenter accelerators
  • Love performance optimization and working with GPU kernel engineers
  • Do rigorous science (rather than vibes based)
  • Have already spent time in the weeds teaching models to speak and perceive
Job Responsibility:
  • Train and evaluate multimodal SoTA models along axes that are important to our vision for future devices
  • Develop novel architectures that improve model performance when scaling the models themselves is not an option
  • Run through the necessary walls to take nascent research capabilities and turn them into capabilities we can build on top of
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

Senior Research Software Engineer - Azure Office of the CTO

Azure Office of the CTO (AOCTO) plays a crucial role in Microsoft’s rapidly expa...
Location:
United States, Multiple Locations
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Design and execute AI and security research initiatives from hypothesis development through experimentation, validation, and analysis, driving outcomes that contribute to academic publication and/or product integration
  • Develop and evaluate model improvement strategies through systematic experimentation and ablation, ensuring both scientific rigor and practical applicability
  • Analyze model behavior, robustness, and safety characteristics to inform technical direction, research contributions, and real-world deployment decisions
  • Maintain and optimize GPU research infrastructure, ensuring cluster reliability, performance efficiency, and adherence to security best practices to support experimentation
  • Synthesize emerging technical trends into actionable insights and collaborate across research and engineering teams to translate validated findings into high-impact outcomes
  • Conduct market, technical, and architectural research to evaluate emerging technologies
  • Keep up with cloud trends and share insights with the CTO and executive office
  • Maintain confidentiality on internal projects and initiatives not yet public
Employment Type:
Full-time

Senior Principal Engineering Manager

Microsoft Research (MSR) is working to transform the future of artificial intell...
Location:
United States, Redmond
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 5+ years of people management experience leading software engineering teams, including managing principal engineers
  • Experience building or operating infrastructure for large-scale distributed systems, cloud platforms, or artificial intelligence (AI)/machine learning (ML) workloads
  • Track record of driving execution on complex, multi-workstream infrastructure projects with clear milestones and accountability
  • Technical fluency in one or more of: large-scale compute clusters, GPU infrastructure, scheduling and orchestration (Kubernetes, Volcano), or High-Performance Compute (HPC) environments
  • Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch
  • Expertise in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • A track record of strong cross-functional partnerships, including the ability to align on strategic direction, deliver joint accountabilities, and develop relationships with staff members with widely varied expertise
  • Experience scaling engineering teams through significant growth phases (hiring, onboarding, and integrating new engineers into a high-performing team)
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Job Responsibility:
  • Lead, mentor, and grow the engineering team that builds MSR’s AI research infrastructure
  • Recruit and develop exceptional engineering talent, building a diverse team - including hiring, onboarding, career development, and performance management
  • Drive execution across the team by setting clear goals, tracking milestones, managing dependencies, and ensuring accountability for delivering complex infrastructure projects on time and at high quality
  • Lead team culture and process changes, cultivating an AI-first mentality that accelerates our progress through agentic coding, automation, and skills development
  • Provide technical vision and judgment on the team's architecture, strategy, and roadmap — spanning supercomputer GPU clusters, high performance networking, workload optimization, researcher tools, and agentic workflows — while empowering engineers to own deep technical details
  • Collaborate closely cross-discipline with engineers, program managers, and research and science teams to align priorities, resolve dependencies, and build better solutions together
  • Foster a team culture of operational excellence, continuous improvement, and high psychological safety where engineers are empowered to take ownership and innovate
Employment Type:
Full-time

Senior Researcher - Efficient AI

Generative AI is transforming how people create, collaborate, and communicate—re...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Doctorate in relevant field OR Master's Degree in relevant field AND 3+ years related research experience OR Bachelor's Degree in relevant field AND 4+ years related research experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Demonstrated experience in designing and optimizing efficient inference systems, combining foundations in algorithmic optimization, parallel computing, and request orchestration under strict SLO constraints with deep knowledge of attention and KV‑cache optimizations, batching and scheduling strategies, and cost‑aware deployment
  • 3+ years of experience with machine learning frameworks (e.g., PyTorch, TensorFlow) and inference serving frameworks (e.g., vLLM, Triton Inference Server, TensorRT-LLM, ONNX Runtime, Ray Serve, DeepSpeed-MII)
  • 3+ years of experience in GPU programming and optimization, with expert knowledge of CUDA, ROCm, Triton, PTX, CUTLASS, or similar GPU programming frameworks
  • Proficiency in C++ and Python for high-performance systems, with code quality and profiling/debugging skills
  • Research impact through publications and/or patents, coupled with hands‑on experience taking research ideas through execution and delivery in production
Job Responsibility:
  • Formulate, develop, and evaluate new algorithmic and system-level approaches for end-to-end AI serving, using analytical modeling and large-scale measurement to study token-level latency, tail latency (p95/p99), throughput-per-dollar, cold-start behavior, warm pool strategies, and capacity planning under multi-tenant SLOs and variable sequence lengths
  • Design and experimentally evaluate endpoint configuration and execution policies, including batching, routing, and scheduling strategies, tensor and pipeline parallelism, quantization and precision profiles, speculative decoding, and chunked or streaming generation, and drive the most promising approaches through robust rollout and validation into production
  • Perform hardware- and kernel-aware optimization by collaborating closely with model, kernel, compiler, and hardware teams to align serving algorithms with attention/KV innovations and accelerator capabilities
  • Build and benchmark experimental prototypes and large-scale measurements to validate research ideas and drive them toward production readiness
  • Produce clear technical documentation, design reviews, and operational playbooks
  • Publish research results, file patents, and, where appropriate, contribute to open-source systems and serving frameworks
Employment Type:
Full-time

Senior Software Engineer

The R&D of Search Ads aims to build an online advertising ecosystem of users, ad...
Location:
China, Beijing
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, CUDA, or ROCm OR equivalent experience
  • 3+ years' practical experience working on applications that use GPUs, including experience optimizing their performance
  • Practical experience writing new GPU kernels, going beyond experience of GPU workloads with existing library kernels
  • Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C/C++, CUDA, or ROCm OR Master's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C/C++, CUDA, or ROCm OR equivalent experience
  • Experience in low-level performance analysis and optimization, including proficiency using GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute
  • Technical background and solid foundation in software engineering principles and architecture design
  • Familiarity with inference optimization and experience developing popular inference frameworks such as TensorRT-LLM, SGLang, or vLLM
  • Exposure to Deep Neural Network inference and experience in one or more deep learning frameworks such as PyTorch, TensorFlow, or ONNX Runtime
Job Responsibility:
  • Design, develop, and maintain high-performance software in C/C++ and Python, including GPU programming with CUDA, ROCm, or Triton
  • Optimize model inference and training pipelines for speed, throughput, memory efficiency, and cost across GPU platforms
  • Collaborate with platform teams to integrate and tune solutions on emerging accelerator stacks and rapidly evolving toolchains
  • Profile workloads end-to-end, identify bottlenecks, and implement kernel-level and system-level performance improvements
  • Partner with internal and external stakeholders to translate requirements into scalable performance features and optimizations for state-of-the-art models
  • Validate performance, stability, and correctness through benchmarking, automated testing, and production readiness reviews
Employment Type:
Full-time

Senior Researcher - Efficient AI

Generative AI is transforming how people create, collaborate, and communicate—re...
Location:
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Doctorate in relevant field
  • OR Master's Degree in relevant field AND 3+ years related research experience
  • OR Bachelor's Degree in relevant field AND 4+ years related research experience
  • OR equivalent experience
  • Demonstrated expertise in areas of algorithmic optimization, parallel computing, queuing and scheduling theory, and practical request orchestration under strict SLO constraints
  • Strong understanding of GPU architecture and memory hierarchies
  • Proficiency in C++ and Python for high-performance systems, with strong code quality and profiling/debugging skills
  • Proven record of research impact through publications and/or patents, and experience carrying ideas through to systems that operate at scale in real production environments
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Formulate, develop, and evaluate new algorithmic and system-level approaches for end-to-end AI serving, using analytical modeling and large-scale measurement to study token-level latency, tail latency (p95/p99), throughput-per-dollar, cold-start behavior, warm pool strategies, and capacity planning under multi-tenant SLOs and variable sequence lengths
  • Design and experimentally evaluate endpoint configuration and execution policies, including batching, routing, and scheduling strategies, tensor and pipeline parallelism, quantization and precision profiles, speculative decoding, and chunked or streaming generation, and drive the most promising approaches through robust rollout and validation into production
  • Perform hardware- and kernel-aware optimization by collaborating closely with model, kernel, compiler, and hardware teams to align serving algorithms with attention/KV innovations and accelerator capabilities
  • Build and benchmark experimental prototypes and large-scale measurements to validate research ideas and drive them toward production readiness
  • Produce clear technical documentation, design reviews, and operational playbooks
  • Publish research results, file patents, and, where appropriate, contribute to open-source systems and serving frameworks
Employment Type:
Full-time

Senior Machine Learning Scientist, Multimodal & Relational Foundation Models

As part of our team, you will help to accelerate and optimize our progress in de...
Location:
United States, Redwood City; San Diego
Salary:
251700.00 - 330000.00 USD / Year
Altos Labs
Expiration Date:
Until further notice
Requirements:
  • PhD in Computer Science, Machine Learning, or a similar quantitative field with 5+ years of relevant work experience in academic or industry settings
  • Prior experience in developing and implementing novel generative AI models, specifically in multimodal integration, GraphRAG, or relational deep learning
  • Deep understanding of Machine Learning principles and how they apply to diverse architectures like Transformers, GNNs, and diffusion models
  • Very strong programming skills in Python and deep learning libraries (e.g., PyTorch, JAX, Hugging Face Transformers/Accelerate)
  • Proven experience with multi-GPU and distributed training at scale (e.g., DDP, FSDP, DeepSpeed, Megatron, or Ray)
  • Strong track record of published, peer-reviewed innovative AI/ML research at top-tier conferences (NeurIPS, ICML, ICLR, CVPR)
Job Responsibility:
  • Pre-train and fine-tune large-scale machine learning systems using multimodal biological data, natural language, and structured relational inputs
  • Architect and implement novel hybrid models that integrate Large Language Models (LLMs) with Graph Neural Networks (GNNs) for multi-hop reasoning over biological knowledge graphs
  • Develop Relational Foundation Models (RFMs) that enable zero-shot predictive tasks over heterogeneous, multi-table biological datasets
  • Lead the design of efficient data loading strategies and distributed training recipes (e.g., FSDP, DeepSpeed) to train models across multiple GPU nodes
  • Gain insights into model performance based on theory, deep research, and the mathematical underpinnings of set-invariant and graph-structured architectures
  • Apply strong coding experience to model development and deployment, ensuring research prototypes transition into reliable, scalable production systems
  • Stay up-to-date on the latest developments in deep learning—including native early-fusion and Mixture-of-Experts (MoE) architectures—and apply this knowledge to Altos' research
  • Mentor junior staff while maintaining a high individual technical contribution to the core research ecosystem and peer-reviewed publications
Employment Type:
Full-time