
Senior Researcher - GPU Performance


Microsoft Corporation

Location:
United States, Redmond


Contract Type:
Not provided


Salary:

119800.00 - 234700.00 USD / Year

Job Description:

Generative AI is transforming how people create, collaborate, and communicate, redefining productivity across Microsoft 365 and for our customers globally. At Microsoft, we run the world's largest platform for collaboration and productivity, serving hundreds of millions of consumer and enterprise users. Tackling AI efficiency challenges is crucial for delivering these experiences at scale.

Within our Microsoft-wide Systems Innovation initiative, we are working to advance efficiency across AI systems, exploring novel designs and optimizations across the AI stack: models, AI frameworks, cloud infrastructure, and hardware. We are an Applied Research team driving mid- and long-term product innovation. We collaborate closely with research teams and product groups across the globe that bring deep technical knowledge in cloud systems, machine learning, and software engineering. We communicate our research both internally and externally through academic publications, open-source releases, blog posts, patents, and industry conferences. We also collaborate with academic and industry partners to advance the state of the art and target material product impact that will reach hundreds of millions of customers.

We are looking for a Senior Researcher - GPU Performance (Hardware/Software Codesign) to explore hardware- and kernel-level optimizations that deliver significant efficiency gains for Large Language Models and Generative AI experiences.

Job Responsibility:

  • Design, implement, and optimize GPU kernels for complex computational workloads such as AI inferencing
  • Research and develop novel optimization techniques for generation of GPU kernels
  • Profile and analyze kernel performance using advanced diagnostic tools
  • Generate automated solutions for kernel optimization and tuning
  • Collaborate with other researchers to improve model performance
  • Document optimization strategies and maintain performance benchmarks
  • Contribute to the development of internal GPU computing frameworks

Requirements:

  • Doctorate in relevant field OR equivalent experience
  • 2+ years of experience in GPU architecture, memory hierarchies, parallel computing and algorithm optimization
  • 2+ years of experience in GPU programming, including performance profiling and optimization tools
  • Strong C++ programming skills
  • Ability to meet Microsoft, customer and/or government security screening requirements

Nice to have:

  • 5+ years of experience in GPU programming and optimization, expert knowledge of CUDA, ROCm, Triton, PTX, CUTLASS, or similar GPU programming frameworks
  • Experience with machine learning frameworks (PyTorch, TensorFlow)
  • Familiarity with compiler optimization techniques and background in auto-tuning and automated code generation
  • Publication record in relevant conferences or journals (MLSys, NeurIPS, ICML, ICLR, AISTATS, ACL, EMNLP, NAACL, ISCA, MICRO, ASPLOS, HPCA, SOSP, OSDI, NSDI, etc.)

Additional Information:

Job Posted:
January 29, 2026

Employment Type:
Full-time
Work Type:
Hybrid work


Similar Jobs for Senior Researcher - GPU Performance

Senior Research Engineer

We are seeking a highly skilled Senior Research Engineer to collaborate closely ...
Location: United States
Salary: 210000.00 - 309000.00 USD / Year
Assembly
Expiration Date: Until further notice
Requirements:
  • Strong expertise in the Python ecosystem and major ML frameworks (PyTorch, JAX)
  • Experience with lower-level programming (C++ or Rust preferred)
  • Deep understanding of GPU acceleration (CUDA, profiling, kernel-level optimization)
  • TPU experience is a strong plus
  • Proven ability to accelerate deep learning workloads using compiler frameworks, graph optimizations, and parallelization strategies
  • Solid understanding of the deep learning lifecycle: model design, large-scale training, data processing pipelines, and inference deployment
  • Strong debugging, profiling, and optimization skills in large-scale distributed environments
  • Excellent communication and collaboration skills, with the ability to clearly prioritize and articulate impact-driven technical solutions
Job Responsibility:
  • Investigate and mitigate performance bottlenecks in large-scale distributed training and inference systems
  • Develop and implement both low-level (operator/kernel) and high-level (system/architecture) optimization strategies
  • Translate research models and prototypes into highly optimized, production-ready inference systems
  • Explore and integrate inference compilers such as TensorRT, ONNX Runtime, AWS Neuron and Inferentia, or similar technologies
  • Design, test, and deploy scalable solutions for parallel and distributed workloads on heterogeneous hardware
  • Facilitate knowledge transfer and bidirectional support between Research and Engineering teams, ensuring alignment of priorities and solutions
What we offer:
  • competitive equity grants
  • 100% employer-paid benefits
  • flexibility of being fully remote

Senior Research Engineer/Scientist - Edge, Consumer Products

As a Research Engineer/Scientist on the Consumer Products Research team, you wil...
Location: United States, San Francisco
Salary: 380000.00 - 445000.00 USD / Year
OpenAI
Expiration Date: Until further notice
Requirements:
  • Research background in adapting transformers to run in environments with significantly less compute than traditional GPUs and datacenter accelerators
  • Love performance optimization and working with GPU kernel engineers
  • Do rigorous science (rather than vibes-based work)
  • Have already spent time in the weeds teaching models to speak and perceive
Job Responsibility:
  • Train and evaluate multimodal SoTA models along axes that are important to our vision for future devices
  • Develop novel architectures that improve model performance when scaling the models themselves is not an option
  • Run through the necessary walls to take nascent research capabilities and turn them into capabilities we can build on top of
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible

Senior Research Software Engineer - Azure Office of the CTO

Azure Office of the CTO (AOCTO) plays a crucial role in Microsoft’s rapidly expa...
Location: United States, Multiple Locations
Salary: 119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include, but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Design and execute AI and security research initiatives from hypothesis development through experimentation, validation, and analysis, driving outcomes that contribute to academic publication and/or product integration
  • Develop and evaluate model improvement strategies through systematic experimentation and ablation, ensuring both scientific rigor and practical applicability
  • Analyze model behavior, robustness, and safety characteristics to inform technical direction, research contributions, and real-world deployment decisions
  • Maintain and optimize GPU research infrastructure, ensuring cluster reliability, performance efficiency, and adherence to security best practices to support experimentation
  • Synthesize emerging technical trends into actionable insights and collaborate across research and engineering teams to translate validated findings into high-impact outcomes
  • Conduct market, technical, and architectural research to evaluate emerging technologies
  • Keep up with cloud trends and share insights with the CTO and executive office
  • Maintain confidentiality on internal projects and initiatives not yet public

Senior Researcher - Efficient AI

Generative AI is transforming how people create, collaborate, and communicate—re...
Location: India, Bangalore
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements:
  • Doctorate in relevant field
  • OR Master's Degree in relevant field AND 3+ years related research experience
  • OR Bachelor's Degree in relevant field AND 4+ years related research experience
  • OR equivalent experience
  • Demonstrated expertise in areas of algorithmic optimization, parallel computing, queuing and scheduling theory, and practical request orchestration under strict SLO constraints
  • Strong understanding of GPU architecture and memory hierarchies
  • Proficiency in C++ and Python for high-performance systems, with strong code quality and profiling/debugging skills
  • Proven record of research impact through publications and/or patents, and experience carrying ideas through to systems that operate at scale in real production environments
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Formulate, develop, and evaluate new algorithmic and system-level approaches for end-to-end AI serving, using analytical modeling and large-scale measurement to study token-level latency, tail latency (p95/p99), throughput-per-dollar, cold-start behavior, warm pool strategies, and capacity planning under multi-tenant SLOs and variable sequence lengths
  • Design and experimentally evaluate endpoint configuration and execution policies, including batching, routing, and scheduling strategies, tensor and pipeline parallelism, quantization and precision profiles, speculative decoding, and chunked or streaming generation, and drive the most promising approaches through robust rollout and validation into production
  • Perform hardware- and kernel-aware optimization by collaborating closely with model, kernel, compiler, and hardware teams to align serving algorithms with attention/KV innovations and accelerator capabilities
  • Build and benchmark experimental prototypes and large-scale measurements to validate research ideas and drive them toward production readiness
  • Produce clear technical documentation, design reviews, and operational playbooks
  • Publish research results, file patents, and, where appropriate, contribute to open-source systems and serving frameworks

Senior Machine Learning Scientist, Multimodal & Relational Foundation Models

As part of our team, you will help to accelerate and optimize our progress in de...
Location: United States, Redwood City; San Diego
Salary: 251700.00 - 330000.00 USD / Year
Altos Labs
Expiration Date: Until further notice
Requirements:
  • PhD in Computer Science, Machine Learning, or a similar quantitative field with 5+ years of relevant work experience in academic or industry settings
  • Prior experience in developing and implementing novel generative AI models, specifically in multimodal integration, GraphRAG, or relational deep learning
  • Deep understanding of Machine Learning principles and how they apply to diverse architectures like Transformers, GNNs, and diffusion models
  • Very strong programming skills in Python and deep learning libraries (e.g., PyTorch, JAX, Hugging Face Transformers/Accelerate)
  • Proven experience with multi-GPU and distributed training at scale (e.g., DDP, FSDP, DeepSpeed, Megatron, or Ray)
  • Strong track record of published, peer-reviewed innovative AI/ML research at top-tier conferences (NeurIPS, ICML, ICLR, CVPR)
Job Responsibility:
  • Pre-train and fine-tune large-scale machine learning systems using multimodal biological data, natural language, and structured relational inputs
  • Architect and implement novel hybrid models that integrate Large Language Models (LLMs) with Graph Neural Networks (GNNs) for multi-hop reasoning over biological knowledge graphs
  • Develop Relational Foundation Models (RFMs) that enable zero-shot predictive tasks over heterogeneous, multi-table biological datasets
  • Lead the design of efficient data loading strategies and distributed training recipes (e.g., FSDP, DeepSpeed) to train models across multiple GPU nodes
  • Gain insights into model performance based on theory, deep research, and the mathematical underpinnings of set-invariant and graph-structured architectures
  • Apply strong coding experience to model development and deployment, ensuring research prototypes transition into reliable, scalable production systems
  • Stay up-to-date on the latest developments in deep learning—including native early-fusion and Mixture-of-Experts (MoE) architectures—and apply this knowledge to Altos' research
  • Mentor junior staff while maintaining a high individual technical contribution to the core research ecosystem and peer-reviewed publications

Senior MLOps Engineer

If you’re passionate about scalability, automated deployment, and well-optimized...
Location: Romania, Bucharest
Salary: Not provided
IT Genetics Romania
Expiration Date: Until further notice
Requirements:
  • University degree, preferably in engineering (software, industrial, mechanical, process) or a related field
  • Over 5 years of experience in MLOps or machine learning engineering, with a focus on deploying and managing deep learning models at scale
  • Strong skills in Python, CI/CD pipelines, and ML frameworks (e.g., PyTorch, TensorFlow, OpenCV) for automating and scaling ML workflows
  • Expertise in monitoring and alert automation for ML workflows, including data pipelines, training processes, and model performance (e.g., Prometheus, Grafana)
  • Familiarity with distributed training techniques, multi-GPU strategies, and hardware optimization for deep learning
  • Strong communication and interpersonal skills
Job Responsibility:
  • Design end-to-end architecture for the automated training of ML models
  • Create data pipelines to build relevant datasets and data annotation flows
  • Monitor ML model performance and data drift
  • Handle versioning, deployment, and integration with the software team
  • Develop and manage CI/CD pipelines for building, testing, and deploying models
  • Apply best practices for model versioning, rollback, and A/B testing to ensure reliable and accurate production releases
  • Set up a robust monitoring system and develop automated alerting solutions to proactively identify issues in data pipelines, model training, validation, and data variation
  • Promote MLOps best practices (Infrastructure as Code, reproducibility, security) and continuously improve internal processes to increase reliability and efficiency
  • Research and implement cutting-edge technologies to improve training efficiency (e.g., distributed training, HPC, multi-GPU strategies) for the research team
  • Explore future MLOps frameworks and GPU-based cloud solutions as part of the scalability roadmap
What we offer:
  • Meal tickets
  • A place where your voice truly matters
  • Performance bonuses
  • A day off on your birthday
  • Private medical subscription
  • Trainings and learning resources
  • Hybrid work model
  • Bookster subscription
  • A friendly, passionate, and solution-oriented team
  • Opportunities to grow or change your role within the company

Customer Support Engineer

As a Customer Support Engineer at a pioneering AI company, you'll be the first l...
Location: United States, San Francisco
Salary: 180000.00 - 260000.00 USD / Year
Together AI
Expiration Date: Until further notice
Requirements:
  • 5+ years of experience in a customer-facing technical role with at least 1 year in a support function in AI
  • Strong technical background, with knowledge of AI, ML, GPU technologies and their integration into high-performance computing (HPC) environments
  • Familiarity with infrastructure services (e.g., Kubernetes, SLURM), infrastructure as code solutions (e.g., Ansible), high-performance network fabrics, NFS-based storage management, container infrastructure, and scripting and programming languages
  • Familiarity with operating storage systems in HPC environments such as Vast and Weka
  • Familiarity with inspecting and resolving network-related errors
  • Strong knowledge of Python, TypeScript, and/or JavaScript with testing/debugging experience using curl and Postman-like tools
  • Foundational understanding in the installation, configuration, administration, troubleshooting, and securing of compute clusters
  • Complex technical problem solving and troubleshooting, with a proactive approach to issue resolution
  • Ability to work cross-functionally with teams such as Sales, Engineering, Support, Product and Research to drive customer success
  • Strong sense of ownership and willingness to learn new skills to ensure both team and customer success
Job Responsibility:
  • Engage directly with customers to tackle and resolve complex technical challenges involving our cutting-edge GPU clusters and our inference and fine-tuning services, ensuring swift and effective solutions every time
  • Become a product expert in all of our Gen AI solutions, serving as the last line of technical defense before issues are escalated to Engineering and Product teams
  • Collaborate seamlessly across Engineering, Research, and Product teams to address customer concerns, and with senior leaders both internally and externally to ensure the highest levels of customer satisfaction
  • Transform customer insights into action by identifying patterns in support cases and working with Engineering and Go-To-Market teams to drive Together’s roadmap (e.g., future models to support)
  • Maintain detailed documentation of system configurations, procedures, troubleshooting guides, and FAQs to facilitate knowledge sharing with team and customers
  • Be flexible in providing support coverage during holidays, nights and weekends as required by business needs to ensure consistent and reliable service for our customers
What we offer
What we offer
  • competitive compensation
  • startup equity
  • health insurance
  • flexibility in terms of remote work

Customer Support Engineer

As a Customer Support Engineer at a pioneering AI company, you'll be the first l...
Location: India
Salary: Not provided
Together AI
Expiration Date: Until further notice
Requirements:
  • 5+ years of experience in a customer-facing technical role with at least 1 year in a support function in AI
  • Strong technical background, with knowledge of AI, ML, GPU technologies and their integration into high-performance computing (HPC) environments
  • Familiarity with infrastructure services (e.g., Kubernetes, SLURM), infrastructure as code solutions (e.g., Ansible), high-performance network fabrics, NFS-based storage management, container infrastructure, and scripting and programming languages
  • Familiarity with operating storage systems in HPC environments such as Vast and Weka
  • Familiarity with inspecting and resolving network-related errors
  • Strong knowledge of Python, TypeScript, and/or JavaScript with testing/debugging experience using curl and Postman-like tools
  • Foundational understanding in the installation, configuration, administration, troubleshooting, and securing of compute clusters
  • Complex technical problem solving and troubleshooting, with a proactive approach to issue resolution
  • Ability to work cross-functionally with teams such as Sales, Engineering, Support, Product and Research to drive customer success
  • Strong sense of ownership and willingness to learn new skills to ensure both team and customer success
Job Responsibility:
  • Engage directly with customers to tackle and resolve complex technical challenges involving our cutting-edge GPU clusters and our inference and fine-tuning services, ensuring swift and effective solutions every time
  • Become a product expert in all of our Gen AI solutions, serving as the last line of technical defense before issues are escalated to Engineering and Product teams
  • Collaborate seamlessly across Engineering, Research, and Product teams to address customer concerns, and with senior leaders both internally and externally to ensure the highest levels of customer satisfaction
  • Transform customer insights into action by identifying patterns in support cases and working with Engineering and Go-To-Market teams to drive Together’s roadmap (e.g., future models to support)
  • Maintain detailed documentation of system configurations, procedures, troubleshooting guides, and FAQs to facilitate knowledge sharing with team and customers
  • Be flexible in providing support coverage during holidays, nights and weekends as required by business needs to ensure consistent and reliable service for our customers
What we offer:
  • competitive compensation
  • startup equity
  • health insurance
  • flexibility in terms of remote work for the respective hiring region