CrawlJobs Logo

Principal Researcher - GPU Performance

https://www.microsoft.com/ Logo

Microsoft Corporation

Location Icon

Location:
United States , Redmond

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

139900.00 - 274800.00 USD / Year

Job Description:

Generative AI is transforming how people create, collaborate, and communicate - redefining productivity across Microsoft 365 and our customers globally. At Microsoft, we run the biggest platform for collaboration and productivity in the world with hundreds of millions of consumer/enterprise users. Tackling AI efficiency challenges is crucial for delivering these experiences at scale. Within our Microsoft wide Systems Innovation initiative, we are working to advance efficiency across AI systems, where we look at novel designs and optimizations across AI stacks: models, AI frameworks, cloud infrastructure, and hardware. We are an Applied Research team driving mid- and long-term product innovations. We closely collaborate with multiple research teams and product groups across the globe who bring a multitude of technical knowledge in cloud systems, machine learning and software engineering. We communicate our research both internally and externally through academic publications, open-source releases, blog posts, patents, and industry conferences. Further, we also collaborate with academic and industry partners to advance the state of the art and target material product impact that will affect 100s of millions of customers. We are looking for a Principal Researcher - GPU Performance – Hardware/Software Codesign researcher to explore hardware/kernel-level optimizations to deliver significant efficiency gains for Large Language Models and Generative AI experiences.

Job Responsibility:

  • Design, implement, and optimize GPU kernels for complex computational workloads such as AI inferencing
  • Research and develop novel optimization techniques for generation of GPU kernels
  • Profile and analyze kernel performance using advanced diagnostic tools
  • Generate automated solutions for kernel optimization and tuning
  • Collaborate with other researchers to improve model performance
  • Document optimization strategies and maintain performance benchmarks
  • Contribute to the development of internal GPU computing frameworks

Requirements:

  • Doctorate in relevant field AND 3+ years related research experience OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
  • 3+ years of experience in GPU architecture, memory hierarchies, parallel computing and algorithm optimization
  • 3+ years of experience in GPU programming and optimization
  • familiar knowledge of CUDA, ROCm, Triton, PTX, CUTLASS, or similar GPU programming frameworks including performance profiling and optimization tools
  • Experience with machine learning frameworks (PyTorch, TensorFlow)
  • Working knowledge on Large Language Model architectures
  • Publication record in relevant conferences or journals (MLSys, NeurIPS, ICML, ICLR, AISTATS, ACL, EMNLP, NAACL, ISCA, MICRO, ASPLOS, HPCA, SOSP, OSDI, NSDI, etc.)

Additional Information:

Job Posted:
January 29, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Principal Researcher - GPU Performance

Principal Open Source AI/ML Solutions Engineer

The Senior Member in the GPU domain is a technical role responsible for owning t...
Location
Location
India , Bangalore
Salary
Salary:
Not provided
amd.com Logo
AMD
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong C++ and Python programming skills
  • Performance analysis skills for both CPU and GPU
  • Good knowledge of AI/ML Frameworks and Architecture
  • Basic GPU kernel programming knowledge
  • Experience with software engineering methodologies such as Agile, Scrum, Kanban
  • Experience in all the phases of software development, from requirement gathering, analysis, design, development, testing to final release
  • Experience developing software in an end customer product delivery environment
  • Experience with open-source software development including collaboration with community maintainers and submitting contributions
  • Excellent analytical and problem-solving skills
  • Strong communication skills to effectively convey complex technical concepts to both technical and non-technical stakeholders
Job Responsibility
Job Responsibility
  • Architectural Design: Own architectural design and development of GPU software components, ensuring alignment with industry standards and best practices
  • Technical Leadership: Act as one of the subject matter experts in GPU technologies, providing guidance and mentorship to junior engineers in the team on complex technical challenges
  • Software Development: Design, write, and deliver high-quality open software solutions that enhance GPU performance and capabilities. This includes developing drivers, APIs, and other critical software components
  • Research and Innovation: Conduct research to explore new technologies and methodologies that can improve GPU performance and efficiency. Propose innovative solutions to meet evolving market demands
  • Collaboration: Work collaboratively with cross-functional teams, including hardware engineers, system architects, and product managers, to ensure successful integration of GPU technologies into broader systems
  • Documentation and Standards: Develop comprehensive technical documentation and establish coding standards to ensure maintainability and scalability of software products
Read More
Arrow Right

Senior Principal Engineering Manager

Microsoft Research (MSR) is working to transform the future of artificial intell...
Location
Location
United States , Redmond
Salary
Salary:
163000.00 - 296400.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 5+ years of people management experience leading software engineering teams, including managing principal engineers
  • Experience building or operating infrastructure for large-scale distributed systems, cloud platforms, or artificial intelligence (AI)/machine learning(ML) workloads
  • Track record of driving execution on complex, multi-workstream infrastructure projects with clear milestones and accountability
  • Technical fluency in one or more of: large-scale compute clusters, GPU infrastructure, scheduling and orchestration (Kubernetes, Volcano), or High-Performance Compute (HPC) environments
  • Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch
  • Expertise in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • A track record of strong cross-functional partnerships, including the ability to align on strategic direction, deliver joint accountabilities, and develop relationships with staff members with widely varied expertise
  • Experience scaling engineering teams through significant growth phases (hiring, onboarding, and integrating new engineers into a high-performing team)
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Job Responsibility
Job Responsibility
  • Lead, mentor, and grow the engineering team that builds MSR’s AI research infrastructure
  • Recruit and develop exceptional engineering talent, building a diverse team - including hiring, onboarding, career development, and performance management
  • Drive execution across the team by setting clear goals, tracking milestones, managing dependencies, and ensuring accountability for delivering complex infrastructure projects on time and at high quality
  • Lead team culture and process changes, cultivating an AI-first mentality that accelerates our progress through agentic coding, automation, and skills development
  • Provide technical vision and judgment on the team's architecture, strategy, and roadmap — spanning supercomputer GPU clusters, high performance networking, workload optimization, researcher tools, and agentic workflows — while empowering engineers to own deep technical details
  • Collaborate closely cross-discipline with engineers, program managers, and research and science teams to align priorities, resolve dependencies, and build better solutions together
  • Foster a team culture of operational excellence, continuous improvement, and high psychological safety where engineers are empowered to take ownership and innovate
  • Fulltime
Read More
Arrow Right

HPC Principal Federal Technical Consultant

Principal Consultant to join our High-Performance Computing (HPC) team. In this ...
Location
Location
United States
Salary
Salary:
115500.00 - 266000.00 USD / Year
https://www.hpe.com/ Logo
Hewlett Packard Enterprise
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8+ years of professional experience, with at least 3+ in HPC architecture, systems engineering, or large-scale infrastructure design
  • Advanced degree in Computer Science, Engineering, Physics, or related technical field (or equivalent experience)
  • Proven ability to design and deliver complex, multi-vendor HPC solutions at scale
  • Demonstrated ability to independently complete solution implementations and application design deliverables
  • Must be United States Citizen due to the responsibilities and requirements of the role as this will be supporting a Federal site
  • Top Secret Clearance, TS/SCI with Full Scope Polygraph (FSP)
  • Must be willing to travel as the business dictates
  • Expertise in one or more of the following: parallel computing, MPI/OpenMP, GPU acceleration, workload schedulers (Slurm, Altair PBS Pro, Torque/MOAB, etc.), or large-scale data storage systems (Lustre, GPFS, Ceph)
  • Experience with Network boot technologies (PXE or gPXE/Etherboot etc)
  • Storage specific knowledge: LVM, RAID, iSCSI, Disk partitioning (GPT, MBR)
Job Responsibility
Job Responsibility
  • Lead the technical implementation design and delivery of world class scale HPC solutions, from requirements gathering to implementation
  • Provide architectural guidance on compute, storage, networking, and workload management tailored to customer use cases
  • Configure, deploy, and maintain Linux-based HPC clusters, associated storage, and network infrastructure
  • Work in close collaboration with customers on finalizing and deploying HPC software applications, hosting platforms, and management systems that enable customer research and production workloads
  • Provide technical support and troubleshooting for HPC implementation in secure locations
  • Work on both operational support and strategic HPC projects
  • actively participate in customer user group environments
  • Evaluate and implement new tools, middleware, and methodologies to improve operations and service delivery
  • Ensure compliance with enterprise IT security and technology controls
  • Act as principal consultant in customer engagements, often leading cross-functional project teams (including customer staff)
What we offer
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Fulltime
Read More
Arrow Right

Principal Software Engineer - Performance Tooling

The Artificial Intelligence (AI) Frameworks team at Microsoft develops AI softwa...
Location
Location
United States , Redmond
Salary
Salary:
139900.00 - 274800.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C++, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role. This includes passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C++, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C++, or Python OR equivalent experience
  • 4+ years’ practical experience working on high performance applications and performance debugging and optimization on CPUs/GPUs
  • Experience in DNN/LLM inference and experience in one or more DL frameworks such as PyTorch, Tensorflow, or ONNX Runtime and familiarity with CUDA, ROCm, Triton
  • Technical background and solid foundation in software engineering principles, computer architecture, GPU architecture, hardware neural net acceleration
  • Experience in end-to-end performance analysis and optimization of state of the art LLMs and HPC applications, including proficiency using GPU profiling tools
  • Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers
  • Ability to independently lead projects
Job Responsibility
Job Responsibility
  • Work across multiple layers of the AI software stack (abstractions, programming models, compilers, runtimes, libraries, and APIs) to enable large-scale model training and inference
  • Benchmark OpenAI and other LLMs for performance on Graphic Processing Units (GPUs) and Microsoft hardware
  • Debug, profile, and optimize performance for training/inference workloads on CPUs (Central Processing Units)/GPUs
  • Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint
  • Collaborate across teams of researchers and engineers to deliver scalable, production-ready AI performance improvements
  • Fulltime
Read More
Arrow Right

Principal Software Development Engineer

You are influential software engineer who is passionate about improving the perf...
Location
Location
Poland , Warsaw
Salary
Salary:
Not provided
amd.com Logo
AMD
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Understanding Machine Learning techniques and its application within graphics
  • ML model development and optimization such as quantization
  • Strong ML development pipeline experience in production environment
  • Strong python/porch development experiences from concept to production
  • Experience with software development processes and tools such as debuggers, source code control systems (GitHub) and profilers is a plus
  • Strong object-oriented programming background, C/C++ preferred
  • Effective communication and problem-solving skills
  • Good interpersonal skills
  • Master's degree or PhD in Computer Science, Computer Engineering, or equivalent
Job Responsibility
Job Responsibility
  • Research, prototype, benchmark different ML algorithms from conceptual phase to deployment on GPU
  • Research, prototype different quantization techniques for the graphics ML algorithms
  • Optimize existing ML algorithms with techniques such as quantization
  • Work across research, hardware, driver and compiler teams to analyze and troubleshoot performance issues, provide solutions to improve rendering speed and ML workload efficiency
  • Stay current with latest advancements in GPU hardware, rendering techniques, graphics APIs and GPU accelerated ML
  • Document and share knowledge on best practices for GPU programming (both graphics and compute/ML) within the team
  • Participate in code reviews and provide constructive feedback to peers
Read More
Arrow Right

Principal Software Engineering Manager - AI Frameworks

As a Principal Software Engineering Manager - AI Frameworks on the team, you wil...
Location
Location
United States , Redmond
Salary
Salary:
139900.00 - 304200.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Master’s Degree in Computer Science or related technical field AND 10+ years of software engineering experience, including 6+ years in engineering management, OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years of software engineering experience, including 6+ years in engineering management, or equivalent experience
  • Strong technical foundation in software engineering principles, computer architecture, GPU architecture, and hardware acceleration for neural networks, with the ability to guide teams working in these areas
  • Experience leading teams responsible for end-to-end performance analysis and optimization of LLMs, AI systems, or HPC workloads, including use of GPU profiling and performance analysis tools
  • Demonstrated ability to lead cross-team initiatives, align stakeholders, and translate research or platform capabilities into scalable, production-ready solutions
  • Proven people leadership skills, including hiring, coaching, performance management, and career development, with a track record of building high-performing, inclusive teams
  • Exposure to AI / ML infrastructure, including DNN or LLM training and/or inference systems, and experience with at least one modern deep learning framework (e.g., PyTorch, TensorFlow, ONNX Runtime)
  • Familiarity with GPU software stacks and acceleration technologies such as CUDA, ROCm, Triton, or equivalent, sufficient to guide technical direction and evaluate tradeoffs
Job Responsibility
Job Responsibility
  • Lead and develop a team of engineers working across multiple layers of the AI software stack to enable large-scale training and inference
  • Set technical vision and execution strategy for model performance benchmarking, optimization, and deployment across GPUs and Microsoft hardware
  • Drive performance outcomes by prioritizing and overseeing efforts to benchmark, profile, debug, and optimize training and inference workloads
  • Own performance health by establishing mechanisms to monitor regressions, measure impact, and continuously improve time-to-deploy and hardware efficiency
  • Partner cross-functionally with research, product, infrastructure, and hardware teams to deliver scalable, production-ready AI performance improvements
  • Balance short-term delivery and long-term investments, ensuring the team’s work aligns with organizational goals, platform roadmaps, and Azure capex objectives
  • Build a strong engineering culture through coaching, feedback, hiring, and career development, enabling the team to operate with increasing autonomy and impact
  • Fulltime
Read More
Arrow Right

Principal Research Engineer - Agent 365

Copilot usage is growing rapidly across Microsoft 365 and custom agent experienc...
Location
Location
United States , Redmond
Salary
Salary:
139900.00 - 274800.00 USD / Year
https://www.microsoft.com/ Logo
Microsoft Corporation
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
Job Responsibility
  • Architect and deliver AI systems across model development, data, infra, evaluation, and deployment spanning multiple product lines
  • Set technical direction for large programs
  • drive alignment across Research, Engineering, and Product
  • Integrate LLMs, multimodal models, multi-agent architectures, and RAG into Microsoft’s ecosystem
  • Establish standards for MLOps, governance, and Responsible AI, compliant with Microsoft principles and industry standards
  • Drive original research and thought leadership (whitepapers, internal notes, patents)
  • convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work
  • identify high-potential methods and adapt them to Microsoft problem spaces
  • Production Integration: Turn research prototypes into production-quality code optimized for scale, latency, and maintainability
  • Fulltime
Read More
Arrow Right

Principal Machine Learning Tech Lead

We are looking for a highly skilled and innovative Principal Machine Learning Te...
Location
Location
United States
Salary
Salary:
312000.00 USD / Year
bluerivertechnology.com Logo
Blue River Technology
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8-12 years of experience developing high-performance ML systems
  • Experience shaping the technical vision for larger projects and providing guidance while contributing to core capabilities
  • Strong leadership and communication skills, and demonstrated experience in leading and mentoring teams
  • Hands-on work experience with GPU-based computing on large data sets
  • Extensive experience with deep learning frameworks (e.g., PyTorch)
  • Bachelor’s or Master’s Degree in Computer Science or a related field, graduate degree preferred
Job Responsibility
Job Responsibility
  • Lead the design, development, and deployment of machine learning and computer vision models that power real-time perception for autonomous agricultural robots
  • Own the full ML lifecycle—from dataset design and data collection through training, evaluation, model optimization, and deployment on embedded platforms
  • Collaborate cross-functionally with robotics engineers, software developers, and hardware teams to integrate perception systems into field-ready machines
  • Lead a team of CVML engineers, fostering technical growth, strong engineering practices, and a culture of experimentation and delivery
  • Define and execute the ML roadmap, balancing research exploration with production needs to meet performance, safety, and robustness targets in challenging environments
  • Prototype and validate new approaches quickly, using both real-world and synthetic data to improve detection, segmentation, and scene understanding under variable conditions (lighting, occlusion, dust, etc)
  • Continuously evaluate model performance in the field and refine approaches to maximize reliability and generalization across crop types, terrains, and climates
What we offer
What we offer
  • bonus and benefit programs
  • inclusive workplace
  • programs including recruiting, mentorship, career development, and learning & development
  • reasonable accommodation for individuals with disabilities
  • Fulltime
Read More
Arrow Right