Generative AI is transforming how people create, collaborate, and communicate, redefining productivity across Microsoft 365 and for our customers globally. At Microsoft, we run the world's largest platform for collaboration and productivity, serving hundreds of millions of consumer and enterprise users. Tackling AI efficiency challenges is crucial for delivering these experiences at scale. Within our Microsoft-wide Systems Innovation initiative, we work to advance efficiency across AI systems, pursuing novel designs and optimizations across the AI stack: models, AI frameworks, cloud infrastructure, and hardware.

We are an Applied Research team driving mid- and long-term product innovations. We collaborate closely with multiple research teams and product groups across the globe, who bring deep technical knowledge in cloud systems, machine learning, and software engineering. We communicate our research both internally and externally through academic publications, open-source releases, blog posts, patents, and industry conferences. We also partner with academic and industry collaborators to advance the state of the art and target material product impact for hundreds of millions of customers.

We are looking for a Senior Researcher, GPU Performance and Hardware/Software Codesign, to explore hardware- and kernel-level optimizations that deliver significant efficiency gains for Large Language Models and Generative AI experiences.
Job Responsibilities:
Design, implement, and optimize GPU kernels for complex computational workloads such as AI inference
Research and develop novel optimization techniques for GPU kernel generation
Profile and analyze kernel performance using advanced diagnostic tools
Develop automated solutions for kernel optimization and tuning
Collaborate with other researchers to improve model performance
Document optimization strategies and maintain performance benchmarks
Contribute to the development of internal GPU computing frameworks
Requirements:
Doctorate in a relevant field OR equivalent experience
2+ years of experience in GPU architecture, memory hierarchies, parallel computing, and algorithm optimization
2+ years of experience in GPU programming, including performance profiling and optimization tools
Solid C++ programming skills
Ability to meet Microsoft, customer and/or government security screening requirements
Nice to have:
5+ years of experience in GPU programming and optimization; expert knowledge of CUDA, ROCm, Triton, PTX, CUTLASS, or similar GPU programming frameworks
Experience with machine learning frameworks (PyTorch, TensorFlow)
Familiarity with compiler optimization techniques and background in auto-tuning and automated code generation
Publication record in relevant conferences or journals (MLSys, NeurIPS, ICML, ICLR, AISTATS, ACL, EMNLP, NAACL, ISCA, MICRO, ASPLOS, HPCA, SOSP, OSDI, NSDI, etc.)