Our Inference team is responsible for building and maintaining the tools and systems that enable Microsoft AI researchers to run models easily and efficiently. Our work empowers researchers to run models for RL, synthetic data generation, evals, and more. We are joint stewards of one of the largest compute fleets in the world. The team is responsible for optimizing compute efficiency across our heterogeneous data centers and for enabling cutting-edge research and production deployment. We are an applied research team embedded directly in Microsoft AI’s research org so that we can work as closely as possible with researchers. We are vertically integrated, owning everything from kernels to architecture co-design to distributed systems to profiling and testing tools.
Job Responsibilities
Work alongside researchers and engineers to implement frontier AI research ideas
Introduce new systems, tools, and techniques to improve model inference performance
Build tools to help debug performance bottlenecks, numeric instabilities, and distributed systems issues
Build tools and establish processes to enhance the team’s collective productivity
Find ways to overcome roadblocks and deliver your work to users quickly and iteratively
Enjoy working in a fast-paced, design-driven product development cycle
Embody our Culture and Values
Requirements
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
Experience with generative AI
Experience with distributed computing
Expertise in Python and the Python ecosystem (e.g., uv, pybind/nanobind, FastAPI)
Experience with large-scale production inference
Experience with GPU kernel programming
Experience benchmarking, profiling, and optimizing PyTorch generative AI models
Experience with open-source inference frameworks like vLLM and SGLang
Working experience with, and fluency in, the material in the JAX scaling book
Nice to have
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python