The Artificial Intelligence Frameworks team at Microsoft builds the software foundation that enables AI to run everywhere. We operate at the intersection of AI algorithmic innovation, purpose-built hardware, large-scale systems, and production software, working closely with internal hardware teams and external partners. Our team is composed of highly capable, deeply motivated engineers who value technical excellence, collaboration, and an inclusive culture. We own inference performance for OpenAI and other state-of-the-art large language models, working directly with OpenAI on models deployed through Azure OpenAI Service. These systems power some of the largest AI workloads on the planet, serving trillions of inferences per day across Microsoft products.

As a Principal Software Engineer - System Optimization, you will provide technical leadership across the AI inference software stack, with a strong focus on high-performance, large-scale serving systems. You will lead the benchmarking and optimization of cutting-edge LLMs across GPUs and Microsoft's custom AI hardware, architect improvements to distributed serving pipelines, and drive deep performance investigations across complex, multi-layered systems. You will own critical performance and efficiency metrics, design and implement durable optimizations, and influence technical direction across teams. In close partnership with research, hardware, and production engineering groups, you will help deliver next-generation AI capabilities into Microsoft's most widely used products, directly shaping Azure's efficiency and the future of Microsoft's AI infrastructure.
Job Responsibility:
Own and drive inference performance for OpenAI LLMs across NVIDIA, AMD, and Microsoft silicon, benchmarking, optimizing, and monitoring large-scale production workloads
Lead deep performance investigations across software, frameworks, and hardware; identify bottlenecks, design durable optimizations, and preserve system integrity
Build and evolve AI tooling that accelerates insight, simplifies pipelines, enables fast model and hardware bringup, and reduces operational complexity
Improve efficiency and reduce fleet footprint, influencing Azure AI CapEx goals and next-generation infrastructure through software-hardware codesign
Provide technical leadership and influence, partnering with research, hardware, and production teams, exercising exceptional judgment, autonomy, and execution while embodying Microsoft's Culture and Values
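To give a flavor of the benchmarking work described above, here is a minimal sketch of how inference latency and throughput percentiles might be measured. The `run_inference` function is a hypothetical stand-in for a real model call (e.g., a request to a deployed LLM endpoint); the metric names and warm-up scheme are illustrative assumptions, not a description of Microsoft's internal tooling.

```python
import time
import statistics

def run_inference(prompt: str) -> str:
    # Hypothetical placeholder for a real LLM call; a production
    # benchmark would send the prompt to an actual serving endpoint.
    return prompt[::-1]

def benchmark(n_warmup: int = 3, n_runs: int = 20) -> dict:
    prompt = "hello world"
    # Warm-up runs are excluded from timing so caches and lazy
    # initialization do not skew the measured latencies.
    for _ in range(n_warmup):
        run_inference(prompt)
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_inference(prompt)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "p99_ms": latencies[int(0.99 * (len(latencies) - 1))] * 1e3,
        "qps": len(latencies) / sum(latencies),
    }
```

In real LLM serving benchmarks, per-request latency is usually split further into time-to-first-token and per-token decode latency, since the two stress different parts of the stack.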
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, or Python, OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Nice to have:
Master's Degree in Computer Science or related technical field AND 8+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, or Python, OR Bachelor's Degree in Computer Science or related technical field AND 12+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, or Python, OR equivalent experience
PhD in Computer Science or related technical field AND 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, or Python
2+ years of experience with Large Language Models (LLMs) and large-scale execution of AI workloads
4+ years of experience in technical design, problem solving, and debugging