Join Microsoft’s AI Core team, which builds high-performance runtime systems that serve OpenAI chat and multimodal AI models at scale. This role focuses on systems-level optimization for large-scale LLM inferencing and requires deep C++ expertise.
Job Responsibilities:
Design and implement high-performance microservices and runtime components in C++
Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale
Debug and resolve complex production issues related to performance, scaling, and service reliability
Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure
Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads
Drive systems-level innovations in real-time and batch inferencing efficiency
Participate in code reviews and provide technical mentorship to senior and peer engineers
Requirements:
6+ years of experience in systems programming with strong expertise in C++
Proven experience building, deploying, and operating scalable cloud services
Strong debugging skills and experience using performance profiling and diagnostic tools
Hands-on experience with distributed systems, Kubernetes, and containerized workloads
Experience with large-scale LLM inferencing infrastructure, including CUDA
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Nice to have:
Experience optimizing AI model inference across distributed GPU/CPU stacks
Exposure to Azure OpenAI or similar large-scale AI serving platforms
Understanding of site reliability engineering (SRE) principles and operational excellence