Joining the CoreAI organization at Microsoft means becoming part of the team that builds the end-to-end AI stack powering Azure’s innovation. As a Principal Applied Scientist on the GenAI Infra and Solutions team within CoreAI, you will help develop the AI infrastructure that accelerates the creation of agentic AI systems across Microsoft. This role is dedicated to advancing scientific methods and scalable infrastructure for training agentic models to achieve frontier-level performance. You will contribute to LLMs, SLMs, and agentic models using both proprietary and open-source frameworks, all aimed at delivering reliable, enterprise-grade agentic workflows.
Job Responsibilities:
Write efficient, production-quality code and debug complex training jobs
Build and maintain training pipelines and architectures across both proprietary and open-source frameworks
Collaborate effectively within interdisciplinary teams and communicate complex research concepts in clear, actionable ways
Document findings and insights to enable effective cross-team collaboration and knowledge sharing
Drive innovations that power flagship Microsoft products and services
Requirements:
Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 6+ years related experience (e.g., statistics, predictive analytics, research)
OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience
OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience
OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements, including the Microsoft Cloud Background Check
Preferred Qualifications:
Doctorate in Computer Science, Electrical or Computer Engineering, or related field AND 5+ years related experience OR Master's Degree in Computer Science, Electrical or Computer Engineering, or related field AND 7+ years related experience
5+ years of coding experience in Python and experience with ML frameworks such as PyTorch and Triton
3+ years of experience in large-scale model training for LLMs, SLMs, and agentic models
3+ years of proven experience designing and scaling training infrastructure and pipelines in production environments
Experience with agent training frameworks
Leadership and influence: the ability to lead projects and influence others across teams and disciplines
Hands-on experience with large-scale distributed training and/or serving with demonstrated ability to dive deep into complex systems, troubleshoot unconventional issues, and craft innovative solutions under real-world constraints
Extensive experience with large-scale training, model inference, reinforcement learning, and reasoning models
Demonstrated ability to work in cross-functional teams and collaborate effectively with researchers, product managers, and other engineers to deliver complex ML solutions
Startup-style mindset: agile, solution-oriented, and self-driven