Search is being transformed by AI – join us to build AI-powered search experiences used by hundreds of millions of people worldwide. The Core Search & AI team in Microsoft AI (MAI) develops and operates the foundational systems behind Search, Grounding, and Agentic Search. You will work on large-scale systems that combine web-scale retrieval, advanced ranking models, and real-time inference to deliver relevant, trustworthy, and high-quality AI experiences.

As a Senior Software Engineer in the Core Search & AI team, you will build and operate next-generation AI infrastructure for Search, Grounding, and Agentic Search. You will develop scalable systems for distributed data pipelines, LLM and SLM training (including SFT and RL), high-throughput inference, evaluation frameworks, and observability. You will collaborate with engineering, research, and product teams to deliver reliable, measurable, and high-performing AI solutions, and use data to guide technical decisions, investigate issues, and improve live-site quality.

This opportunity will allow you to deepen your expertise in distributed AI infrastructure, gain experience with production-scale AI workloads, and expand your ownership of end-to-end service quality and operational excellence.
Job Responsibilities:
Collaborates with appropriate stakeholders to define user requirements for a scenario and incorporates stakeholder insights into system design
Drives identification of dependencies and the development of design documents for a product or service with little oversight
Builds, reviews, and maintains high-quality, secure, and performant code, applying best practices in reliability, testability, and maintainability, and using telemetry and debugging tools to validate assumptions and prevent issues before production
Leverages subject-matter expertise of product features and partners with appropriate stakeholders to drive a workgroup's project plans, release plans, and work items
Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook, working on call to monitor the system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status, and initiating actions to restore the system/product/service for simple and complex problems when appropriate
Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience
Experience building high-throughput, low-latency distributed systems and applications at scale
Experience designing and operating data processing, training workflows, and inference systems for LLMs/SLMs using Azure Machine Learning
Experience optimizing GPU-based serving workloads for performance, efficiency, and cost
Experience with machine learning fundamentals
Nice to have:
Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python