OneDrive and SharePoint (ODSP) power the world’s most impactful intranets, collaboration experiences, business workflows, and content ecosystems. As AI becomes deeply embedded across these surfaces (from search, Q&A, and summarization to powerful synchronous and autonomous agents), our ability to measure quality, reliability, and safety at scale becomes a strategic advantage. Evaluation, both offline and online, is now the way we build and ship AI.
As a Senior Software Engineer for the Eval Tooling team, you will help shape and deliver the systems for validating, measuring, and improving AI quality across ODSP Experiences. Your mission starts with elevating developer productivity and enabling fast, confident iteration across a broad and rapidly expanding set of AI workloads: RAG, agents, content generation, semantic search, content understanding, and ODSP’s emerging agents that orchestrate multi‑step actions across files, lists, and sites.
You will also partner with Applied Science and Customer Success teams to scale customer data sets, and work closely with evaluation platform and tooling efforts across M365 to both leverage shared capabilities and contribute back to the broader ecosystem; we are One Microsoft. This is a hands‑on technical and strategic role where you will define how ODSP Experiences builds trust in AI and ships AI safely, quickly, and confidently.
Job Responsibilities:
Build and evolve evaluation frameworks and tools for AI scenarios across ODSP Experiences
Design and implement tooling and systems for offline and online evaluation, including scenario‑based frameworks, dataset pipelines, LLM auto‑raters, metrics, and dashboards
Collaborate with ODSP Core Eval Platform and M365‑wide tooling teams to leverage shared infrastructure
Enable model agility and safe shipping through automated quality gates, regression detection, telemetry instrumentation, and reliable online metrics
Collaborate deeply with AI feature teams across ODSP Experiences to embed evaluation into development workflows
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
4+ years of experience in full stack software engineering (React, TypeScript, JavaScript, C#, REST, Azure)
2+ years of experience with engineering tooling or evaluations (a plus)
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Nice to have:
Solid architectural skills in building scalable, distributed systems and cloud services
Deep understanding of AI/ML concepts and practical experience applying AI to real-world product scenarios (highly preferred)
Track record of rapid iteration, experimentation, and continuous learning
Excellent communication, collaboration, and cross-team partnership skills