We are looking for a Principal Software Engineer to join our team and drive all aspects of AI feature fundamentals for one of the world's largest modern collaboration platforms: Microsoft Teams. We help feature teams ship quality AI experiences out of the gate, track key performance and reliability metrics for critical high-volume scenarios, help feature teams improve the debuggability of AI scenarios, create offline and online evals for all AI features and incorporate them into release pipelines, and drive a culture of performance through best practices and consulting.
Job Responsibilities:
Define the vision, strategy, and roadmap for how to evaluate AI features for good fundamentals at scale across Teams
Lead end-to-end science and technical design for evaluating LLM-powered agents on real-time and batch workloads: designing evaluation frameworks, metrics, and pipelines that capture planning quality, tool use, retrieval, safety, and end-user outcomes, and partnering with engineering for robust, low-latency deployment
Establish rigorous evaluation and reliability practices for LLM/agent systems: from offline benchmarks and scenario-based evals to online experiments and production monitoring, defining guardrails and policies that balance quality, cost, and latency at scale
Collaborate with PM, Engineering, and UX to translate evaluation insights into customer-visible improvements, shaping product requirements, de-risking launches, and iterating quickly based on telemetry, user feedback, and real-world failure modes
Collaborate and mentor across product, research, and engineering teams, sharing best practices on eval design, LLM-as-judge usage, and Responsible AI, and providing code reviews and guidance that raise the bar for the AI features
Provide technical leadership and mentorship within the applied science and engineering community, fostering inclusive, responsible-AI practices in agent evaluation, and influencing roadmap, platform investments, and cross-team evaluation strategy across Fabric
Requirements:
Bachelor's Degree in Computer Science or a related technical field AND 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
2+ years of experience in engineering tooling or eval development
2+ years of experience working on services at scale
1+ year of experience driving fundamentals for AI features within web apps
Nice to have:
Master's Degree in Computer Science or a related technical field AND 8+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR Bachelor's Degree in Computer Science or a related technical field AND 12+ years of technical engineering experience coding in those languages; OR equivalent experience
Prior experience in driving fundamentals for AI features within web apps
Understanding of how to build scalable server-side engineering tools
Prior experience building AI workflows is a plus
Prior experience working closely with AI feature teams to improve fundamentals like performance and reliability is a major plus
Experience solving challenging problems and cross-team/organization collaboration skills
Proficiency with React
Curiosity to dive deep, continuously learn and experiment