As a Principal Product Manager, you will lead the human-in-the-loop evaluation program that drives the quality and experience of M365 Copilot Chat for Business—ensuring M365 Copilot is the best AI assistant for work, grounded in M365 data. You'll build and scale the people, processes, and evaluation pipelines that turn expert human judgment into reliable quality signals that directly shape the product.
Job Responsibilities:
Define what great looks like for human data—bringing your own knowledge and perspective on the best approaches to collecting, interpreting, and applying human feedback across product decisions
Lead the human-in-the-loop evaluation program—designing evaluation frameworks, scorecards, and quality benchmarks that measure and continuously raise the bar on M365 Copilot's response quality, helpfulness, and user experience
Build and manage evaluation workforce operations, including vendor partnerships, annotator onboarding, qualification, training, and continuous performance management
Partner with data scientists and engineers to scope evaluation needs, define task instructions, calibrate annotators, and ensure evaluation data is reliable and repeatable
Requirements:
Bachelor's Degree AND 8+ years of experience in product/service/program management or software development OR equivalent experience
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
3+ years of human data or human-in-the-loop evaluation experience
Experience at an AI research organization or AI data services provider
4+ years of experience taking a product, feature, or experience to market (e.g., design, addressing product-market fit, and launch; internal tool/framework)
6+ years of experience improving product metrics for a product, feature, or experience in a market (e.g., growing customer base, expanding customer usage, avoiding customer churn)
Experience building and managing workforce programs, including vendor partnerships and annotation operations at scale
Proficiency in evaluation pipeline design, annotation frameworks, and quality governance
Track record of translating human feedback and evaluation signals into measurable product impact