About the Role
The AI Evaluations Team Lead II is responsible for leading the successful delivery of AI evaluation programs, ensuring high-quality execution, operational excellence, and alignment across multiple workstreams. This role oversees a team of AI Evaluations Specialists and SMEs, managing capacity, prioritization, and execution to ensure evaluation outputs are delivered accurately, consistently, and at scale. As the primary operational lead for the program, the Team Lead partners closely with program teams to align on priorities, manage delivery commitments, escalate risks, and ensure evaluation insights translate into measurable improvements across AI-powered support experiences. Their impact extends beyond team performance to driving program effectiveness, scalability, and business insights.
Job Responsibilities
Lead and develop a team of AI Evaluations Specialists and SMEs, fostering a high-performance culture focused on quality, accountability, and continuous improvement
Own delivery outcomes across evaluation programs, ensuring work is prioritized, executed, and completed against agreed timelines, quality standards, and stakeholder expectations
Manage workforce planning, capacity allocation, and workload prioritization across multiple evaluation workstreams and business priorities
Partner with Program teams to align on upcoming initiatives, sprint planning, evaluation requirements, and delivery commitments
Act as the primary escalation point for operational risks, delivery blockers, resource constraints, and cross-functional dependencies
Ensure evaluation findings, insights, and recommendations are effectively communicated to stakeholders and translated into actionable improvement opportunities
Drive operational governance across evaluation programs, including performance reviews, delivery tracking, quality oversight, and risk management
Monitor program health and performance metrics, identifying trends, gaps, and opportunities to improve efficiency, quality, and business impact
Coordinate bug identification, issue escalation, and follow-through with Product, Engineering, and Triage teams to support timely resolution and validation
Support the continuous improvement of evaluation methodologies, workflows, quality frameworks, and operational processes
Lead hiring, onboarding, coaching, and performance management activities to build team capability and support organizational growth
Represent the evaluations function in cross-functional forums, ensuring stakeholder alignment on priorities, risks, dependencies, and outcomes
Requirements
Demonstrated experience leading teams responsible for delivering operational, quality, analytics, support, risk, trust & safety, or similar programs in complex environments
Proven ability to manage capacity planning, workload prioritization, and resource allocation across multiple concurrent workstreams
Strong stakeholder management skills, with experience partnering effectively with Product, Engineering, Operations, Policy, Quality, or equivalent cross-functional teams
Experience driving operational delivery against defined goals, timelines, service levels, or business outcomes
Strong program and project management capabilities, including risk identification, dependency management, escalation handling, and execution tracking
Demonstrated ability to translate business priorities into clear operational plans and execution strategies
Strong analytical and problem-solving skills, with the ability to assess operational challenges, identify solutions, and make data-informed decisions
Experience managing performance, coaching team members, and developing talent within high-performing teams
Excellent communication skills, including the ability to influence stakeholders, align priorities, and communicate complex topics clearly across technical and non-technical audiences
Experience operating in fast-paced, ambiguous environments where priorities, products, and processes evolve rapidly
Nice to have
Experience working with AI-powered products, AI quality programs, customer support operations, Trust & Safety, or digital customer experience programs
Familiarity with AI evaluation methodologies, quality assurance frameworks, policy governance, or root cause analysis practices
Experience working with Jira, dashboards, workforce planning tools, and operational reporting systems
Understanding of common GenAI concepts and failure modes, including hallucinations, retrieval failures, grounding issues, and instruction-following errors
Experience supporting global or multi-regional programs involving multiple stakeholders and operational dependencies