The AI Observability Program Leader will own the end-to-end strategy, design, and implementation of the frameworks used to monitor, understand, and improve Uber’s GenAI-powered agentic systems. This role sits within the Global Digital Experience team, the operational arm of Uber’s customer support tech organization, and is a critical driver of accuracy, safety, and reliability across Uber’s next-generation AI solutions. This leader will bridge the gap between raw AI logs and actionable product insights.
Job Responsibilities:
Architect Observability Frameworks: Own the strategy for understanding AI agentic reasoning, enabling deep analysis of step-by-step agent decision-making
Drive Autoeval Strategy: Design and roll out automated evaluation systems (LLM-as-a-judge) to provide a scalable, high-confidence "pulse" on AI performance across conversational and voice interfaces
Define Micrometrics: Develop granular signals within agentic activity—identifying latent failures, reasoning loops, or tool-calling inefficiencies—to drive product improvements
Lead Pre-Launch Simulation: Partner with Product & Engineering to build and maintain simulation environments that test AI agents against edge cases before deployment, and democratize these tools across Operations teams
Cross-Functional Technical Partnership: Act as the primary liaison between Product, Engineering, and Data Science to ensure observability tooling is integrated into the development lifecycle and directly informs release "Go/No-Go" decisions
Insight Synthesis: Package complex technical observability data into clear, actionable narratives for leadership, highlighting specific failure patterns and opportunities for CX improvement
Operational Excellence: Establish the standards and tooling for how AI performance is reported globally, ensuring consistency across different regions and support modalities.
Requirements:
5+ years of experience in Technical Program Management, Product Operations, AI Quality, or Observability
Bachelor’s degree in Engineering, Computer Science, Data Science, or a related technical field.
Nice to have:
AI Literacy: Deep understanding of GenAI systems, including LLM orchestration, agentic workflows, and the nuances of reasoning chains (e.g., Chain of Thought)
Systems Thinking: Proven experience designing technical frameworks or evaluation pipelines (e.g., autoevals, RAG evaluation, or model benchmarking)
Analytical Rigor: Ability to define and track complex technical metrics (micrometrics) and correlate them with high-level business KPIs
Influence without Authority: Demonstrated ability to drive complex initiatives in an IC capacity by building strong partnerships with Engineering and Product teams
Advanced AI Expertise: Experience with "LLM-as-a-judge" frameworks, prompt engineering for evaluations, and fine-tuning feedback loops
Simulation & Testing: Background in building simulators, "digital twins," or robust A/B testing frameworks for conversational AI or autonomous agents
Tooling Proficiency: Familiarity with AI observability tools
Problem Solving: Exceptional ability to turn "noisy" AI logs into structured failure pattern analysis
Communication: Strong ability to translate highly technical agent behaviors into business-relevant insights for non-technical stakeholders
Domain Knowledge: Experience in Customer Support technology, Voice UX, or high-volume automated workflows.
What we offer:
Eligibility to participate in Uber's bonus program
Potential eligibility for an equity award and other forms of compensation
All full-time employees are eligible to participate in a 401(k) plan