As an Applied ML Validation Manager on the Software Validation team within the AV organization, you will lead a team focused on building and operating behavior critics and human benchmarking capabilities for ML-driven autonomy systems. Your team will turn subjective human expectations about safe, comfortable, and intuitive driving into rigorous, scalable evaluation frameworks that directly inform model development and release decisions. You will partner closely with autonomy, simulation, safety, and product teams to define how behavior is judged against human drivers and integrate behavior critic signals into validation pipelines, continuous release, and long-term performance monitoring.
Job Responsibilities:
Lead and grow an Applied ML validation team focused on behavior evaluation and human benchmarking for autonomy ML systems
Define the strategy and roadmap for evaluating ML behavior against human-like driving expectations across simulation, replay, and on-road environments
Design, implement, and operate behavior critic frameworks that assess model actions and trajectories, turning qualitative human feedback into structured labels, metrics, and scorecards
Develop and scale human benchmarking programs, including rater guidelines, calibration, and quality controls, to compare ML system performance against expert and typical human drivers
Partner closely with autonomy, simulation, safety, and product teams to integrate behavior critic and human benchmarking outputs into training, offline validation, release gating, and reporting
Requirements:
8+ years of experience and an MS/PhD in Computer Science, Machine Learning, Robotics, Software Engineering, Data Science, or a related field
2+ years of people management experience leading engineering, validation, or applied ML teams
Strong programming and data skills in Python and common analysis/ML tooling (e.g., PyTorch)
Demonstrated experience designing and operating evaluation/validation pipelines for complex ML systems
Proven ability to define, implement, and track metrics that capture system quality, reliability, safety, or user experience
Nice to have:
Experience with autonomous driving, robotics, or other safety-critical domains, especially in validation, safety, or systems engineering roles
Demonstrated background with simulation-based validation, including VLM critics, human benchmarking, and scalable evaluation for ML or autonomy systems
Hands-on experience with agentic workflows used to accelerate analyses, automate documentation, or orchestrate complex data and metric pipelines
Track record of building or scaling technical teams and tooling in fast-evolving domains, especially focused on evaluation, automation, and ML observability