You will be part of a core reliability team responsible for ensuring safe, stable, and scalable Autonomous Vehicle (AV) software releases by turning failures into actionable insight at scale. The mission of this role is to improve learning velocity from failures, reduce reliability escapes, and increase confidence in production readiness through intelligent triage, deep debugging, and data-driven analysis.
Job Responsibilities:
Own the reliability triage framework for the AV software stack, defining how failures from simulation, CI, and on-road validation are detected, categorized, and escalated into actionable insights
Perform deep debugging and root-cause analysis across autonomy software, ML pipelines, and system integrations, connecting failure symptoms to clear solution paths and corrective actions
Design and evolve automated triage mechanisms and reliability taxonomies, improving regression detection, flakiness identification, and signal quality as the system and models evolve
Build and govern reliability data pipelines, providing continuous visibility into stability trends, recurrence patterns, and systemic risks that impact release readiness
Translate reliability findings into decision-grade communication, influencing prioritization, technical debt reduction, and release confidence in partnership with engineering, safety, and systems stakeholders
Requirements:
Strong proficiency in Python and SQL for automation, analysis, and data pipelines
Proven experience with CI/CD systems (GitHub Actions, Jenkins, GitLab, or equivalent)
Hands-on experience implementing ETL/ELT pipelines for reliability, quality, or system health monitoring
Solid understanding of reliability engineering concepts, including regression tracking, flakiness detection, and failure classification
Strong analytical and cross-stack debugging skills in large-scale software systems
Experience integrating simulation, HIL, or system-level test signals into automated analysis workflows
Track record of effective cross-functional collaboration across engineering, QA, and platform teams
Ability to operate autonomously in high-ambiguity, safety-critical environments
Excellent communication skills for presenting data-driven reliability insights to engineering and technical leadership
Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, Robotics, or a related field, or equivalent experience
Nice to have:
Experience with reliability governance in ML-based or AV systems
Familiarity with reliability methodologies (FMEA, reliability growth analysis, MTBF trends)
Knowledge of AV / ADAS software architectures and simulation-to-road validation loops
Experience building reliability or analytics pipelines in cloud environments (AWS, GCP, Azure)
Familiarity with observability and visualization tools (Grafana, Superset, Power BI, etc.)
Experience using Jira, GitHub Projects, or similar tools for reliability tracking and triage workflows