Our DD Labs team builds real-time autonomous delivery systems. The Planning & Decision-Making group is investing heavily in deep reinforcement learning to move beyond classical planning, learning policies that generalize across novel driving scenarios, handle long-tail edge cases, and improve continuously from large-scale fleet data. Our models jointly handle prediction and planning in a single unified architecture. Our stack is pure JAX end-to-end: the same code you train with is the code that runs on the robot. No C++ rewrites, no TensorRT export. A new policy goes from training to on-vehicle deployment in minutes.
Job Responsibilities:
Formulate complex driving tasks as RL problems with well-shaped reward functions and expressive state/action representations
Design and train model-based deep RL agents using GPU-accelerated simulation at massive scale, including improving the simulator itself
Build and maintain distributed training infrastructure in JAX across large compute clusters
Build agentic optimization systems that automatically improve code, run experiments, analyze metrics, and iterate on RL policies with minimal human intervention
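As an illustration of the "well-shaped reward functions" mentioned above, here is a minimal sketch of potential-based reward shaping (Ng et al.), one standard technique for densifying sparse rewards without changing the optimal policy. The potential function and goal are hypothetical, not part of our stack:

```python
import jax.numpy as jnp

GAMMA = 0.99  # discount factor

def potential(state):
    # Hypothetical potential: negative distance to a goal at the origin.
    return -jnp.linalg.norm(state)

def shaped_reward(state, next_state, base_reward):
    # Potential-based shaping F = gamma * phi(s') - phi(s) preserves
    # optimal policies while giving the agent denser learning signal.
    return base_reward + GAMMA * potential(next_state) - potential(state)

r = shaped_reward(jnp.array([1.0, 0.0]), jnp.array([0.5, 0.0]), 0.0)
print(float(r))  # 0.505: moving toward the goal yields positive shaped reward
```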
Requirements:
BS/MS/PhD in CS, EE, Robotics, or a related field, with a strong foundation in reinforcement learning and deep learning
Hands-on experience training RL agents at scale, ideally in robotics, autonomous driving, or other real-time decision-making domains
Proficiency in JAX or a similar functional ML framework, including comfort with JIT compilation, vectorized environments, and GPU-accelerated simulation
Deep grasp of core RL concepts: policy gradients, value functions, exploration-exploitation, model-based RL, reward shaping, and sim-to-real transfer
Data-driven mindset: comfortable building experiment pipelines, analyzing training runs, and letting metrics guide architectural decisions
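To give a concrete sense of the JIT-compilation and vectorized-environment pattern the requirements refer to, here is a toy sketch: a single-environment step function lifted to a batch with jax.vmap and compiled with jax.jit. The dynamics are invented for illustration only:

```python
import jax
import jax.numpy as jnp

def step(state, action):
    # Hypothetical dynamics: Euler-integrate the action, pay a quadratic cost.
    next_state = state + 0.1 * action
    reward = -jnp.sum(next_state ** 2)
    return next_state, reward

# Vectorize over a batch of environments, then compile the whole batch to XLA.
batched_step = jax.jit(jax.vmap(step))

states = jnp.zeros((1024, 3))   # 1024 parallel environments, 3-dim state each
actions = jnp.ones((1024, 3))
next_states, rewards = batched_step(states, actions)
print(next_states.shape, rewards.shape)  # (1024, 3) (1024,)
```

The same composition scales from a laptop to a GPU cluster: vmap handles the batch dimension, jit fuses the whole step into one compiled kernel.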
Nice to have:
Publications at top venues (NeurIPS, ICML, ICLR, CoRL, RSS, ICRA) on RL or learned planning
Experience building or working with GPU-accelerated simulators for RL training
Track record of shipping a learned component in a production robotics or autonomous vehicle stack