This is a rare opportunity to lead foundational work at the intersection of large-scale pretraining and embodied AI. You’ll shape how Wayve curates and experiments with data to pretrain the next generation of multi-modal foundation models for autonomy. Working at the frontier of a still-undefined field, you’ll collaborate with research and engineering teams to define what “good data” looks like for embodied systems and to build the infrastructure to scale it. Your work will directly influence the performance, generalization, and capabilities of Wayve’s AI stack.
Job Responsibilities:
Lead data curation, enrichment, and filtering efforts for large-scale pretraining of embodied models
Build and manage distributed data processing and ingestion pipelines across modalities
Partner with research teams to run data-centric experiments and influence model training strategy
Identify, integrate, and leverage third-party datasets to enhance pretraining and evaluation
Manage and mentor a team of engineers and data scientists to deliver scientific and technical impact
Requirements:
Leadership in data-centric AI: Experience leading research or engineering teams focused on dataset curation, filtering, or enrichment at scale, particularly for large-scale model pretraining.
Contributions to data benchmarks or tools: Involvement in projects like DataComp, LAION, DINO, MOLMO, or equivalent initiatives that define or evaluate pretraining dataset quality.
Deep understanding of distributed data processing: Strong working knowledge of frameworks such as Ray, Spark, Dask, or equivalent, and designing scalable, fault-tolerant data pipelines.
Hands-on deep learning expertise: Strong proficiency in PyTorch and a solid grasp of how data quality, distribution, and structure impact training dynamics and model generalisation.
Experimental mindset: Demonstrated ability to run and interpret data-centric experiments (e.g., small-scale trials, ablations) to inform large-scale model training.
Collaboration with research: Experience working closely with ML researchers and contributing to experimental design, pretraining strategies, or evaluation design.
Minimum 5 years of relevant industry experience: Including at least several years in data-heavy, model-driven environments involving deep learning at scale.
Nice to have:
Track record of research impact: Publications in top-tier conferences such as NeurIPS, ICML, CVPR, ICCV, CoRL, or equivalent, especially in data-centric learning, representation learning, or self-supervised learning.
People management experience: Track record managing ~5 direct reports in a research or research-leaning engineering environment; skilled in team development, prioritization, and technical alignment.
Experience with multi-modal or embodied systems: Familiarity with datasets involving video, language, lidar, and radar, and more generally with sensor fusion or embodied perception and control.
Tooling and infrastructure know-how: Familiarity with modern data versioning, annotation, and orchestration tools (e.g., Weights & Biases, ClearML, Labelbox, Airflow, Metaflow).
Autonomous systems exposure: While prior AV or robotics experience is not required, a demonstrated interest in embodied intelligence or real-world agent learning is a plus.
Systems thinking and data-product intuition: Ability to reason about upstream data decisions and their downstream effects on models, infrastructure, and product goals.
What we offer:
Attractive compensation with salary and equity
Immersion in a team of world-class researchers, engineers and entrepreneurs
A unique position to shape the future of autonomy and tackle the biggest challenge of our time
Bespoke learning and development opportunities
Relocation support with visa sponsorship
Flexible working hours - we trust you to do your job well, at times that suit you and your team
Benefits such as an onsite chef, workplace nursery scheme, private health insurance, therapy, daily yoga, onsite bar, large social budgets, unlimited L&D requests, enhanced parental leave, and more!