Science is the team that is advancing our end-to-end autonomous driving research. The team’s mission is to accelerate our journey to AV2.0 and ensure the future success of Wayve by incubating and investing in new ideas that have the potential to become game-changing technological advances for the company. We’re building AI-native driving intelligence, from raw sensor input to on-road decisions, with a foundation model approach at the core.
As part of this, we’re looking for an applied scientist who can help shape how we learn from data: not just more of it, but smarter use of it. This role sits at the intersection of data curation, model evaluation, interpretability, and scientific data optimization. You’ll work closely with our world-class behavior modeling and architecture teams to make the data layer a first-class citizen by turning fleet-scale data into structured, targeted learning signals. This is a unique chance to join a science-first team working at the cutting edge of end-to-end autonomy, where model behavior, interpretability, and generalization all hinge on the right data.
Job Responsibilities:
Design principled data selection and sampling strategies to improve model learning efficiency and generalization
Identify and prioritize high-leverage data slices using signals such as loss, uncertainty, rarity, or alignment with specific driving behaviors
Design rigorous offline and on-road evaluation protocols to measure the impact of experimentation on performance
Investigate the interplay between data composition and model behavior, aiming to uncover how different training distributions shape learned representations and decision-making
Understand and shape how the chosen data affects model behavior in different scenarios
Requirements:
Proven experience in curating, selecting, and maintaining datasets that directly improve performance of machine learning models
Experience in ML engineering or applied research roles
Deep understanding of the full lifecycle of ML research and deployment
Deep knowledge of transformers for vision or language modelling
Strong Python and PyTorch engineering fundamentals, and experience building research-grade production tools
Enjoy thinking about data as a lever for model behavior, not just as input
Understand the value of “fast signals” and quick feedback loops in ML iteration
Nice to have:
Hands-on experience with large-scale data querying and wrangling, from writing SQL queries to designing custom selection criteria in Python/ML pipelines
Publications in top ML conferences (e.g., NeurIPS, ICLR, ICML) or contributions to open-source ML tooling
Experience with embeddings, representation learning, or token-level analysis
Experience in AVs, robotics, simulation, or other embodied AI domains