Engineering Manager - Datasets Enrichment

Wayve

Location:
United Kingdom, London

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are hiring an Engineering Manager (M4) to lead the team responsible for both the semantic enrichment pipelines and the final silver and gold layers of the Wayve Corpus. This team transforms multimodal driving data and perception model outputs into reliable, high-quality data products used across autonomy, evaluation, simulation, and research. The role combines high-scale, ML-in-the-loop enrichment pipelines (such as semantic segmentation, cuboid annotation, embeddings, and BC and ODD signals) with production-grade data engineering ownership, including schema governance, table interfaces, quality gates, lineage, and SLO-based operations.

You will lead a team of up to 10 engineers across ML engineering, perception, and data engineering. You will own a multi-quarter roadmap that scales enrichment throughput, improves data quality, and hardens the corpus tables used across Wayve. You will partner with application, model training, and evaluation teams to ensure alignment on requirements and interfaces. This role requires a leader comfortable at the intersection of ML systems and data engineering who can provide clear direction, reliable delivery, and strong people leadership during a period of significant technical and organizational scaling.

Job Responsibility:

  • Lead, coach, and grow a team of up to 10 engineers across ML engineering, perception, and data engineering
  • Define team structure, roles, leveling, hiring needs, and long-term growth plans
  • Own and scale semantic enrichment pipelines including semantic segmentation, cuboids, embeddings, scenario, and ODD classification
  • Integrate ML-assisted labeling, validation, and automated quality checks into enrichment workflows
  • Own the silver and gold layers of the Wayve Corpus, including schema evolution, versioning, documentation, lineage, observability, and SLO-backed operations
  • Establish data quality gates and quality metrics for enriched and corpus-level data
  • Deliver a multi-quarter roadmap spanning enrichment and corpus systems with predictable execution
  • Lead architecture decisions to improve efficiency, maintainability, and reliability
  • Partner with Data Platform on distributed compute systems including Spark, Databricks, Ray, and Flyte
  • Align with autonomy, evaluation, and research teams on corpus requirements, interfaces, and lifecycle

Requirements:

  • 2+ years managing engineering teams in ML systems, perception, or large-scale data infrastructure
  • Experience delivering ML in production or perception pipelines, or strong experience in production data engineering systems, ideally with exposure to both
  • Proven ownership of production data tables on platforms such as Delta Lake, Spark, Hive, or BigQuery, including schema evolution and support for multi-team consumers
  • Experience with distributed compute systems such as Spark, Databricks, Ray, or Flyte
  • Experience building observable, high-throughput pipelines
  • Ability to lead multi-quarter delivery, manage dependencies, and align with multiple stakeholders
  • Strong communication and cross-functional collaboration skills

Nice to have:

  • Experience with multimodal perception data such as images, video, and LiDAR
  • Experience with annotation workflows or ML-assisted labeling systems
  • Experience with embeddings, feature stores, or ML data layers
  • Familiarity with data quality frameworks and operational analytics
  • Experience in autonomous vehicles, robotics, or large-scale computer vision systems

Additional Information:

Job Posted:
January 01, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Engineering Manager - Datasets Enrichment

Senior Technical Program Manager

The Senior Technical Program Manager will drive large scale programs across miss...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree AND 4+ years experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience
  • 2+ years of experience managing cross-functional and/or cross-team projects
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Own delivery of mission‑critical 24x7 online services that power core location and geospatial capabilities
  • Lead offline data processing initiatives that enrich geospatial datasets and strengthen ML model quality
  • Align with engineering and service partners on inputs, outputs, SLAs, and integration contracts
  • Manage cross‑team dependencies, sequencing, risk mitigation, and operational readiness
  • Drive project and release management across tightly coupled online and offline components
  • Bring clarity, resolve conflicts, and ensure alignment across engineering, product, SRE, and data teams
Employment Type:
Fulltime

Tech Lead - Pretraining Team, Wayve Foundation Model

This is a rare opportunity to lead foundational work at the intersection of larg...
Location:
United States, Sunnyvale
Salary:
Not provided
Wayve
Expiration Date
Until further notice
Requirements:
  • Leadership in data-centric AI: Experience leading research or engineering teams focused on dataset curation, filtering, or enrichment at scale, particularly for large-scale model pretraining.
  • Contributions to data benchmarks or tools: Involvement in projects like DataComp, LAION, DINO, MOLMO, or equivalent initiatives that define or evaluate pretraining dataset quality.
  • Deep understanding of distributed data processing: Strong working knowledge of frameworks such as Ray, Spark, Dask, or equivalent, and designing scalable, fault-tolerant data pipelines.
  • Hands-on deep learning expertise: Strong proficiency in PyTorch and a solid grasp of how data quality, distribution, and structure impact training dynamics and model generalisation.
  • Experimental mindset: Demonstrated ability to run and interpret data-centric experiments (e.g., small-scale trials, ablations) to inform large-scale model training.
  • Collaboration with research: Experience working closely with ML researchers and contributing to experimental design, pretraining strategies, or evaluation design.
  • Minimum 5 years of relevant industry experience: Including at least several years in data-heavy, model-driven environments involving deep learning at scale.
Job Responsibility:
  • Lead data curation, enrichment, and filtering efforts for large-scale pretraining of embodied models
  • Build and manage distributed data processing and ingestion pipelines across modalities
  • Partner with research teams to run data-centric experiments and influence model training strategy
  • Identify, integrate, and leverage third-party datasets to enhance pretraining and evaluation
  • Manage and mentor a team of engineers and data scientists to deliver scientific and technical impact
What we offer:
  • Attractive compensation with salary and equity
  • Immersion in a team of world-class researchers, engineers and entrepreneurs
  • A unique position to shape the future of autonomy and tackle the biggest challenge of our time
  • Bespoke learning and development opportunities
  • Relocation support with visa sponsorship
  • Flexible working hours - we trust you to do your job well, at times that suit you and your team
  • Benefits such as an onsite chef, workplace nursery scheme, private health insurance, therapy, daily yoga, onsite bar, large social budgets, unlimited L&D requests, enhanced parental leave, and more!
Employment Type:
Fulltime

Senior Platform Engineer, ML Data Systems

We’re looking for an ML Data Engineer to evolve our eval dataset tools to meet t...
Location:
United States, Mountain View
Salary:
137871.00 - 172339.00 USD / Year
Khan Academy
Expiration Date
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
  • 5 years of Software Engineering experience with 3+ of those years working with large ML datasets, especially those in open-source repositories such as Hugging Face
  • Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
  • Experience with data versioning tools (e.g., DVC, LakeFS) and cloud storage systems
  • Familiarity with machine learning workflows — from training data preparation to evaluation
  • Familiarity with the architecture and operation of large language models, and a nuanced understanding of their capabilities and limitations
  • Attention to detail and an obsession with data quality and reproducibility
  • Motivated by the Khan Academy mission “to provide a free world-class education for anyone, anywhere.”
  • Proven cross-cultural competency skills demonstrating self-awareness, awareness of others, and the ability to adopt inclusive perspectives, attitudes, and behaviors to drive inclusion and belonging throughout the organization.
Job Responsibility:
  • Evolve and maintain pipelines for transforming raw trace data into ML-ready datasets
  • Clean, normalize, and enrich data while preserving semantic meaning and consistency
  • Prepare and format datasets for human labeling, and integrate results into ML datasets
  • Develop and maintain scalable ETL pipelines using Airflow, DBT, Go, and Python running on GCP
  • Implement automated tests and validation to detect data drift or labeling inconsistencies
  • Collaborate with AI engineers, platform developers, and product teams to define data strategies in support of continuously improving the quality of Khan’s AI-based tutoring
  • Contribute to shared tools and documentation for dataset management and AI evaluation
  • Inform our data governance strategies for proper data retention, PII controls/scrubbing, and isolation of particularly sensitive data such as offensive test imagery.
What we offer:
  • Competitive salaries
  • Ample paid time off as needed
  • 8 pre-scheduled Wellness Days in 2026 occurring on a Monday or a Friday for a 3-day weekend boost
  • Remote-first culture - that caters to your time zone, with open flexibility as needed, at times
  • Generous parental leave
  • An exceptional team that trusts you and gives you the freedom to do your best
  • The chance to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
  • Opportunities to connect through affinity, ally, and social groups
  • 401(k) + 4% matching & comprehensive insurance, including medical, dental, vision, and life.
Employment Type:
Fulltime

Senior Technical Product Manager, Identity

tvScientific is looking for a Senior Product Manager to lead our identity graph ...
Location:
United States
Salary:
165000.00 - 180000.00 USD / Year
tvScientific
Expiration Date
Until further notice
Requirements:
  • 6+ years in product management, solutions engineering, or technical partnerships focused on data-driven products, audience targeting, or martech
  • Deep experience working with Data Science and Engineering teams to build and launch data infrastructure
  • Expertise in identity resolution, segmentation, onboarding, and activation workflows
  • Proven ability to source, evaluate, and operationalize third-party data partnerships
  • Strong analytics mindset and comfort working with large datasets
  • Technical fluency across APIs, data pipelines, audience graphs, and privacy frameworks
  • Excellent communication skills that translate technical ideas into real business value
  • A solid foundation in AdTech
Job Responsibility:
  • Own the identity product strategy and lead the vision for tvScientific’s identity graph enabling persistent, multi-device recognition across CTV and digital
  • Partner with Data Engineering and Data Science to architect and optimize graph-based data models representing user identity, household relationships, and device linkages
  • Design APIs and services for real-time identity resolution, enrichment, and activation in programmatic ad workflows
  • Embed privacy-centric solutions like UID 2.0, RampID, and emerging standards into the graph infrastructure to ensure compliance and scalability
  • Source, evaluate, and onboard third-party identity and behavioral data providers to improve graph completeness and targeting precision
  • Lead technical integration and operationalization of identity and graph enrichment partners, managing ingestion, data mapping, and deployment
  • Collaborate with Legal, Security, and Data teams to ensure compliance with CCPA, GDPR, and global privacy regulations
  • Maintain a strategic view of the identity and data ecosystem to recommend build versus partner strategies that maximize value
  • Write detailed product requirements, data specifications, and user stories to guide Engineering and Infrastructure teams on performant graph storage, traversal, and querying
  • Define and monitor key metrics such as match rates, accuracy, persistence, identity coverage, and campaign performance impact
What we offer:
  • Full health, dental, and vision insurance - up to 95% funded by the company for employees
  • Employee stock option program
  • Company-sponsored retirement plan with a matching contribution program
  • 12 annual paid holidays (including 2 flexible days)
  • Generous PTO policy
  • A remote-first environment that allows employees flexibility to work from most places in the US
Employment Type:
Fulltime

Senior Software Engineer

Security represents the most critical priorities for our customers in a world aw...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 5+ years of experience working with distributed data processing frameworks such as Apache Spark, Databricks, or similar technologies to transform and manage large-scale datasets
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • This position will be required to pass the Microsoft background and Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Design, develop, test, deploy, and maintain core services and platform components that enable Data Security Posture Management (DSPM) scenarios across Microsoft Purview
  • Implement scalable, reliable, and secure backend systems, including data ingestion, processing, and enrichment pipelines operating at cloud scale
  • Contribute to technical design and architecture discussions, applying engineering best practices to ensure performance, resiliency, privacy, and compliance requirements are met
  • Collaborate with product managers, partner engineering teams, and dependent services to translate customer and business requirements into high‑quality technical solutions
  • Drive operational excellence through ownership of code quality, monitoring, on‑call participation, and continuous improvement, while mentoring and supporting other engineers on the team
Employment Type:
Fulltime

Applied Data Scientist II

Our team builds the intelligence layer that powers Microsoft’s next‑generation s...
Location:
India, Hyderabad
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree in CS, Data Science, EE, Mathematics or related field AND 4+ years of hands-on DS/ML experience OR Master’s degree AND 1+ years experience
  • Strong proficiency in Python, ML frameworks (PyTorch/TensorFlow), and data processing libraries
  • Experience with ML techniques such as: gradient-boosted models, supervised/unsupervised learning, embeddings, clustering, anomaly detection
  • Experience querying & analyzing large datasets using Kusto, SQL, Spark, or equivalent data engines
  • Strong fundamentals in probability, statistics, and algorithmic thinking
  • Ability to write clean, reliable research code and communicate findings clearly
Job Responsibility:
  • Machine Learning & Modeling: Develop supervised and unsupervised ML models for anomaly detection, fraud/threat pattern discovery, alert classification, confidence scoring, and signal fidelity improvements
  • Build and maintain feature pipelines over multi-modal security telemetry (identity, endpoint, network, cloud)
  • Apply graph-focused ML techniques (graph embeddings, GNNs, similarity scoring, relationship modeling)
  • Graph Reasoning & Analytics: Contribute to graph construction logic, schema evolution, and ontology-driven enrichment for Verdict Net, Verdict Propagation, Campaign Graphs, and Vortex insights
  • Implement graph traversal, multi-hop reasoning, and cluster detection algorithms to surface hidden attack patterns
  • Participate in performance optimization and health management of large-scale threat graphs
  • Data Engineering & Experimentation: Analyze large, noisy, high-dimensional security datasets using ADX/Kusto, Spark, and distributed compute platforms
  • Run A/B experiments, offline evaluations, and benchmark models to continually improve detection quality
  • Build high-quality research code and prototypes that transition smoothly to engineering teams for productionization
  • Cross-Functional Impact: Collaborate with detection engineering, threat research, product teams and red teams to integrate ML outcomes into real-world protection experiences
Employment Type:
Fulltime

Senior Java Developer

This role is part of an initiative to build a real-time data pipeline for proces...
Location:
Canada, Mississauga
Salary:
120800.00 - 170800.00 USD / Year
Citi
Expiration Date
Until further notice
Requirements:
  • 6-10 years of professional experience in Java application development
  • Expertise in Spring Boot and microservices architecture
  • Strong experience with Elasticsearch (indexing, queries, aggregations)
  • Hands-on experience with Apache Kafka (publish/subscribe, streams, scalability)
  • Proficiency in Oracle Database (SQL, PL/SQL, optimization)
  • Extensive experience with Apache Spark for batch processing, including Spark SQL
  • Experience with big data ecosystems and cloud-based data platforms (e.g., Hadoop, Data Lakes, Snowflake, Databricks) is highly desirable
  • Experience with caching frameworks (Redis or equivalent)
  • Ability to effectively leverage Gen AI coding assistants for improved development productivity
  • Knowledge of real-time data processing and large-scale batch processing and data pipeline design
Job Responsibility:
  • Design, develop, and maintain high-performance Java applications for processing front-office chat data in real time
  • Design, develop, and optimize batch processing jobs using Apache Spark for large-scale data transformation and analysis
  • Implement config-driven, Spring-based components for data ingestion, transformation, and enrichment
  • Develop and optimize REST APIs for integration with NLP engines, internal systems, and external applications
  • Integrate and manage Apache Kafka for high-throughput, low-latency event streaming
  • Utilize Elasticsearch for efficient indexing and querying of large chat-derived datasets
  • Write optimized Oracle SQL/PLSQL for configuration management
  • Leverage continuous integration pipelines to streamline development and deployment
  • Use Gen AI development tools (Copilot and DevinAI) to write, review, and optimize code efficiently
  • Collaborate with business analysts, product team and developers to ensure system reliability, scalability, and alignment with requirements
Employment Type:
Fulltime

Marketing Technology Manager

Meta empowers millions of businesses to connect with customers and drive real bu...
Location:
United States, Menlo Park
Salary:
124000.00 - 178000.00 USD / Year
Meta
Expiration Date
Until further notice
Requirements:
  • 5+ years of experience in marketing technology, marketing operations, or a related technical marketing discipline
  • Bachelor's degree in Marketing, Business, Information Systems, or a related field
  • Experience developing agentic workflows, AI-powered automation, or marketing AI agents in a production environment
  • Knowledge of LLMs, AI model evaluation, and building data foundations for AI-powered marketing tools or agentic capabilities
  • Experience developing and executing data strategies, including data governance, enrichment, sharing, and compliance frameworks
  • Experience managing marketing technology platforms and ensuring platform reliability, performance, and scalability
  • Analytical and data-driven decision-making experience, translating complex data and AI challenges into actionable strategies
  • Experience building cross-functional relationships and managing stakeholders across Marketing, Engineering, Product, and Sales teams
  • Communication experience and demonstrated experience presenting technical concepts to both technical and non-technical audiences
Job Responsibility:
  • Design and implement AI evaluation frameworks, including model performance benchmarking, prompt evaluation, and quality assurance processes to ensure AI agents and LLM-driven outputs meet production-quality standards
  • Drive the integration of LLMs and AI agents into marketing technology workflows, partnering with engineering and data science teams to move agentic capabilities from prototype to production-ready infrastructure
  • Partner with engineering teams to design and build data foundations required to power agentic marketing capabilities, including high-quality data pipelines, knowledge bases, and golden datasets that AI agents depend on for reliable execution
  • Ensure platform reliability, stability, and performance, creating a stable foundation on which agentic systems can operate at scale
  • Support change management efforts for the organization's transformation to agentic marketing, engaging stakeholders across teams to build understanding, drive adoption, and ensure smooth transitions as AI agents take on greater operational responsibility
  • Partner cross-functionally with Marketing, Engineering, Data Science, Product, and Sales teams to scale platforms, integrate cross-org data sources, and build the shared infrastructure needed to enable a self-service, agent-led marketing future
What we offer:
  • bonus
  • equity
  • benefits