As an ML Infra Engineer (Data Systems), you’ll build and operate the data infrastructure that powers large-scale robot learning. Your systems will sit directly between raw data sources and training/evaluation, enabling us to move faster while maintaining performance, correctness, and reliability at scale. This is a systems role at the intersection of distributed systems, storage, and machine learning infrastructure.
Job Responsibilities:
Data Ingestion & Processing: Design and build high-throughput pipelines that validate, transform, and featurize raw multimodal data
Batch & Streaming Systems: Operate large-scale batch and streaming workflows over massive datasets; choose file formats with performance and scalability in mind
Data Lifecycle Management: Build systems for backfills, dataset rebuilds, garbage collection, and large-scale transformations
Training-Time Performance: Optimize dataloaders, sharding, prefetching, caching, and throughput to reduce the time from data arrival to model training
Metadata & Indexing: Build scalable metadata stores for datasets, annotations, and training artifacts
Data Movement: Move hundreds of terabytes to petabytes efficiently across clusters and environments
Operational Correctness: Implement observability, validation, and guardrails to prevent silent data regressions
Cross-Functional Collaboration: Work closely with cross-functional teams of researchers, engineers and roboticists to translate evolving data needs into robust systems
Requirements:
Strong software engineering fundamentals
Experience building distributed systems or large-scale data pipelines
Comfort reasoning about performance, memory, I/O, and storage efficiency
Familiarity with batch and/or streaming processing systems
Experience with object storage systems and data format tradeoffs
Ownership mindset: design, build, operate, and iterate on systems end-to-end
Enjoy working closely with researchers and unblocking fast-moving projects
Nice to have:
Experience with large ML training pipelines or dataloading systems
Knowledge of columnar or custom data formats
Experience with systems like ClickHouse, Ray, Flink, Spark, or similar