We are looking for a highly skilled Senior ML / Data Pipeline Engineer who can translate complex machine learning and multimodal concepts into scalable, production-ready pipelines and workflows. This role focuses on building and optimizing large-scale video and multimodal data systems, enabling high-throughput ingestion, processing, and model training across distributed cloud environments.
Job Responsibilities:
Design, deploy, and scale ML and data processing pipelines across cloud infrastructure
Build systems to ingest, process, and serve 250,000+ hours of multimodal data (video, audio, metadata)
Architect and optimize GPU-based compute environments (e.g., NVIDIA Tesla clusters) for distributed training and inference
Develop high-throughput backend systems for video ingestion from desktop and mobile platforms
Implement distributed processing workflows, including job scheduling, fault tolerance, and resource allocation
Design and build human-in-the-loop and automated annotation systems to ensure data quality and scalability
Translate ML and multimodal research into scalable, production-grade cloud architectures
Optimize pipelines for performance, reliability, and cost efficiency across compute, storage, and networking layers
Collaborate with ML, data, and engineering teams to deliver end-to-end data workflows
Requirements:
5+ years of experience in data engineering, ML pipelines, or distributed systems
Strong experience building scalable data pipelines for large datasets (video/audio preferred)
Hands-on experience with cloud platforms (AWS, Azure, or GCP)
Experience working with GPU-based environments and distributed computing
Strong programming skills in Python, Scala, or similar languages
Experience with data processing frameworks (Spark, Ray, Kafka, Airflow, or similar)
Understanding of ML workflows, training pipelines, and inference systems
Experience designing fault-tolerant, high-availability systems
Strong knowledge of data storage systems (data lakes, object storage, distributed file systems)
Ability to handle high-throughput, large-scale data ingestion and processing
Nice to have:
Experience with multimodal AI (video, audio, NLP) systems
Familiarity with annotation tools and data labeling workflows
Experience with containerization and orchestration (Docker, Kubernetes)
Knowledge of cost optimization strategies for large-scale cloud workloads