We’re seeking a Data Engineer to design and manage the data pipelines, platforms, and tools that power intelligent AI applications. You will work closely with data scientists, AI software engineers, and product teams to ensure our ML and LLM workloads are backed by scalable, secure, and high-performance data infrastructure. This is a hands-on, high-impact role where the reliability and flexibility of the data architecture are paramount.
Job Responsibilities:
Design, build, and maintain data pipelines for structured, unstructured, and semi-structured data sources
Develop and optimize data models, ETL processes, and batch/streaming data infrastructure
Partner with data scientists to support training, evaluation, and deployment of ML and LLM models
Implement scalable architectures for embeddings, vector databases, and retrieval pipelines
Enable real-time and offline analytics workflows using best-in-class data engineering practices
Ensure data quality, lineage, observability, and governance across all data products
Deploy secure, cloud-native data infrastructure (AWS, Azure, GCP) for high-volume AI workloads
Contribute to the design of feature stores and MLOps platforms for continuous learning and model updates
Collaborate on Responsible AI workflows to ensure compliant data usage and access controls
Continuously evaluate new tools and technologies for improving performance, reliability, and agility
Requirements:
5+ years of experience as a Data Engineer building large-scale, production-grade data pipelines
Strong command of SQL, Python, and distributed data processing frameworks (Spark, Flink, Beam)
Hands-on experience with ETL/ELT tools and orchestration systems (Airflow, dbt, Prefect, Dagster)
Familiarity with cloud-native data platforms (Snowflake, BigQuery, Redshift, Databricks)
Experience supporting ML/AI workloads and collaborating with model development teams
Knowledge of vector databases (FAISS, Pinecone, Weaviate) and embeddings management
Understanding of data privacy, access control, and compliance in regulated environments
Proficiency in modern DevOps tooling for data infrastructure (Docker, Terraform, CI/CD)
Ability to work autonomously and thrive in a fast-paced, collaborative environment