At Recraft, we’re building the next generation of generative models across images and text. We’re looking for an ML Data Engineer to scale our data pipelines for unstructured data (primarily images) and keep our training flows fast, reliable, and repeatable. You’ll design and operate high-throughput ingestion and preprocessing on Kubernetes, evolve our internal data-pipeline framework, and work hand-in-hand with ML engineers to ship datasets that move model quality forward.
Job Responsibilities:
Develop and maintain data-ingestion pipelines to source and prepare large-scale image (and occasional text/HTML) datasets from open, publicly accessible, and permitted sources
Own the end-to-end flow: raw data → quality/beauty/relevance filtering → dedup/validation → ready-to-train artifacts
Requirements:
Proven track record with unstructured data, especially images (loading, filtering, transforming at scale)
Experience developing data-ingestion or parsing tools for publicly accessible sources, including handling real-world reliability and failure cases gracefully
Comfort with S3/object storage and moving lots of data efficiently and safely
Pragmatic, detail-oriented, ownership mindset; you enjoy making systems reliable and fast
Nice to have:
Familiarity with ML workflows (PyTorch) and downstream training considerations
Experience with image quality scoring, captioning, or image-to-text pipelines
Experience with DAG/workflow visualizations or pipeline UX tooling
DevOps fluency: Docker, CI/CD, infra automation
What we offer:
Competitive salary and equity
We’re able to offer Skilled Worker visa sponsorship in the UK for qualified candidates
Real impact on model quality: your pipelines directly power training runs and product improvements
Ownership with support: autonomy to design and improve systems, alongside experienced ML peers
Modern stack: Python, Kubernetes, S3, internal pipeline framework built for scale
Growth: a fast-moving environment where shipping well-engineered systems is the norm