ML Data Engineer Job at Recraft (London)

Job Description

At Recraft, we’re building the next generation of generative models across images and text. We’re looking for an ML Data Engineer to scale our data pipelines for unstructured data (primarily images) and keep our training flows fast, reliable, and repeatable. You’ll design and operate high-throughput ingestion and preprocessing on Kubernetes, evolve our internal data-pipeline framework, and work hand-in-hand with ML engineers to ship datasets that move model quality forward.

Job Responsibilities

Develop and maintain data-ingestion pipelines to source and prepare large-scale image (and occasional text/HTML) datasets from open, publicly accessible, and permitted sources
Own the end-to-end flow: raw data → quality/beauty/relevance filtering → dedup/validation → ready-to-train artifacts
Operate and improve our Kubernetes-based data-pipeline framework (distributed jobs, retries, monitoring, automation)
Work with S3-style object storage: efficient layouts, lifecycle, throughput, and cost awareness
Add tooling around pipelines (progress/health visualization, metrics, alerts) for observability and faster iteration
Collaborate closely with ML engineers to align datasets with training needs and accelerate experimentation

Requirements

Strong Python fundamentals: you write clean, maintainable, production-ready code
Solid hands-on Kubernetes experience (containers, jobs, batch/distributed processing)
Proven track record with unstructured data, especially images (loading, filtering, transforming at scale)
Experience developing data-ingestion or parsing tools for publicly accessible sources, including handling real-world reliability and failure cases gracefully
Comfort with S3/object storage and moving lots of data efficiently and safely
Pragmatic, detail-oriented, ownership mindset: you enjoy making systems reliable and fast

Nice to have

Familiarity with ML workflows (PyTorch) and downstream training considerations
Experience with image quality scoring, captioning, or image-to-text pipelines
DAG/workflow visualizations or pipeline UX tooling
DevOps fluency: Docker, CI/CD, infra automation

What we offer

Competitive salary and equity
We’re able to offer Skilled Worker visa sponsorship in the UK for qualified candidates
Real impact on model quality: your pipelines directly power training runs and product improvements
Ownership with support: autonomy to design and improve systems, alongside experienced ML peers
Modern stack: Python, Kubernetes, S3, internal pipeline framework built for scale
Growth: a fast-moving environment where shipping well-engineered systems is the norm

