ML Data Engineer

Recraft

Location:
United Kingdom, London

Contract Type:
Not provided

Salary:

Not provided

Job Description:

At Recraft, we’re building the next generation of generative models across images and text. We’re looking for an ML Data Engineer to scale our data pipelines for unstructured data (primarily images) and keep our training flows fast, reliable, and repeatable. You’ll design and operate high-throughput ingestion and preprocessing on Kubernetes, evolve our internal data-pipeline framework, and work hand-in-hand with ML engineers to ship datasets that move model quality forward.

Job Responsibility:

  • Develop and maintain data-ingestion pipelines to source and prepare large-scale image (and occasional text/HTML) datasets from open, publicly accessible, and permitted sources
  • Own the end-to-end flow: raw data → quality/beauty/relevance filtering → dedup/validation → ready-to-train artifacts
  • Operate and improve our Kubernetes-based data-pipeline framework (distributed jobs, retries, monitoring, automation)
  • Work with S3-style object storage: efficient layouts, lifecycle, throughput, and cost awareness
  • Add tooling around pipelines (progress/health visualization, metrics, alerts) for observability and faster iteration
  • Collaborate closely with ML engineers to align datasets with training needs and accelerate experimentation

Requirements:

  • Strong Python fundamentals: you write clean, maintainable, production-ready code
  • Solid hands-on Kubernetes experience (containers, jobs, batch/distributed processing)
  • Proven track record with unstructured data, especially images (loading, filtering, transforming at scale)
  • Experience developing data-ingestion or parsing tools for publicly accessible sources, including handling real-world reliability and failure cases gracefully
  • Comfort with S3/object storage and moving lots of data efficiently and safely
  • Pragmatic, detail-oriented, ownership mindset: you enjoy making systems reliable and fast

Nice to have:

  • Familiarity with ML workflows (PyTorch) and downstream training considerations
  • Experience with image quality scoring, captioning, or image-to-text pipelines
  • DAG/workflow visualizations or pipeline UX tooling
  • DevOps fluency: Docker, CI/CD, infra automation

What we offer:

  • Competitive salary and equity
  • We’re able to offer Skilled Worker visa sponsorship in the UK for qualified candidates
  • Real impact on model quality: your pipelines directly power training runs and product improvements
  • Ownership with support: autonomy to design and improve systems, alongside experienced ML peers
  • Modern stack: Python, Kubernetes, S3, internal pipeline framework built for scale
  • Growth: a fast-moving environment where shipping well-engineered systems is the norm

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for ML Data Engineer

Senior Platform Engineer, ML Data Systems

We’re looking for an ML Data Engineer to evolve our eval dataset tools to meet t...
Location:
United States, Mountain View
Salary:
137871.00 - 172339.00 USD / Year
Khan Academy
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
  • 5 years of Software Engineering experience with 3+ of those years working with large ML datasets, especially those in open-source repositories such as Hugging Face
  • Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
  • Experience with data versioning tools (e.g., DVC, LakeFS) and cloud storage systems
  • Familiarity with machine learning workflows — from training data preparation to evaluation
  • Familiarity with the architecture and operation of large language models, and a nuanced understanding of their capabilities and limitations
  • Attention to detail and an obsession with data quality and reproducibility
  • Motivated by the Khan Academy mission “to provide a free world-class education for anyone, anywhere.”
  • Proven cross-cultural competency skills demonstrating self-awareness, awareness of others, and the ability to adopt inclusive perspectives, attitudes, and behaviors to drive inclusion and belonging throughout the organization.
Job Responsibility:
  • Evolve and maintain pipelines for transforming raw trace data into ML-ready datasets
  • Clean, normalize, and enrich data while preserving semantic meaning and consistency
  • Prepare and format datasets for human labeling, and integrate results into ML datasets
  • Develop and maintain scalable ETL pipelines using Airflow, DBT, Go, and Python running on GCP
  • Implement automated tests and validation to detect data drift or labeling inconsistencies
  • Collaborate with AI engineers, platform developers, and product teams to define data strategies in support of continuously improving the quality of Khan’s AI-based tutoring
  • Contribute to shared tools and documentation for dataset management and AI evaluation
  • Inform our data governance strategies for proper data retention, PII controls/scrubbing, and isolation of particularly sensitive data such as offensive test imagery.
What we offer:
  • Competitive salaries
  • Ample paid time off as needed
  • 8 pre-scheduled Wellness Days in 2026 occurring on a Monday or a Friday for a 3-day weekend boost
  • Remote-first culture that caters to your time zone, with flexibility as needed
  • Generous parental leave
  • An exceptional team that trusts you and gives you the freedom to do your best
  • The chance to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
  • Opportunities to connect through affinity, ally, and social groups
  • 401(k) + 4% matching & comprehensive insurance, including medical, dental, vision, and life.
  • Fulltime

Senior ML Data Engineer

As a Senior Data Engineer, you will play a pivotal role in our AI/ML workstream,...
Location:
Poland, Warsaw
Salary:
Not provided
Awin Global
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in data science, data engineering, or computer science with a focus on math and statistics; a Master’s degree is preferred
  • At least 5 years’ experience as an AI/ML data engineer undertaking the above tasks and accountabilities
  • Strong foundation in computer science principles and statistical methods
  • Strong experience with cloud technology (AWS or Azure)
  • Strong experience with the creation of data ingestion pipelines and ETL processes
  • Strong knowledge of big data tools such as Spark, Databricks and Python
  • Strong understanding of common machine learning techniques and frameworks (e.g. mlflow)
  • Strong knowledge of natural language processing (NLP) concepts
  • Strong knowledge of scrum practices and agile mindset
  • Strong Analytical and Problem-Solving Skills with attention to data quality and accuracy
Job Responsibility:
  • Design and maintain scalable data pipelines and storage systems for both agentic and traditional ML workloads
  • Productionise LLM- and agent-based workflows, ensuring reliability, observability, and performance
  • Build and maintain feature stores, vector/embedding stores, and core data assets for ML
  • Develop and manage end-to-end traditional ML pipelines: data prep, training, validation, deployment, and monitoring
  • Implement data quality checks, drift detection, and automated retraining processes
  • Optimise cost, latency, and performance across all AI/ML infrastructure
  • Collaborate with data scientists and engineers to deliver production-ready ML and AI systems
  • Ensure AI/ML systems meet governance, security, and compliance requirements
  • Mentor teams and drive innovation across both agentic and classical ML engineering practices
  • Participate in team meetings and contribute to project planning and strategy discussions
What we offer:
  • Flexi-Week and Work-Life Balance: We prioritise your mental health and well-being, offering you a flexible four-day Flexi-Week at full pay and with no reduction to your annual holiday allowance. We also offer a variety of different paid special leaves as well as volunteer days
  • Remote Working Allowance: You will receive a monthly allowance to cover part of your running costs. In addition, we will support you in setting up your remote workspace appropriately
  • Pension: Awin offers access to an additional pension insurance to all employees in Germany
  • Flexi-Office: We offer an international culture and flexibility through our Flexi-Office and hybrid/remote work possibilities to work across Awin regions
  • Development: We’ve built our extensive training suite Awin Academy to cover a wide range of skills that nurture you professionally and personally, with trainings conveniently packaged together to support your overall development
  • Appreciation: Thank and reward colleagues by sending them a voucher through our peer-to-peer program

Senior ML Data Engineer

As a Senior Data Engineer, you will play a pivotal role in our AI/ML workstream,...
Salary:
Not provided
Awin Global
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in data science, data engineering, or computer science with a focus on math and statistics; a Master’s degree is preferred
  • At least 5 years’ experience as an AI/ML data engineer undertaking the above tasks and accountabilities
  • Strong foundation in computer science principles and statistical methods
  • Strong experience with cloud technology (AWS or Azure)
  • Strong experience with the creation of data ingestion pipelines and ETL processes
  • Strong knowledge of big data tools such as Spark, Databricks and Python
  • Strong understanding of common machine learning techniques and frameworks (e.g. mlflow)
  • Strong knowledge of natural language processing (NLP) concepts
  • Strong knowledge of scrum practices and agile mindset
Job Responsibility:
  • Design and maintain scalable data pipelines and storage systems for both agentic and traditional ML workloads
  • Productionise LLM- and agent-based workflows, ensuring reliability, observability, and performance
  • Build and maintain feature stores, vector/embedding stores, and core data assets for ML
  • Develop and manage end-to-end traditional ML pipelines: data prep, training, validation, deployment, and monitoring
  • Implement data quality checks, drift detection, and automated retraining processes
  • Optimise cost, latency, and performance across all AI/ML infrastructure
  • Collaborate with data scientists and engineers to deliver production-ready ML and AI systems
  • Ensure AI/ML systems meet governance, security, and compliance requirements
  • Mentor teams and drive innovation across both agentic and classical ML engineering practices
  • Participate in team meetings and contribute to project planning and strategy discussions
What we offer:
  • Flexi-Week and Work-Life Balance: We prioritise your mental health and well-being, offering you a flexible four-day Flexi-Week at full pay and with no reduction to your annual holiday allowance. We also offer a variety of different paid special leaves as well as volunteer days
  • Remote Working Allowance: You will receive a monthly allowance to cover part of your running costs. In addition, we will support you in setting up your remote workspace appropriately
  • Pension: Awin offers access to an additional pension insurance to all employees in Germany
  • Flexi-Office: We offer an international culture and flexibility through our Flexi-Office and hybrid/remote work possibilities to work across Awin regions
  • Development: We’ve built our extensive training suite Awin Academy to cover a wide range of skills that nurture you professionally and personally, with trainings conveniently packaged together to support your overall development
  • Appreciation: Thank and reward colleagues by sending them a voucher through our peer-to-peer program

Software Engineer (Data Engineering)

We are seeking a Software Engineer (Data Engineering) who can seamlessly integra...
Location:
India, Hyderabad
Salary:
Not provided
NStarX
Expiration Date:
Until further notice
Requirements:
  • 4+ years in Data Engineering and AI/ML roles
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field
  • Python, SQL, Bash, PySpark, Spark SQL, boto3, pandas
  • Apache Spark on EMR (driver/executor model, sizing, dynamic allocation)
  • Amazon S3 (Parquet) with lifecycle management to Glacier
  • AWS Glue Catalog and Crawlers
  • AWS Step Functions, AWS Lambda, Amazon EventBridge
  • CloudWatch Logs and Metrics, Kinesis Data Firehose (or Kafka/MSK)
  • Amazon Redshift and Redshift Spectrum
  • IAM (least privilege), Secrets Manager, SSM
Job Responsibility:
  • Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing
  • Develop and optimize data architectures supporting analytics and ML workflows
  • Ensure data integrity, security, and compliance with organizational and industry standards
  • Collaborate with DevOps teams to deploy and monitor data pipelines in production environments
  • Build predictive and prescriptive models leveraging AI and ML techniques
  • Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or Scikit-learn
  • Perform feature engineering, statistical analysis, and data preprocessing
  • Continuously monitor and optimize models for accuracy and scalability
  • Integrate AI-driven insights into business processes and strategies
  • Serve as the technical liaison between NStarX and client teams
What we offer:
  • Competitive salary and performance-based incentives
  • Opportunity to work on cutting-edge AI and ML projects
  • Exposure to global clients and international project delivery
  • Continuous learning and professional development opportunities
  • Competitive base + commission
  • Fast growth into leadership roles
  • Fulltime

Senior Software Engineer, Data Products

As a Senior Software Engineer, you will play a pivotal role in the development o...
Location:
United States, Los Angeles
Salary:
143000.00 - 180000.00 USD / Year
Fox Corporation
Expiration Date:
Until further notice
Requirements:
  • Experience working in Software Engineering, Data Science, ML Engineering
  • Strong background in live media streaming and handling VOD content
  • Expertise in working with live media streaming
  • Experience working with vector databases
  • Strong understanding of generative AI technologies and their underlying mechanisms
  • Good grasp of distributed system design
  • Experience with TensorFlow, PyTorch, etc.
  • REST or GraphQL API Design Experience
  • Proficient with building batch and streaming data pipelines on cloud platforms
Job Responsibility:
  • Design and implement novel and scalable AI solutions for real business problems
  • Design and implement workflows to generate and manage assets for live streaming and VOD
  • Build workflow orchestrations that can be readily extended to perform new analyses
  • Prototype new approaches and productionize solutions at scale for hundreds of millions of active users
  • Maintain high-level craftsmanship while delivering meaningful results
  • Mentor junior engineers on the team
  • Collaborate with peers, engineering leadership, and product management
What we offer:
  • Annual discretionary bonus
  • Medical/dental/vision insurance
  • 401(k) plan
  • Paid time off
  • Fulltime

Senior Software Engineer, Data Products

As a Senior Software Engineer, you will play a pivotal role in the development o...
Location:
United States, Los Angeles
Salary:
143000.00 - 180000.00 USD / Year
Fox News Media
Expiration Date:
Until further notice
Requirements:
  • Experience working in Software Engineering, Data Science, ML Engineering
  • Strong background in live media streaming and handling VOD content
  • Expertise in working with live media streaming
  • Experience working with vector databases
  • Strong understanding of generative AI technologies and their underlying mechanisms
  • Good grasp of distributed system design
  • Experience with TensorFlow, PyTorch, etc.
  • REST or GraphQL API Design Experience
  • Proficient with building batch and streaming data pipelines on cloud platforms
Job Responsibility:
  • Design and implement novel and scalable AI solutions for real business problems
  • Design and implement workflows to generate and manage assets for live streaming and VOD
  • Build workflow orchestrations that can be readily extended to perform new analyses
  • Prototype new approaches and productionize solutions at scale for hundreds of millions of active users
  • Maintain high-level craftsmanship while delivering meaningful results
  • Mentor junior engineers on the team
  • Collaborate with peers, engineering leadership, and product management
What we offer:
  • Annual discretionary bonus
  • Medical/dental/vision insurance
  • 401(k) plan
  • Paid time off
  • Fulltime

Senior Data Engineer

At Ingka Investments (Part of Ingka Group – the largest owner and operator of IK...
Location:
Netherlands, Leiden
Salary:
Not provided
IKEA
Expiration Date:
Until further notice
Requirements:
  • Formal qualifications (BSc, MSc, PhD) in computer science, software engineering, informatics or equivalent
  • Minimum 3 years of professional experience as a (Junior) Data Engineer
  • Strong knowledge in designing efficient, robust and automated data pipelines, ETL workflows, data warehousing and Big Data processing
  • Hands-on experience with Azure data services like Azure Databricks, Unity Catalog, Azure Data Lake Storage, Azure Data Factory, DBT and Power BI
  • Hands-on experience with data modeling for BI & ML for performance and efficiency
  • The ability to apply such methods to solve business problems using one or more Azure Data and Analytics services in combination with building data pipelines, data streams, and system integration
  • Experience in driving new data engineering developments (e.g. applying new cutting-edge data engineering methods to improve the performance of data integration, or using new tools to improve data quality)
  • Knowledge of DevOps practices and tools including CI/CD pipelines and version control systems (e.g., Git)
  • Proficiency in programming languages such as Python, SQL, PySpark and others relevant to data engineering
  • Hands-on experience deploying code artifacts into production
Job Responsibility:
  • Contribute to the development of D&A platform and analytical tools, ensuring easy and standardized access and sharing of data
  • Act as subject matter expert for Azure Databricks, Azure Data Factory and ADLS
  • Help design, build and maintain data pipelines (accelerators)
  • Document and make the relevant know-how and standards available
  • Ensure pipelines are consistent with relevant digital frameworks, principles, guidelines and standards
  • Support in understanding the needs of Data Product Teams and other stakeholders
  • Explore ways to create better visibility on data quality and data assets on the D&A platform
  • Identify opportunities for data assets and D&A platform toolchain
  • Work closely together with partners, peers and other relevant roles like data engineers, analysts or architects across IKEA as well as in your team
What we offer:
  • Opportunity to develop on a cutting-edge Data & Analytics platform
  • Opportunities to have a global impact with your work
  • A team of great colleagues to learn together with
  • An environment focused on driving business and personal growth together, with focus on continuous learning
  • Fulltime

Data Infrastructure Engineer

A venture-backed startup at the intersection of AI and national security is buil...
Location:
United States, New York City Metropolitan Area
Salary:
Not provided
Orbis Consultants
Expiration Date:
Until further notice
Requirements:
  • Strong engineering experience in Python, Go, or C
  • Experience building and scaling production data systems
  • Hands-on expertise with model deployment and ML Ops practices
  • Knowledge of database design, performance tuning, and operations
  • Someone who thrives in early-stage, fast-paced environments and enjoys tackling complex challenges
Job Responsibility:
  • Build and maintain the data pipelines and infrastructure that power ML applications
  • Deploy and manage models at scale, from training through production
  • Design APIs and services that integrate smoothly into mission-critical workflows
  • Ensure data is handled and secured properly across large, distributed environments
  • Collaborate closely with a small, fast-moving team to solve hard technical problems in real-world settings
What we offer:
  • Significant equity
  • Strong health & wellness benefits
  • Fulltime