We’re looking for an ML Data Engineer to evolve our eval dataset tools to meet the growing platform needs of AI-based tutoring at Khan Academy. The ideal candidate can gather internal requirements, design schemas based on well-known dataset patterns, and deploy, document, and train people on an internal dataset management framework. The systems you design will need to integrate with trace management and human labeling APIs. You’ll work closely with other AI engineers, platform developers, and labeling teams to ensure our data is clean, representative, and ready for both human and automated evaluation. This role bridges ML operations, data engineering, and data science, enabling our AI systems to learn from reliable, well-structured datasets that reflect the diversity and nuance of real learners.
Job Responsibilities:
Evolve and maintain pipelines for transforming raw trace data into ML-ready datasets
Clean, normalize, and enrich data while preserving semantic meaning and consistency
Prepare and format datasets for human labeling, and integrate results into ML datasets
Develop and maintain scalable ETL pipelines using Airflow, DBT, Go, and Python running on GCP
Implement automated tests and validation to detect data drift or labeling inconsistencies
Collaborate with AI engineers, platform developers, and product teams to define data strategies in support of continuously improving the quality of Khan’s AI-based tutoring
Contribute to shared tools and documentation for dataset management and AI evaluation
Inform our data governance strategies for proper data retention, PII controls/scrubbing, and isolation of particularly sensitive data such as offensive test imagery.
Requirements:
Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
5 years of software engineering experience, including 3+ years working with large ML datasets, especially those in open-source repositories such as Hugging Face
Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
Experience with data versioning tools (e.g., DVC, LakeFS) and cloud storage systems
Familiarity with machine learning workflows — from training data preparation to evaluation
Familiarity with the architecture and operation of large language models, and a nuanced understanding of their capabilities and limitations
Attention to detail and an obsession with data quality and reproducibility
Motivated by the Khan Academy mission “to provide a free world-class education for anyone, anywhere.”
Proven cross-cultural competency skills demonstrating self-awareness, awareness of others, and the ability to adopt inclusive perspectives, attitudes, and behaviors to drive inclusion and belonging throughout the organization.
Nice to have:
Experience with labeling platforms (e.g., Label Studio, Scale AI, Toloka) or human-in-the-loop systems
Understanding of ML evaluation techniques, including prompt-based and generative model metrics
Exposure to MLOps practices such as model registry, feature store, or continuous evaluation
Background in education technology or other human-centered AI applications.
What we offer:
Competitive salaries
Ample paid time off as needed
8 pre-scheduled Wellness Days in 2026 occurring on a Monday or a Friday for a 3-day weekend boost
Remote-first culture that caters to your time zone, with open flexibility as needed
Generous parental leave
An exceptional team that trusts you and gives you the freedom to do your best
The chance to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
Opportunities to connect through affinity, ally, and social groups
401(k) + 4% matching & comprehensive insurance, including medical, dental, vision, and life.