Our Data team powers Liquid Foundation Models across pre-training, vision, audio, and emerging modalities. Public data sources are plateauing. Model performance increasingly depends on purpose-built datasets. We need ML-minded engineers who can collect, filter, and synthesize high-quality data at scale. We treat data as a research problem, not an infrastructure problem. Our engineers run experiments, design ablations, and measure how data decisions move model quality. We will match you to the team where you can grow the fastest and have the most impact: pre-training, post-training RL, vision-language, audio, or multimodal.
Job Responsibilities:
Build and maintain data processing, filtering, and selection pipelines at scale
Create pipelines for pre-training, mid-training, SFT, and preference optimization datasets
Design synthetic data generation systems using LLMs, structured prompting, and domain-specific generators
Design and run evaluations and ablations to measure each dataset's impact on model performance
Monitor public datasets across text, vision, and audio domains
Collaborate with pre-training, vision, and audio teams on modality-specific data needs
Requirements:
Strong Python skills with the ability to quickly comprehend problems and translate them into clean, working code
Solid ML fundamentals: experience training, evaluating, and iterating on models (PyTorch preferred)
Track record of learning new technical domains quickly
3+ years of relevant experience with an M.S., 1+ year with a Ph.D., or 5+ years with a B.S.
Nice to have:
Experience with synthetic data generation, data curation, or ML evaluation (designing evals, benchmarking, measuring data and model quality)
Experience with LLMs, VLMs, computer vision, or audio data pipelines
Open-source contributions or publications at NeurIPS, ICML, ICLR, or CVPR
What we offer:
Competitive base salary with equity in a unicorn-stage company
We pay 100% of medical, dental, and vision premiums for employees and dependents
401(k) matching up to 4% of base pay
Unlimited PTO plus company-wide Refill Days throughout the year