As a Data Scientist, you’ll explore and reason about the data that powers millions of AI evaluations each week. You’ll generate and test hypotheses, identify causal relationships, and uncover insights that help us understand how frontier models behave in the real world. You’ll collaborate with ML researchers and engineers to design experiments, analyze large-scale datasets, and build statistical frameworks that improve the reliability and interpretability of our AI evaluation systems.
Job Responsibilities:
Explore and analyze large, complex datasets to uncover patterns, biases, and causal relationships in model behavior and system performance
Formulate hypotheses about data quality, evaluation outcomes, and model performance — then design experiments to validate or refute them
Build reproducible analysis pipelines using Python, Pandas, NumPy, and Spark to process and interrogate large-scale data
Partner with ML researchers and engineers to design metrics and analyses that evaluate how models perform across domains, prompts, and tasks
Develop causal reasoning frameworks and statistical methods that help explain why models behave as they do — not just how well they perform
Communicate insights clearly to technical and non-technical partners (for example, via blog posts), informing both research direction and infrastructure improvements
Requirements:
6+ years of experience in data science, ML analytics, or applied research, preferably in AI, ML, or large-scale data environments
Strong proficiency in Python, with deep experience in Pandas, NumPy, and distributed frameworks like Spark
Expertise in statistical modeling, causal inference, and experimental design
Experience reasoning about data distributions, sample quality, and the effects of distribution shift
Strong communication skills and the ability to collaborate closely with ML researchers and engineers
Nice to have:
Background in AI model evaluation
Experience working with LLM outputs (for example, LLM-as-a-judge), embeddings, or other large-scale model artifacts
Experience with A/B testing
What we offer:
Competitive compensation and equity aligned to the markets where our team members are based
Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs
The opportunity to work on cutting-edge AI with a small, mission-driven team
A culture that values transparency, trust, and community impact