As the leading data and evaluation partner for frontier AI companies, Scale plays an integral role in understanding the capabilities of AI models and systems and in safeguarding them. Building on this expertise, Scale Labs has launched a new team focused on policy research. The team aims to bridge the gap between AI research and global policymakers so that policymakers can make informed, scientific decisions about AI risks and capabilities. Our research tackles the hardest problems in agent robustness, AI control protocols, and AI risk evaluations to help governments, industry, and the public understand and mitigate AI risk while maximizing AI adoption. The team collaborates broadly across industry, the public sector, and academia, and regularly publishes its findings. We are actively seeking talented researchers to join us in shaping this vision.
Job Responsibilities
Develop and apply post-training methods and interpretability techniques to make frontier AI systems safer and better understood by researchers and policymakers
Design and run post-training pipelines to study how training choices affect model safety, robustness, and alignment properties
Develop interpretability-informed evaluations that reveal how and why models produce unsafe, deceptive, or otherwise undesirable behaviors, and use those insights to guide targeted mitigations
Collaborate with policymakers, engineers, and other researchers to translate post-training and interpretability findings into actionable safety standards, evaluation benchmarks, and best practices
Requirements
Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches
A track record of published research in machine learning, particularly in generative AI
At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development
Strong written and verbal communication skills for working effectively in a cross-functional team
Nice to have
Experience with mechanistic interpretability, probing, or other techniques for understanding model internals
Familiarity with red-teaming or adversarial evaluation of post-trained models
Experience studying failure modes introduced or masked by post-training, such as reward hacking, sycophancy, or alignment faking