Scale's mission is to develop reliable AI systems for the world's most important decisions. Within the Enterprise BU, we build production-grade GenAI applications for the world's largest companies. As a Strategic Projects Lead (SPL) for Enterprise Evaluations, you will oversee the evaluations that determine whether an application is ready for the real world. You will define 'what good looks like' for complex GenAI apps, curate the data needed to measure performance, and serve as one of the final gatekeepers for production readiness.
Job Responsibilities:
Partner with enterprise stakeholders and Scale project teams to translate business goals into concrete evaluation strategies
Co-design the frameworks, rubrics, and 'golden datasets'
Determine the 'what, how, and why' of human-in-the-loop data
Own operational scoping & execution
Orchestrate the end-to-end evaluation 'engine'
Identify and resolve operational challenges and technical blockers proactively
Analyze evaluation results to provide the final, data-driven recommendation on whether an application is ready for production
Run open-source LLM benchmarks and present insights and recommendations on model performance to engineering teams
Act as a 'cross-pollinator' for the Enterprise BU
Requirements:
Strong technical background (ideally a degree in computer science and knowledge of Python)
Ability to perform data analysis using SQL or Python
5+ years of professional experience in a high-stakes operational role at a fast-growing tech company, management consulting, or investment banking
Strong problem-solving skills
Systemic thinking
Research-adjacent interest
Full-stack ownership
Nice to have:
Experience working on operational challenges or as a consultant