We're looking for a Staff Product Manager to own evaluations for AI agents at Workato — both the internal framework that helps our teams ship better AI features, and the customer-facing tools that let builders assess and improve the agents they create.
Job Responsibilities:
Define and own the evaluation framework for Workato's internal AI agent features, driving adoption across teams starting with Agent Studio
Build the customer-facing evaluation experience — how builders test, measure, and improve agents they create on Workato
Make hard calls about what evaluation complexity to expose versus abstract, balancing rigor with approachability
Partner closely with the Build Experience PM to ensure evaluation is integrated into the builder journey, not bolted on
Work with ML engineers and platform teams to ground the framework in technical reality while keeping it accessible
Establish metrics for what "good" looks like — both for internal agent quality and for customer evaluation adoption
Spend significant time with customers to understand where they struggle to assess agent performance and what mental models they bring
Requirements:
7+ years in Product Management
Hands-on experience writing evaluations for AI/ML systems (agents, LLMs, or similar)
Track record of shipping technical products to both internal and external users
Experience driving adoption of frameworks or practices across engineering teams
Strong written and verbal communication skills
Bachelor's degree or equivalent experience
Practitioner depth in evaluations
Strong product management experience
Technical translation ability
Internal influence skills
Greenfield comfort
B2B product sensibility
Nice to have:
Experience with agent architectures, RAG systems, or LLM application development
Background in ML engineering, solutions architecture, or technical program management before PM
Experience building developer tools or platform products
Familiarity with evaluation frameworks (e.g., human eval pipelines, automated benchmarks, red-teaming)