We're looking for a Model Behavior Architect to help build Perplexity's AI products and evaluations. You'll sit within our AI team and collaborate closely with research and product teams, designing prompt and context engineering strategies to deliver high-quality user experiences across multiple domains and models. This role is equal parts craft and science. You'll develop a deep understanding of our answer engine by pressure-testing model capabilities and working across our AI infrastructure (including system and tool prompts, skills, and evaluations) to create a stellar product experience for our users. You'll serve as a go-to expert on prompting, model quality, and behavioral consistency across new product features and model releases.
Job Responsibilities:
Context Engineering: Design, test, and optimize context strategies and system prompts that shape answer engine behavior across products, features, and use cases
Evaluation Systems: Build automated and semi-automated evaluation pipelines that measure model quality, catch regressions, and scale across product surfaces
Model Launch Support: Partner with research and engineering to validate model behavior before and during rollouts, ensuring smooth transitions with no degradation
Research & Analysis: Identify inconsistencies and failure modes in model outputs through well-designed research projects — for both internal and production-facing systems
Cross-functional Collaboration: Work closely with design, product, and research teams to translate product goals into concrete model behavior requirements
Knowledge Sharing: Help engineers across teams build intuition for prompt design, context engineering, and evaluation best practices
Staying Current: Track the latest alignment, evaluation, and prompting techniques from industry and academia, and bring the best ideas back to the team
Requirements:
Experience designing evaluations, benchmarks, or metrics for AI systems
Strong written and verbal communication skills, particularly in explaining complex concepts to diverse stakeholders
Ability to manage multiple concurrent projects in a fast-moving environment
Strong experience with Perplexity or other frontier AI models in production settings
Demonstrated experience with Python — you'll prototype, debug, automate, and build systems at scale
3+ years of experience working with LLMs in a product or research setting
Nice to have:
Experience with A/B testing or experimentation frameworks
Track record of improving AI system performance through systematic evaluation and iteration