You'll own the quality bar for Notion AI products. You'll work with product and engineering teams to build systems that define what "good" looks like, measure our progress, and drive the changes needed to deliver reliable, high-quality AI experiences. Your work directly shapes how Notion's AI products behave for millions of users.
This isn't a traditional software engineering role. It's an art & science role. You won't spend your days writing code. Instead, you'll focus on understanding and shaping how our AI products behave through context engineering, designing evaluation systems, and analyzing data.
This role sits within our AI engineering team and works directly with engineering, product, design, and data. It is a unique blend of ops, strategy, and product thinking. Day to day, you'll live in production data, ship prompt fixes, run evals and, in effect, shape our quality strategy. As part of that, you'll shape Notion's model strategy and work directly with frontier AI labs (OpenAI, Anthropic, Google) to evaluate and launch new models.
Job Responsibilities
Context engineering — Design, test, and iterate on system prompts, tool prompts, and context strategies that shape how Notion's AI products behave
Understand & debug — Live in production data: transcripts, logs, user feedback
Evaluate and launch new models with leading research labs
Drive quality priorities — Work embedded with eng and product teams to surface the most important issues
Build tooling & systems — Help manage AI observability and eval platforms
Requirements
Driver mentality — You treat problems as yours. If something's broken, it's your job to fix it, even if you didn't cause it. You have a bias to action.
Curiosity — You're excited about exploring the “jagged frontier” of LLM capabilities and how AI products meet reality.
Analytical instinct — Your first move is to look at data. You can find signal in noise.
Comfortable working with data — You can self-serve insights from large datasets, whether through SQL, coding agents, or other tools.
Clear communication — You can explain complex issues simply.
Experience with LLMs, prompting, or AI products
Nice to have
Background in engineering, product, data science, research, or consulting
You've built something on your own to solve a problem — side project, startup, tool, whatever