You'll own the quality of AI across everything Gamma creates. As our Research Engineer, you'll design evaluation frameworks that measure AI output quality, systematically improve production prompts, and fine-tune models to ensure millions of users get exceptional results every time they generate content with Gamma. This role sits at the intersection of research rigor and product impact. You'll diagnose failure patterns in AI-generated presentations, docs, and websites, then craft targeted improvements through iterative experimentation. You'll build the tools and workflows that enable rapid testing, validate changes against quality benchmarks, and ensure our AI gets smarter with every iteration. If you're obsessed with output quality and love the challenge of making AI systems work beautifully at scale, this is your role.
Job Responsibilities:
Design and maintain evaluation frameworks that measure AI output quality across all Gamma experiences, developing metrics and benchmarks to assess model performance
Systematically improve production prompts through iterative experimentation—diagnosing failure patterns, crafting targeted improvements, and validating against quality benchmarks
Conduct rigorous experiments to understand model behavior, analyze results, and derive insights that inform prompt and model improvements
Build tools and workflows to support rapid experimentation and quality analysis, enabling faster iteration on AI improvements
Fine-tune models on targeted datasets to improve baseline performance, preventing issues like poor layout choices or low-quality outlines
Partner with product and engineering teams to ensure AI quality improvements ship quickly and work reliably at scale
Requirements:
1-2+ years working with AI systems, with demonstrated experience shipping production-grade AI products
Deep hands-on experience with prompt engineering, LLM experimentation, and systematic evaluation of AI outputs
Strong experimental mindset with ability to design tests, analyze model performance, and iterate toward quality improvements
Experience with post-training techniques for LLMs including reinforcement learning and supervised fine-tuning
Research-oriented approach to problem-solving with comfort working in ambiguity and exploring novel solutions to AI quality challenges
Exceptional attention to detail and an obsession with quality: you care deeply about output quality across all dimensions, including less visible aspects
Nice to have:
Bachelor's degree in Computer Science, Machine Learning, or related field, or equivalent hands-on experience with AI research and experimentation