Research Engineer - Evaluations

United States; United Kingdom, Palo Alto · Job Posted January 13, 2026

Job Description

Luma is pushing the boundaries of generative AI, building tools that redefine how visual content is created. We're seeking a Research Engineer to design and scale the infrastructure that powers our model evaluation efforts. This role is about building the pipelines, metrics, and automated systems that close the loop between model output, evaluation, and improvement. You'll work across research, engineering, and product teams to ensure our models are measured rigorously, consistently, and in ways that directly inform development.

Job Responsibility

  • Design and implement scalable pipelines for automated evaluation of generative models, with a focus on visual and multimodal outputs (image, video, text, audio)
  • Develop novel metrics and evaluation models that capture qualities like fidelity, coherence, temporal consistency, and alignment with human intent
  • Integrate evaluation signals into training loops (including reinforcement learning and reward modeling) to continuously improve model performance
  • Build infrastructure for large-scale regression testing, benchmarking, and monitoring of multimodal generative models
  • Collaborate with researchers running human studies to translate human evaluation frameworks into automated or semi-automated systems
  • Partner with model researchers to identify failure cases and build targeted evaluation harnesses
  • Maintain dashboards, reporting tools, and alerting systems to surface evaluation results to stakeholders
  • Stay current with emerging evaluation techniques in generative AI, multimodal LLMs, and perceptual quality assessment

Requirements

  • Master's or PhD in Computer Science, Machine Learning, or a related technical field (or equivalent industry experience)
  • 3+ years of experience building ML evaluation systems, model pipelines, or large-scale infrastructure
  • Hands-on experience working with visual data (images and/or video), including evaluation, modeling, or data preparation
  • Proficiency in Python and ML frameworks (PyTorch, JAX, or TensorFlow)
  • Familiarity with human-in-the-loop evaluation workflows and how to scale them with automation
  • Strong background in machine learning, with experience in generative models (diffusion, LLMs, multimodal architectures)
  • Strong software engineering skills (CI/CD, testing, data pipelines, distributed systems)

Nice to have

  • Experience with reinforcement learning or reward modeling
  • Prior work on perceptual metrics, multimodal evaluation benchmarks, or retrieval-based evaluation
  • Background in large-scale model training or evaluation infrastructure
  • Experience designing metrics for perceptual quality
  • Familiarity with creative media workflows (film, VFX, animation, digital art)
  • Contributions to open-source evaluation libraries or benchmarks

Similar Jobs for Research Engineer - Evaluations

8 matching positions

Research Engineer, Evaluations (Tech Leadership) - Meta Superintelligence Labs

Meta is seeking Research Engineers to join the Evaluations team within Meta Supe...
Location: United States, Menlo Park
Salary: 219000.00 - 301000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 5+ years of industrial experience in machine learning engineering, machine learning research, or a related technical role
  • Proficiency in Python and experience with ML frameworks such as PyTorch
  • Experience identifying, designing and completing medium to large technical features independently, without guidance
  • Demonstrated software engineering practices including version control, testing, and code review practices
  • Ability to work independently and adapt to rapidly changing priorities
Job Responsibility
  • Curate and integrate publicly available and internal benchmarks to direct the capabilities of frontier model development
  • Develop and implement evaluation environments, including environments for novel model capabilities and modalities
  • Collaborate with external data vendors to source and prepare high-quality evaluation datasets
  • Execute on the technical vision of research scientists designing new benchmarks and evaluations
  • Build robust, reusable evaluation pipelines that scale across multiple model lines and product areas
  • Contribute to evaluation tooling that measures the quality and reliability of evaluation suites
  • Mentor and support other engineers on the team by providing technical guidance and feedback, and helping raise the quality and velocity of evaluation development
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime

Research Engineer, Evaluations - Meta Superintelligence Labs

Meta is seeking Research Engineers to join the Evaluations team within Meta Supe...
Location: United States, Menlo Park
Salary: 257000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 4+ years of experience in machine learning engineering, machine learning research, or a related technical role
  • Proficiency in Python and experience with ML frameworks such as PyTorch
  • Experience identifying, designing and completing medium to large technical features independently, without guidance
  • Demonstrated experience in software engineering practices including version control, testing, and code review practices
  • Ability to work independently and adapt to rapidly changing priorities
Job Responsibility
  • Curate and integrate publicly available and internal benchmarks to direct the capabilities of frontier model development
  • Develop and implement evaluation environments, including environments for novel model capabilities and modalities
  • Collaborate with external data vendors to source and prepare high-quality evaluation datasets
  • Execute on the technical vision of research scientists designing new benchmarks and evaluations
  • Build robust, reusable evaluation pipelines that scale across multiple model lines and product areas
  • Contribute to evaluation tooling that measures the quality and reliability of evaluation suites
What we offer
  • bonus
  • equity
  • benefits

Research Engineer, Evaluations - Meta Superintelligence Labs

Meta is seeking Research Engineers to join the Evaluations team within Meta Supe...
Location: United States, Menlo Park
Salary: 219000.00 - 301000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 5+ years of experience in machine learning engineering, machine learning research, or a related technical role
  • Proficiency in Python and experience with ML frameworks such as PyTorch
  • Experience identifying, designing and completing medium to large technical features independently, without guidance
  • Software engineering practices including version control, testing, and code review practices
  • Demonstrated experience of working independently and adapting to rapidly changing priorities
Job Responsibility
  • Curate and integrate publicly available and internal benchmarks to direct the capabilities of frontier model development
  • Develop and implement evaluation environments, including environments for novel model capabilities and modalities
  • Collaborate with external data vendors to source and prepare high-quality evaluation datasets
  • Execute on the technical vision of research scientists designing new benchmarks and evaluations
  • Build robust, reusable evaluation pipelines that scale across multiple model lines and product areas
  • Contribute to evaluation tooling that measures the quality and reliability of evaluation suites
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime

AI Research Engineer, Enterprise Evaluations

Scale AI is seeking a technically rigorous and driven AI Research Engineer to jo...
Location: United States, San Francisco; New York
Salary: 179400.00 - 224250.00 USD / Year
Company: Scale
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Electrical Engineering, a related field, or equivalent practical experience
  • 2+ years of experience in Machine Learning or Applied Research, focused on applied ML systems or evaluation infrastructure
  • Hands-on experience with Large Language Models (LLMs) and Generative AI in professional or research environments
  • Strong understanding of frontier model evaluation methodologies and the current research landscape
  • Proficiency in Python and major ML frameworks (e.g., PyTorch, TensorFlow)
  • Solid engineering and statistical analysis foundation, with experience developing data-driven methods for assessing model quality
Job Responsibility
  • Partner with Scale’s Operations team and enterprise customers to translate ambiguity into structured evaluation data, guiding the creation and maintenance of gold-standard human-rated datasets and expert rubrics that anchor AI evaluation systems
  • Analyze feedback and collected data to identify patterns, refine evaluation frameworks, and establish iterative improvement loops that enhance the quality and relevance of human-curated assessments
  • Design, research, and develop LLM-as-a-Judge autorater frameworks and AI-assisted evaluation systems. This includes creating models that critique, grade, and explain agent outputs (e.g., RLAIF, model-judging-model setups), along with scalable evaluation pipelines and diagnostic tools
  • Pursue research initiatives that explore new methodologies for automatically analyzing, evaluating, and improving the behavior of enterprise agents, pushing the boundaries of how AI systems are assessed and optimized in real-world contexts
What we offer
  • Comprehensive health, dental, and vision coverage
  • Retirement benefits
  • Learning and development stipend
  • Generous PTO
  • Commuter stipend
  • Equity grant
  • Fulltime

Research Engineer / Research Scientist - Foundations Retrieval Lead

The Foundations Research team works on high-risk, high-reward ideas that could s...
Location: United States, San Francisco
Salary: 445000.00 - 555000.00 USD / Year
Company: OpenAI
Expiration Date: Until further notice
Requirements
  • Proven experience leading high-performance teams of researchers or engineers in ML infrastructure or foundational research
  • Deep technical expertise in representation learning, embedding models, or vector retrieval systems
  • Familiarity with transformer-based LLMs and how embedding spaces can interact with language model objectives
  • Research experience in areas such as contrastive learning, supervised or unsupervised embedding learning, or metric learning
  • A track record of building or scaling large machine learning systems, particularly embedding pipelines in production or research contexts
  • A first-principles mindset for challenging assumptions about how retrieval and memory should work for large models
Job Responsibility
  • Lead research into embedding models and retrieval systems optimized for grounding, relevance, and adaptive reasoning
  • Manage a team of researchers and engineers building end-to-end infrastructure for training, evaluating, and integrating embeddings into frontier models
  • Drive innovation in dense, sparse, and hybrid representation techniques, metric learning, and learning-to-retrieve systems
  • Collaborate closely with Pretraining, Inference, and other Research teams to integrate retrieval throughout the model lifecycle
  • Contribute to OpenAI’s long-term vision of AI systems with memory and knowledge access capabilities rooted in learned representations
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Fulltime

Research Engineer / Research Scientist, Post-Training

The Post-Training team is responsible for training and improving pre-trained mod...
Location: United States, San Francisco
Salary: 295000.00 - 555000.00 USD / Year
Company: OpenAI
Expiration Date: Until further notice
Requirements
  • Deep understanding of machine learning and machine learning applications
  • Working knowledge of relevant models, and building evaluations for model capability improvement
  • Comfortable diving into a large ML codebase to debug
  • Thrive in a dynamic and technically complex environment
  • Strong ML engineering skills and research experience, especially with novel and highly capable models
  • Passionate about product-driven research
Job Responsibility
  • Own and pursue a research agenda to improve model capability and performance
  • Collaborate closely with the other research and product teams, allowing customers to optimize their own models
  • Build robust evaluations for tracking modeling improvements
  • Design, implement, test, and debug code across our research stack
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Fulltime

Research Engineer - Applied Research

At Luma, the Applied Research team brings our most advanced generative models to...
Location: United States; United Kingdom, Palo Alto; London
Salary: Not provided
Company: Luma AI
Expiration Date: Until further notice
Requirements
  • Strong engineering skills in Python and deep learning frameworks (preferably PyTorch); comfortable moving between research prototypes and production systems
  • Hands-on experience with modern visual generative models (diffusion, transformers, or related architectures)
  • Demonstrated ability to tune, refine, and deploy models in real products using human feedback and creative evaluation
  • Curiosity and passion for multimodal AI - understanding how models perceive, generate, and evolve in the wild
Job Responsibility
  • Develop and maintain model variants purpose-built for specific product features and partner applications - adapting architectures, datasets, and fine-tuning strategies
  • Drive continual improvements to Luma's core model-powered experiences, leading iterations that push quality, reliability, and creative depth across versions
  • Collaborate closely with Product, Research, and Design to translate creative intent and user feedback into refined model behavior, intuitive controls, and new capabilities
  • Build internal tools and workflows that accelerate model iteration and evaluation - enabling faster experimentation, deeper insight, and tighter feedback loops
  • Contribute to applied research in safety, authenticity, and control - spanning topics like moderation, watermarking, fairness, and color science
  • Fulltime

Research Scientist / Engineer – Foundation Model: Core Research

This is a rare and foundational opportunity to define the future of multimodal A...
Location: United States, Palo Alto
Salary: 250000.00 - 450000.00 USD / Year
Company: Luma AI
Expiration Date: Until further notice
Requirements
  • A Bachelor's, Master's, or PhD degree in Computer Science, Machine Learning, Physics, or Mathematics is essential
  • A 'first-principles' intuition for scaling
  • Fluent in the language of frontier AI
  • Proven ability to design and rigorously analyze experiments and to articulate complex technical concepts effectively
  • Practical experience with distributed or high-performance computing environments, particularly managing and optimizing training runs on large-scale GPU clusters
Job Responsibility
  • Unified Modeling & Efficiency: Drive the core research that powers all of Luma's products, co-designing multimodal representations, advancing core algorithms for long-context training, and establishing rigorous scaling laws to predict performance across compute budgets
  • Alignment & Evaluation: Close the gap between training loss and user experience. Develop proxy tasks and automated metrics that serve as the compass for research decisions, ensuring our models optimize for what actually matters to users, not just benchmarks
  • Research Infrastructure: Build the engine for high-velocity research. Maintain production-research parity, ensure reproducibility, and design systems for rapid experimentation, so that novel ideas go from hypothesis to validated result as fast as possible
  • Fulltime