Qualitative Evaluation Engineer

Luma AI

Location:
United States, Palo Alto

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Luma is pushing the boundaries of generative AI, building tools that redefine how visual content is created. We’re seeking a candidate to help shape and scale the way we understand, measure, and improve model performance. In this role, you’ll partner with researchers, engineers, and technical artists to evaluate our models against real-world creative use cases, design frameworks that capture qualitative nuance, and identify actionable insights that guide development. This is not a checkbox-metrics role; it’s about building evaluation systems that match the complexity of human perception, creativity, and intention.

Job Responsibility:

  • Evaluate generative model performance across diverse tasks, prompts, and modalities
  • Identify key failure modes, regression patterns, and edge cases that impact product quality
  • Develop and maintain qualitative evaluation frameworks that are scalable and reusable
  • Collaborate closely with technical artists and engineers to align evaluations with model capabilities and target use cases
  • Translate high-level product goals into concrete evaluative criteria
  • Lead qualitative studies, side-by-side comparisons, and human-in-the-loop evaluation efforts
  • Provide detailed feedback that informs model fine-tuning, dataset curation, and product UX
  • Stay informed about emerging evaluation standards in generative AI and creative tools

Requirements:

  • Master’s degree or higher in Cognitive Science, Human-Computer Interaction (HCI), Design Research, Psychology, Media Studies, or a related field
  • 5+ years of experience in product evaluation, UX research, model testing, or similar roles that involve structured qualitative assessment
  • Deep familiarity with creative workflows and real-world use cases for generative models (e.g., animation, filmmaking, digital art, VFX)
  • Strong systems thinking and the ability to define abstract qualities (like believability, identity retention, or scene coherence) in clear evaluative terms
  • Experience working cross-functionally with engineers, researchers, and creatives
  • Excellent written communication skills and the ability to synthesize nuanced judgments into clear, actionable insights

Nice to have:

  • Background in motion, visual effects, or storytelling pipelines
  • Experience evaluating AI-generated media (video, images, 3D)
  • Prior work on building internal tools for qualitative data collection or scoring
  • Familiarity with prompt engineering and reference-based input methods

Additional Information:

Job Posted:
January 13, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Qualitative Evaluation Engineer

Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI is building the fastest, most capable open-source-aligned LLMs and i...
Location:
United States, San Francisco
Salary:
220,000 - 270,000 USD / year
Together AI
Expiration Date
Until further notice
Requirements:
  • Strong engineering skills with Python, evaluation tooling, and distributed workflows
  • Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming
  • Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns
  • Experience designing experiments, building datasets, and interpreting noisy behavioral signals
  • Understanding of function calling and structured output formats
  • Familiarity with GPU or distributed compute environments
  • Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines
  • Experience with multi-turn or multi-step reasoning tasks
  • Familiarity with inference systems, distributed infrastructure, or post-training workflows
  • Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures
Job Responsibility:
  • Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors
  • Develop specialized evaluation suites for:
      ◦ Function calling: argument correctness, schema adherence, tool selection, multi-function planning, and error recovery
      ◦ Agentic workflows: task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences
      ◦ Tool-augmented interactions: search, retrieval, code execution, and API-driven actions
  • Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing
  • Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains
  • Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements
  • Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases
  • Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers
What we offer:
  • competitive compensation
  • startup equity
  • health insurance
  • other benefits
  • Full-time

Staff Product Manager

We’re hiring a Staff Product Manager, AI Platform & Experiences to help shape ho...
Location:
Canada, Toronto
Salary:
140,000 - 170,000 CAD / year
Fullscript
Expiration Date
Until further notice
Requirements:
  • 8+ years of product management experience, with a track record of leading complex, cross-functional initiatives
  • Strong systems thinking and comfort working at the intersection of product, engineering, and data
  • Experience collaborating deeply with engineering teams on technically complex products, including contributing to discussions around architecture, tradeoffs, and sequencing of technical work
  • Ability to understand and reason about how modern AI-powered systems are built and operated, even if you’re not hands-on with implementation
  • Familiarity with concepts such as APIs, distributed systems, experimentation, observability, and system reliability, and how they impact product quality and user experience
  • Experience using AI both as a product capability and as a tool to improve discovery, delivery, and decision-making across the product development lifecycle
  • Demonstrated curiosity and informed perspectives on emerging AI patterns (e.g., copilots, agents, automation) and how they may shape future product opportunities
  • A track record of elevating teams by sharing practical insights, patterns, or approaches others can learn from
Job Responsibility:
  • Set direction and strategy for practitioner-facing AI experiences
  • Define the vision, strategy, and roadmap
  • Bring focus and cohesion to AI initiatives across the platform
  • Translate ambiguous problem spaces into clear product opportunities
  • Help establish guardrails for how AI is used at Fullscript
  • Make informed product decisions that account for system-level constraints
  • Help align short-term product delivery with longer-term technical investments
  • Deliver meaningful AI-powered experiences
  • Lead the evolution of assistive and embedded AI experiences
  • Ensure AI features are grounded in accurate platform context
What we offer:
  • Flexible time off
  • Competitive compensation + equity
  • Retirement support – RRSP match
  • Comprehensive health coverage – Premium, flexible benefits including paramedical services and HSA options
  • Product-driven learning culture – Dedicated learning budget
  • Remote-first with flexibility
  • Fullscript product access – Discounts on practitioner-grade wellness products
  • Full-time

Senior / Lead AI Engineer

Omio is building the future of travel. We’re moving from manual, rule-based syst...
Location:
Singapore, Singapore
Salary:
Not provided
FoodLabs & Atlantic Labs
Expiration Date
Until further notice
Requirements:
  • 10+ years of experience in software engineering, developing complex models and algorithms
  • Proven track record in designing, implementing, and deploying production-grade AI solutions at scale
  • Strong communication and presentation skills, with the ability to influence and collaborate effectively with non-technical stakeholders
  • Self-motivated and capable of working independently, driving initiatives with minimal supervision
  • Prior experience deploying scalable AI and LLM-based solutions for real-time, high-performance systems is highly desirable
  • Experience with diverse model evaluation techniques (quantitative and qualitative) and an iterative approach to improving AI system performance and user outcomes
  • Expertise in building AI applications using large language models such as OpenAI’s GPT models, Claude, Gemini, and LLaMA
  • Experience with LLM orchestration and serving frameworks such as LangChain, LangGraph, vLLM, and LMDeploy
  • Strong programming skills in Java, Python, and SQL
  • Familiarity with data preprocessing, feature engineering, model evaluation, MLOps, and LLMOps best practices
Job Responsibility:
  • Develop AI solutions leveraging LLMs to improve productivity and deliver strong business impact
  • Lead the end-to-end development lifecycle from ideation to deployment of AI-powered solutions across various domains
  • Build scalable AI systems that support Omio’s global expansion goals
  • Act as an evangelist for AI adoption by demonstrating clear value to stakeholders
  • Collaborate with Business, Product, and Engineering teams to integrate AI into workflows and drive adoption
  • Present models, results, and systems to both technical and non-technical audiences, including C-level stakeholders
  • Full-time

Senior Lead AI Engineer

Omio is building the future of travel. We’re moving from manual, rule-based syst...
Location:
Singapore, Singapore
Salary:
Not provided
FoodLabs & Atlantic Labs
Expiration Date
Until further notice
Requirements:
  • 10+ years of experience in software engineering, developing complex models and algorithms
  • Proven track record in designing, implementing, and deploying production-grade AI solutions at scale
  • Strong communication and presentation skills, with the ability to influence and collaborate effectively with non-technical stakeholders
  • Self-motivated and capable of working independently, driving initiatives with minimal supervision
  • Prior experience deploying scalable AI and LLM-based solutions for real-time, high-performance systems is highly desirable
  • Experience with diverse model evaluation techniques (quantitative and qualitative) and an iterative approach to improving AI system performance and user outcomes
  • Expertise in building AI applications using large language models such as OpenAI’s GPT models, Claude, Gemini, and LLaMA
  • Experience with LLM orchestration and serving frameworks such as LangChain, LangGraph, vLLM, and LMDeploy
  • Strong programming skills in Java, Python, and SQL
  • Familiarity with data preprocessing, feature engineering, model evaluation, MLOps, and LLMOps best practices
Job Responsibility:
  • Develop AI solutions leveraging LLMs to improve productivity and deliver strong business impact
  • Lead the end-to-end development lifecycle from ideation to deployment of AI-powered solutions across various domains
  • Build scalable AI systems that support Omio’s global expansion goals
  • Act as an evangelist for AI adoption by demonstrating clear value to stakeholders
  • Collaborate with Business, Product, and Engineering teams to integrate AI into workflows and drive adoption
  • Present models, results, and systems to both technical and non-technical audiences, including C-level stakeholders
  • Full-time

ML Research Engineer

We are looking for experienced Research Engineers to develop and optimise AI mod...
Location:
United Kingdom, London
Salary:
Not provided
Recraft
Expiration Date
Until further notice
Requirements:
  • Proven experience in training large-scale generative models (e.g., image generation, audio synthesis, or large language models)
  • Expertise in evaluating generative models, with a deep understanding of qualitative and quantitative metrics
  • Strong proficiency in PyTorch and familiarity with state-of-the-art neural network architectures
  • Hands-on experience building end-to-end machine learning pipelines, including data collection and pre-processing, model training and evaluation, and inference serving
  • Strong software engineering skills, including the ability to design, implement, and maintain high-quality code
Job Responsibility:
  • Develop and train large-scale generative models, pushing the boundaries of AI capabilities
  • Design and run experiments to explore and refine model architectures, analyze results, and implement data-driven improvements
  • Continuously innovate by developing and deploying new features that enhance user experiences
What we offer:
  • Competitive salary and equity
  • Opportunities for professional growth and leadership
  • A collaborative, research-driven environment
  • The chance to work on cutting-edge AI projects that redefine creative workflows
  • The opportunity to contribute to open-source AI research and publications
  • Full-time

Sustainable transport economist

We aim to recruit a junior/intermediate consultant with expertise in transport e...
Location:
Italy, Milan
Salary:
Not provided
TRT TRASPORTI E TERRITORIO SRL
Expiration Date
Until further notice
Requirements:
  • Master’s degree or PhD in Economics, Engineering, Statistics, or equivalent
  • Solid theoretical knowledge and/or previous work experience in transport policy, energy policy and economics studies
  • Familiarity with quantitative methods
  • Interest in studying and exploring innovative topics
  • Problem-solving attitude
  • Excellent verbal and written communication skills (minute taking, drafting reports and delivering presentations)
  • A very good command of English (reading, writing and speaking)
  • Readiness to travel within and outside the EU
Job Responsibility:
  • Develop qualitative and quantitative analysis
  • Support the drafting and publication of reports
  • Source and write scientific articles
  • Contribute to communication, information and dissemination tasks within the projects
What we offer:
  • Possibility to have flexible working arrangements
  • Full-time

AI Engineer

You'll own the core models and prompts that power Gamma. We weave together text,...
Location:
United States, San Francisco
Salary:
180,000 - 300,000 USD / year
Gamma
Expiration Date
Until further notice
Requirements:
  • Prompt hacker: a tinkerer who loves seeing how far you can push the limits of a foundation model, with experience building and evaluating prompts at scale
  • Software engineer: an experienced developer comfortable in TypeScript and Python, excited about mixing prompt engineering with traditional software engineering
  • Data-driven: You embrace using data to raise the bar of AI quality, with skills in writing evals, designing metrics, and turning qualitative feedback into quantitative measures
  • Self-sufficient in gathering and cleaning data to inform prompt improvements and model evaluations
Job Responsibility:
  • Own our existing LLM and image prompts, measuring and continuously improving quality at scale
  • Develop complex prompts for new features using AI JSX, balancing creativity with reliability
  • Build evaluation frameworks for our prompts and models, monitoring metrics and qualitative feedback to create better test sets
  • Drive the roadmap based on quality gaps, constantly evaluating new frontier models and methods
  • Curate datasets for fine-tuning open source models and launch new modalities like voice and video
  • Build analytics and tracking systems while owning uptime, latency, and costs across our AI infrastructure
What we offer:
  • competitive equity
  • Full-time

UX Researcher, Qualitative

Our UX Research team is designing for the broad spectrum of human needs, which r...
Location:
United States, Bellevue
Salary:
164,000 - 227,000 USD / year
Meta
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree and 10+ years of relevant experience in user experience, applied research, and/or product research and development; or a Master’s degree and 8+ years of relevant experience; or a PhD and 5+ years of relevant experience
  • Experience conducting in-depth interviews or focus groups, and concept testing or usability testing
  • Interest in and experience executing hands-on, primary research
  • Experience translating research findings into strategic narratives
Job Responsibility:
  • Work closely with product and business teams to identify research topics
  • Act as a thought leader in the domain of research, while advocating for the people who could use our products
  • Design and execute end-to-end custom primary research using a wide variety of methods
  • Design studies that address both user behavior and attitudes
  • Work independently and autonomously
  • Effectively manage and prioritize research plans through ambiguous and fast-changing environments, align and efficiently execute critical insights and work with a large group of stakeholders
  • Communicate results and illustrate suggestions in compelling and unique ways
  • Work cross-functionally with design, product management, content strategy, engineering and marketing
  • Generate insights that both fuel ideation and evaluate designs
What we offer:
  • bonus
  • equity
  • benefits
  • Full-time