Qualitative Evaluation Engineer

Luma AI

Location:
United States, Palo Alto

Contract Type:
Not provided

Salary:
Not provided

Job Description:

Luma is pushing the boundaries of generative AI, building tools that redefine how visual content is created. We’re seeking a candidate to help shape and scale the way we understand, measure, and improve model performance. In this role, you’ll partner with researchers, engineers, and technical artists to evaluate our models against real-world creative use cases, design frameworks that capture qualitative nuance, and identify actionable insights that guide development. This is not a checkbox metrics role - it's about building evaluative systems that match the complexity of human perception, creativity, and intention.

Job Responsibility:

  • Evaluate generative model performance across diverse tasks, prompts, and modalities
  • Identify key failure modes, regression patterns, and edge cases that impact product quality
  • Develop and maintain qualitative evaluation frameworks that are scalable and reusable
  • Collaborate closely with technical artists and engineers to align evaluations with model capabilities and target use cases
  • Translate high-level product goals into concrete evaluative criteria
  • Lead qualitative studies, side-by-side comparisons, and human-in-the-loop evaluation efforts
  • Provide detailed feedback that informs model fine-tuning, dataset curation, and product UX
  • Stay informed about emerging evaluation standards in generative AI and creative tools

Requirements:

  • Master’s degree or higher in Cognitive Science, Human-Computer Interaction (HCI), Design Research, Psychology, Media Studies, or a related field
  • 5+ years of experience in product evaluation, UX research, model testing, or similar roles that involve structured qualitative assessment
  • Deep familiarity with creative workflows and real-world use cases for generative models (e.g., animation, filmmaking, digital art, VFX)
  • Strong systems thinking and the ability to define abstract qualities (like believability, identity retention, or scene coherence) in clear evaluative terms
  • Experience working cross-functionally with engineers, researchers, and creatives
  • Excellent written communication skills and the ability to synthesize nuanced judgments into clear, actionable insights

Nice to have:

  • Background in motion, visual effects, or storytelling pipelines
  • Experience evaluating AI-generated media (video, images, 3D)
  • Prior work on building internal tools for qualitative data collection or scoring
  • Familiarity with prompt engineering and reference-based input methods

Additional Information:

Job Posted:
January 13, 2026

Employment Type:
Full-time
Work Type:
Hybrid work
Similar Jobs for Qualitative Evaluation Engineer

Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI is building the fastest, most capable open-source-aligned LLMs and i...
Location:
United States, San Francisco
Salary:
220,000 - 270,000 USD / Year
Together AI
Expiration Date:
Until further notice
Requirements:
  • Strong engineering skills with Python, evaluation tooling, and distributed workflows
  • Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming
  • Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns
  • Experience designing experiments, building datasets, and interpreting noisy behavioral signals
  • Understanding of function calling and structured output formats
  • Familiarity with GPU or distributed compute environments
  • Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines
  • Experience with multi-turn or multi-step reasoning tasks
  • Familiarity with inference systems, distributed infrastructure, or post-training workflows
  • Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures
Job Responsibility:
  • Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors
  • Develop specialized evaluation suites for:
    • Function calling: argument correctness, schema adherence, tool selection, multi-function planning, and error recovery
    • Agentic workflows: task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences
    • Tool-augmented interactions: search, retrieval, code execution, and API-driven actions
  • Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing
  • Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains
  • Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements
  • Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases
  • Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers
What we offer:
  • competitive compensation
  • startup equity
  • health insurance
  • other benefits
Employment Type:
Full-time

Product Content Engineer

Product Content Engineering is a horizontal function supporting initiatives acro...
Location:
United States, Menlo Park
Salary:
162,000 - 227,000 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience working collaboratively with product, engineering, design, and user research teams
  • 1+ years working with generative AI products, AI evaluation, prompt engineering, annotation, and/or content labeling and analysis
  • Experience designing and implementing evaluation frameworks, annotation guidelines, or quality rubrics for AI/ML systems
  • Demonstrated data analysis skills, with experience exploring data, identifying patterns, and producing actionable insights
  • Experience building new products or platform/ecosystem products
  • Critical thinking, experience leading data-driven analyses to inform product or content decisions, and experience communicating to executive leadership
  • Proven track record of cross-functional collaboration and delivering results in environments with evolving requirements and competing priorities
Job Responsibility:
  • Define content quality standards and use them to systematically evaluate how AI models are performing across our products and content experiences
  • Design golden sets, taxonomies, and guidelines that enable consistent, repeatable content quality assessments
  • Build repeatable workflows for collecting, annotating, and analyzing AI outputs so evaluations can run efficiently as models evolve
  • Evaluate successive model releases through structured comparison, documenting what improved, what regressed, and what to prioritize next
  • Design evaluation frameworks that integrate qualitative and quantitative signals to measure dimensions like user trust, content depth, and topical relevance
  • Develop processes to track content quality and model performance over time and flag regressions
  • Synthesize evaluation results into structured error patterns and concrete recommendations that engineering and product teams can act on
  • Work cross-functionally with engineers, data scientists, product managers, and content strategists to align AI behaviors with real-world user expectations
What we offer:
  • bonus
  • equity
  • benefits
Employment Type:
Full-time

Senior / Lead AI Engineer

Omio is building the future of travel. We’re moving from manual, rule-based syst...
Location:
Singapore, Singapore
Salary:
Not provided
FoodLabs & Atlantic Labs
Expiration Date:
Until further notice
Requirements:
  • 10+ years of experience in software engineering, developing complex models and algorithms
  • Proven track record in designing, implementing, and deploying production-grade AI solutions at scale
  • Strong communication and presentation skills, with the ability to influence and collaborate effectively with non-technical stakeholders
  • Self-motivated and capable of working independently, driving initiatives with minimal supervision
  • Prior experience deploying scalable AI and LLM-based solutions for real-time, high-performance systems is highly desirable
  • Experience with diverse model evaluation techniques (quantitative and qualitative) and an iterative approach to improving AI system performance and user outcomes
  • Expertise in building AI applications using large language models such as OpenAI, Claude, Gemini, LLaMA
  • Experience with LLM orchestration frameworks like LangChain, LangGraph, vLLM, LMDeploy
  • Strong programming skills in Java, Python, and SQL
  • Familiarity with data preprocessing, feature engineering, model evaluation, MLOps, and LLMOps best practices
Job Responsibility:
  • Develop AI solutions leveraging LLMs to improve productivity and deliver strong business impact
  • Lead the end-to-end development lifecycle from ideation to deployment of AI-powered solutions across various domains
  • Build scalable AI systems that support Omio’s global expansion goals
  • Act as an evangelist for AI adoption by demonstrating clear value to stakeholders
  • Collaborate with Business, Product, and Engineering teams to integrate AI into workflows and drive adoption
  • Present models, results, and systems to both technical and non-technical audiences, including C-level stakeholders
Employment Type:
Full-time

Senior Lead AI Engineer

Omio is building the future of travel. We’re moving from manual, rule-based syst...
Location:
Singapore, Singapore
Salary:
Not provided
FoodLabs & Atlantic Labs
Expiration Date:
Until further notice
Requirements:
  • 10+ years of experience in software engineering, developing complex models and algorithms
  • Proven track record in designing, implementing, and deploying production-grade AI solutions at scale
  • Strong communication and presentation skills, with the ability to influence and collaborate effectively with non-technical stakeholders
  • Self-motivated and capable of working independently, driving initiatives with minimal supervision
  • Prior experience deploying scalable AI and LLM-based solutions for real-time, high-performance systems is highly desirable
  • Experience with diverse model evaluation techniques (quantitative and qualitative) and an iterative approach to improving AI system performance and user outcomes
  • Expertise in building AI applications using large language models such as OpenAI, Claude, Gemini, LLaMA
  • Experience with LLM orchestration frameworks like LangChain, LangGraph, vLLM, LMDeploy
  • Strong programming skills in Java, Python, and SQL
  • Familiarity with data preprocessing, feature engineering, model evaluation, MLOps, and LLMOps best practices
Job Responsibility:
  • Develop AI solutions leveraging LLMs to improve productivity and deliver strong business impact
  • Lead the end-to-end development lifecycle from ideation to deployment of AI-powered solutions across various domains
  • Build scalable AI systems that support Omio’s global expansion goals
  • Act as an evangelist for AI adoption by demonstrating clear value to stakeholders
  • Collaborate with Business, Product, and Engineering teams to integrate AI into workflows and drive adoption
  • Present models, results, and systems to both technical and non-technical audiences, including C-level stakeholders
Employment Type:
Full-time

Clinical Innovation Specialist

We are on a mission to ensure everyone has access to medical expertise, no matte...
Location:
Denmark, København
Salary:
Not provided
Life Science Talent
Expiration Date:
Until further notice
Requirements:
  • MD with some postgraduate clinical experience
  • A PhD demonstrating high quality research output involving significant data manipulation and analysis
  • The ability to move seamlessly between clinical reasoning and quantitative thinking
  • A genuine love for data, statistics and scientific inquiry, not just as tools but as a way of thinking
  • Curiosity about how things work at a technical level, and humility to learn from engineers and colleagues with very different backgrounds
  • Comfort with ambiguity: you can define problems, propose methods, and move forward without predefined answers
  • Openness to working with agentic coding as a core workflow: you don’t need to be an expert on day one, but you do need curiosity and a genuine desire to learn
Job Responsibility:
  • Collaborate with our engineers to develop and refine evaluation approaches for clinical AI, both in the backend and in pragmatic evaluations of the final product
  • Run experiments and build prototypes: you will test ideas hands-on and tinker with their settings and architecture
  • Bring statistical and methodological rigour to product decisions, partner discussions, and regulatory inputs
  • Critically assess relevant clinical literature and translate insights into product improvements
  • Represent Corti's scientific approach in selected partner and customer engagements
  • Translate qualitative clinical observations and feedback into measurable, testable evaluation criteria
  • Close the loop between clinician experience and system performance
  • Contribute to prompt and workflow improvements tailored to customer and partner needs
  • Produce high-quality scientific output (case studies, white papers, regulatory contributions, conference presence) that builds Corti's credibility with clinical decision-makers, procurement, and regulators
  • Contribute to clear, compelling artifacts that communicate our methodology and evidence base
Employment Type:
Full-time

ML Research Engineer

We are looking for experienced Research Engineers to develop and optimise AI mod...
Location:
United Kingdom, London
Salary:
Not provided
Recraft
Expiration Date:
Until further notice
Requirements:
  • Proven experience in training large-scale generative models (e.g., image generation, audio synthesis, or large language models)
  • Expertise in evaluating generative models, with a deep understanding of qualitative and quantitative metrics
  • Strong proficiency in PyTorch and familiarity with state-of-the-art neural network architectures
  • Hands-on experience building end-to-end machine learning pipelines, including data collection and pre-processing, model training and evaluation, and inference serving
  • Strong software engineering skills and the ability to design, implement, and maintain high-quality code
Job Responsibility:
  • Develop and train large-scale generative models, pushing the boundaries of AI capabilities
  • Design and run experiments to explore and refine model architectures, analyze results, and implement data-driven improvements
  • Continuously innovate by developing and deploying new features that enhance user experiences
What we offer:
  • Competitive salary and equity
  • Opportunities for professional growth and leadership
  • A collaborative, research-driven environment
  • The chance to work on cutting-edge AI projects that redefine creative workflows
  • The opportunity to contribute to open-source AI research and publications
Employment Type:
Full-time

Sustainable transport economist

We aim to recruit a junior/intermediate consultant with expertise in transport e...
Location:
Italy, Milan
Salary:
Not provided
TRT TRASPORTI E TERRITORIO SRL
Expiration Date:
Until further notice
Requirements:
  • Master's degree or PhD in Economics, Engineering, Statistics, or equivalent
  • Solid theoretical knowledge and/or previous work experience in transport policy, energy policy and economics studies
  • Familiarity with quantitative methods
  • Interest in studying and exploring innovative topics
  • Problem-solving attitude
  • Excellent verbal and written communication skills (minute taking, drafting reports and delivering presentations)
  • A very good command of English (reading, writing and speaking)
  • Readiness for travel in EU and outside EU
Job Responsibility:
  • Develop qualitative and quantitative analysis
  • Support the drafting and publication of reports
  • Source and write scientific articles
  • Contribute to communication, information and dissemination tasks within the projects
What we offer:
  • Possibility to have flexible working arrangements
Employment Type:
Full-time

AI Engineer

You'll own the core models and prompts that power Gamma. We weave together text,...
Location:
United States, San Francisco
Salary:
180,000 - 300,000 USD / Year
Gamma
Expiration Date:
Until further notice
Requirements:
  • Prompt hacker: tinkerer who loves seeing how far you can push the limits of a foundation model, with experience building and evaluating prompts at scale
  • Software engineer: Experienced developer comfortable in TypeScript and Python, excited about mixing prompt engineering with traditional software engineering
  • Data-driven: You embrace using data to raise the bar of AI quality, with skills in writing evals, designing metrics, and turning qualitative feedback into quantitative measures
  • Self-sufficient in gathering and cleaning data to inform prompt improvements and model evaluations
Job Responsibility:
  • Own our existing LLM and image prompts, measuring and continuously improving quality at scale
  • Develop complex prompts for new features using AI JSX, balancing creativity with reliability
  • Build evaluation frameworks for our prompts and models, monitoring metrics and qualitative feedback to create better test sets
  • Drive the roadmap based on quality gaps, constantly evaluating new frontier models and methods
  • Curate datasets for fine-tuning open source models and launch new modalities like voice and video
  • Build analytics and tracking systems while owning uptime, latency, and costs across our AI infrastructure
What we offer:
  • competitive equity
Employment Type:
Full-time