Qualitative Evaluation Engineer

United States, Palo Alto · Job Posted January 13, 2026

Job Description

Luma is pushing the boundaries of generative AI, building tools that redefine how visual content is created. We’re seeking a candidate to help shape and scale the way we understand, measure, and improve model performance. In this role, you’ll partner with researchers, engineers, and technical artists to evaluate our models against real-world creative use cases, design frameworks that capture qualitative nuance, and identify actionable insights that guide development. This is not a checkbox-metrics role; it is about building evaluative systems that match the complexity of human perception, creativity, and intention.

Job Responsibility

  • Evaluate generative model performance across diverse tasks, prompts, and modalities
  • Identify key failure modes, regression patterns, and edge cases that impact product quality
  • Develop and maintain qualitative evaluation frameworks that are scalable and reusable
  • Collaborate closely with technical artists and engineers to align evaluations with model capabilities and target use cases
  • Translate high-level product goals into concrete evaluative criteria
  • Lead qualitative studies, side-by-side comparisons, and human-in-the-loop evaluation efforts
  • Provide detailed feedback that informs model fine-tuning, dataset curation, and product UX
  • Stay informed about emerging evaluation standards in generative AI and creative tools

Requirements

  • Master’s degree or higher in Cognitive Science, Human-Computer Interaction (HCI), Design Research, Psychology, Media Studies, or a related field
  • 5+ years of experience in product evaluation, UX research, model testing, or similar roles that involve structured qualitative assessment
  • Deep familiarity with creative workflows and real-world use cases for generative models (e.g., animation, filmmaking, digital art, VFX)
  • Strong systems thinking and the ability to define abstract qualities (like believability, identity retention, or scene coherence) in clear evaluative terms
  • Experience working cross-functionally with engineers, researchers, and creatives
  • Excellent written communication skills and the ability to synthesize nuanced judgments into clear, actionable insights

Nice to have

  • Background in motion, visual effects, or storytelling pipelines
  • Experience evaluating AI-generated media (video, images, 3D)
  • Prior work on building internal tools for qualitative data collection or scoring
  • Familiarity with prompt engineering and reference-based input methods


Similar Jobs for Qualitative Evaluation Engineer

8 matching positions

Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI is building the fastest, most capable open-source-aligned LLMs and i...
Location: United States, San Francisco
Salary: 220000.00 - 270000.00 USD / Year
Together AI
Expiration Date: Until further notice
Requirements
  • Strong engineering skills with Python, evaluation tooling, and distributed workflows
  • Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming
  • Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns
  • Experience designing experiments, building datasets, and interpreting noisy behavioral signals
  • Understanding of function calling and structured output formats
  • Familiarity with GPU or distributed compute environments
  • Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines
  • Experience with multi-turn or multi-step reasoning tasks
  • Familiarity with inference systems, distributed infrastructure, or post-training workflows
  • Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures
Job Responsibility
  • Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors
  • Develop specialized evaluation suites for function calling (argument correctness, schema adherence, tool selection, multi-function planning, and error recovery), agentic workflows (task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences), and tool-augmented interactions (search, retrieval, code execution, API-driven actions)
  • Create CI/CD automated pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing
  • Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains
  • Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements
  • Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases
  • Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers
What we offer
  • competitive compensation
  • startup equity
  • health insurance
  • other benefits
  • Full-time

AI Technical Product Manager

At HubX we build mobile apps, used and loved by millions all around the world. W...
Location: Turkey, Istanbul
Salary: Not provided
HubX
Expiration Date: Until further notice
Requirements
  • BS/MS in Software/Computer/Electronics/Industrial Engineering or related disciplines
  • Experience using and applying Large Language Models. You should know how to prompt-engineer them, how to evaluate them, and understand how they are tuned
  • Hands-on experience building or using LLM-powered agentic solutions
  • A healthy blend of general technical knowledge with a customer-focused mindset and empathy
  • Great communication skills and background knowledge for customers, stakeholders, and technical team
  • General knowledge of the ins and outs of development, UI/UX design and flows, agile framework, software development, scrum approach, and IT infrastructure
  • A desire to work in a fast-paced, results-driven environment
Job Responsibility
  • Creating business analysis documents, test cases, mockups, and prototypes to assist with development, including leveraging AI tools (e.g., LLMs) to accelerate documentation, generate scenarios, and validate edge cases
  • Analyzing metrics to understand customer and product performance, incorporating AI-driven insights (e.g., model outputs, user interactions with AI features, prompt success rates) to guide product decisions
  • Staying in sync with team members using agile approaches, while adapting workflows to support rapid experimentation and iteration of AI-powered features
  • Attending grooming sessions with the development team to clarify requirements, including defining AI-specific behaviors (e.g., prompts, expected outputs, fallback scenarios) and ensuring the team understands both product and model-related objectives
  • Partnering closely with technical teams by bringing clear, structured requirements to engineers, and actively contributing to discussions around LLM capabilities, constraints, and implementation trade-offs
  • Running qualitative and quantitative research to define ambiguous problems, including evaluating where AI/LLMs can provide value and validating hypotheses through data and experimentation
  • Designing, testing, and iterating on prompt strategies; understanding prompt engineering techniques and how they impact model performance and user experience
  • Evaluating LLM outputs using structured approaches (e.g., accuracy, consistency, safety, and relevance), and working with teams to improve model performance through fine-tuning or prompt adjustments
  • Collaborating on or contributing to the development of LLM-powered or agentic solutions, understanding how multi-step AI workflows operate and how they integrate into the product experience
What we offer
  • Huge impact
  • Ownership and opportunity to take responsibility from day one
  • A competitive compensation package
  • A brand-new MacBook and welcome kit
  • Private Medical Insurance & HPV Vaccine & Critical Women’s Health Coverage
  • Gym Reimbursement
  • A unique and top-notch office
  • Unlimited coffee from XPresso
  • Limitless Snacks & breakfast
  • Continuous education
  • Full-time

Sr Analysts, Credit Risk Mgmt

Sr Analysts, Credit Risk Management is located in Bellevue, WA will analyze cust...
Location: United States, Bellevue
Salary: 146182.00 - 153800.00 USD / Year
T-Mobile
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Programming, Computer Engineering, Business Administration, or related, and 5 years of relevant work experience in any occupation in which the required experience is gained
  • Master's degree in Computer Science, Computer Programming, Computer Engineering, Business Administration, or related, and 3 years of experience in any occupation in which the required experience is gained
  • SQL, Excel VBA or analytical programming language R to manipulate and analyze large-scale datasets, derive critical insights, and translate complex findings into clear, actionable recommendations tailored for Executive Leadership
  • Snowflake, CV, CUW, or Teradata to extract, transform, and integrate data from multiple sources, with knowledge in managing the full Exploratory Data Analysis (EDA) lifecycle, including advanced querying, feature engineering, and building/managing data tables to ensure accuracy
  • Lead data-driven initiatives from requirements gathering to analytical framework design, with proficiency in leveraging statistical methods in R or Python including decision trees, regression models, and K-means clustering to enhance customer segmentation and credit risk strategies
  • Develop executive-level visualizations and performance tracking dashboards using Tableau, Power BI, SQL, Excel/VBA, and Python including pandas, matplotlib, and seaborn by performing ETL/data engineering including Extraction, Transformation, and Loading to deliver insights and monitor key metrics
  • Develop financial models and forecasts: Customer Lifetime Value prediction, cohort analysis, scenario modeling, using Excel, SQL, Python including NumPy, pandas, and scikit-learn to evaluate strategies and drive growth
  • Manage credit risk and underwriting by applying knowledge of credit structures and leveraging transactional, payment, and consumer behavioral data to build predictive models and develop optimization models in Excel and Python to design and implement new credit initiatives
  • At least 18 years of age
  • Legally authorized to work in the United States
Job Responsibility
  • Forecast financial trends to support strategic decision-making
  • Evaluate and optimize the effectiveness of credit policies and outcomes
  • Develop customer risk segments to improve credit management and performance
  • Utilize statistical segmentation techniques to identify new opportunities
  • Performing complex qualitative and quantitative analysis of credit policies to ensure financial goals are being attained
  • Developing predictive financial and analytical models using the appropriate statistical methodologies, including trend and regression analysis
  • Participate in and perform the analysis of new data and statistical products from external vendors
  • Performing loss forecasting analysis
  • Extracting, processing and transforming data from multiple disparate sources
  • Analyzing credit bureau data and alternative credit data
What we offer
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Free year-round money coaches
  • Annual bonus or periodic sales incentive or bonus
  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Flexible spending account
  • Paid time off
  • Full-time

Product Managers, Technical

Product Managers, Technical located in Bellevue, WA will utilize statistical seg...
Location: United States, Bellevue
Salary: 134400.00 - 181800.00 USD / Year
T-Mobile
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Finance, Economics, Mathematics, Statistics, Computer Engineering, or related field, or its foreign equivalent, and 5 years of relevant work experience
  • Master's degree in Finance, Economics, Mathematics, Statistics, Computer Engineering, or related field, or its foreign equivalent, and 3 years of relevant work experience
  • Utilizing SAS, performing Data mining and quantitative analysis
  • Design and configure SAS scripts for use in volume batch testing to ensure flawless releases
  • Performing Business Requirements Elicitation, Process Modeling, Gap analysis, Risk analysis and development of Functional Specifications and Traceability Matrix using JIRA Align, JIRA, SharePoint, and qTest
  • Designing and Solutioning credit decisioning systems using the FICO platform, with experience in configuring, testing, and supporting roles
  • Designing, configuring, and executing test strategies and test scenarios, and validating test results
  • Supporting BAU releases (in-house and vendor based) on web-based vendor platforms and executing business strategies around them
  • Presenting technical solutions, anticipating the implications and consequences of a situation, and taking appropriate action
  • At least 18 years of age
Job Responsibility
  • Performing complex qualitative and quantitative analysis of credit policies to ensure financial goals are being attained
  • Implement credit strategies using FICO platform for credit decisioning
  • Design credit decisioning system solutions for T-Mobile
  • Developing complex predictive financial and analytical models
  • Evaluating new risk products offered by vendors
  • Perform trend/regression analysis and forecasting
  • Extracting data from multiple disparate sources
  • Responsible for other Duties and Projects as assigned by business management as needed
What we offer
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Annual bonus or periodic sales incentive or bonus
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off and up to 12 paid holidays
  • Paid parental and family leave
  • Family building benefits
  • Full-time

Senior Human Factors Engineer

This is where new knowledge is discovered. Baxter’s Research and Development tea...
Location: United States, Batesville; Cincinnati
Salary: 96000.00 - 132000.00 USD / Year
Baxter
Expiration Date: Until further notice
Requirements
  • Bachelor's degree or equivalent experience in Human Factors Engineering, Human-Computer Interaction, Psychology, Biomedical Engineering, Industrial Engineering, or a related area
  • 3+ years of human factors/usability engineering experience in medical devices or another regulated industry
  • Experience applying human factors methods, including user research, task analysis, use-related risk analysis, and usability testing
  • Knowledge of IEC 62366-1, FDA Human Factors Guidance, ANSI/AAMI HE75, ISO 14971, and medical device design controls
  • Experience authoring human factors documentation, protocols, and reports to support development controls and regulatory submissions
  • Strong understanding of user-centered design principles, usability evaluation methods, and qualitative data analysis
  • Ability to independently solve complex usability challenges, develop actionable recommendations, and collaborate across cross-functional teams
  • Applicants must be authorized to work for any employer in the U.S.
Job Responsibility
  • Plan and complete human factors engineering activities, including project plans, timelines, and deliverables
  • Conduct use-related risk assessments, identify potential use errors, and support risk mitigation activities
  • Plan, complete, and detail formative and summative usability studies in compliance with applicable regulations and standards
  • Develop and maintain human factors documentation, including user needs, task analyses, interface specifications, and study reports
  • Communicate human factors progress, findings, and risks to project teams and key collaborators
  • Lead and deliver human factors workstreams for complex programs or multiple concurrent projects with minimal supervision
  • Collaborate with R&D, marketing, clinical, and engineering teams to evaluate concepts, improve usability, and drive user-centered design decisions
  • Apply human factors, usability, and user experience guidelines to support continuous improvement and successful product development
What we offer
  • Support for Parents
  • Continuing Education/ Professional Development
  • Employee Health & Well-Being Benefits
  • Paid Time Off
  • 2 Days a Year to Volunteer
  • Medical and dental coverage that starts on day one
  • Basic life, accident, short-term and long-term disability, and business travel accident insurance
  • Employee Stock Purchase Plan (ESPP)
  • 401(k) Retirement Savings Plan (RSP)
  • Flexible Spending Accounts

Product Manager - Treasury

We are looking for an ambitious, driven Product Manager to join our team. At Sok...
Location: Serbia, Belgrade
Salary: Not provided
Sokin
Expiration Date: Until further notice
Requirements
  • Excited by, and have experience in, building products that customers love within the fintech space
  • Driven to make a difference and want to fully own your product roadmap
  • Someone that enjoys getting into the details and understanding the mechanics of your product and the data behind it
  • Comfortable working closely with financial, risk, and legal stakeholders, ensuring products are designed in a user-centric way while remaining compliant
  • Enjoy working in cross-functional teams alongside engineers, designers, and other functions to deliver value to users
  • Have a good understanding of technical concepts and can work closely with engineers to find the best solution to a problem
  • Love solving problems in innovative and creative ways
  • Will have the right to work in the jurisdiction that they are looking to work in
Job Responsibility
  • Own the strategy, roadmap, and performance of Sokin's Treasury product, focused on solving the complexity businesses face when managing cash across multiple currencies, banks, and entities
  • Define how we evolve from a transactional payments platform into a unified treasury solution, identifying the highest-impact opportunities to improve liquidity visibility, reduce operational friction, and enable customers to manage their cash more effectively
  • Deeply understand customer workflows, defining clear problem statements, and prioritising the initiatives that deliver the greatest value for both customers and the business
  • Shape and deliver a 'single pane of glass' experience for treasury
  • Work closely with design and engineering to simplify how customers view balances, move funds, execute FX, and manage liquidity across accounts
  • Break down complex treasury problems into intuitive product experiences, from real-time cash visibility and reporting, to automation such as sweeping rules and intelligent fund movements
  • Evaluate dependencies across banking partners, payment rails, and data integrations, ensuring we build a scalable and reliable platform that works seamlessly across regions
  • Deliver measurable improvements in how customers adopt and use treasury capabilities, increasing wallet activity, FX volumes, and overall capital efficiency, while reducing manual processes and time spent managing cash
  • Define and own key metrics, continuously test and iterate on product improvements, and partner cross-functionally with operations, compliance, and commercial teams to ensure we balance usability, control, and regulatory requirements
  • Play a key role in positioning Sokin as a strategic financial platform, not just a payments provider, driving long-term customer value and revenue growth
  • Full-time

Mid-Level Model Based Systems Engineer

Location: United States, Crane
Salary: 110000.00 - 180000.00 USD / Year
Amentum
Expiration Date: Until further notice
Requirements
  • 6-10 years of Systems Engineering experience
  • 3-5 years of experience creating SysML models, analyses, and simulation using Cameo Systems Modeler
  • Able to generate CDRL documents from the MBSE models such as Requirements, Architecture, Interface Documents
  • Must have an active US Government Top Secret clearance with the ability to obtain and maintain SCI eligibility. Please note that US citizenship is required to obtain a Secret/TS/SCI clearance.
  • Bachelor's degree from ABET-accredited engineering program, or computer science major, or mathematics major
  • Bachelor's degree with 8+ years of SE and/or MBSE experience, master's degree with 5+ years of MBSE experience
Job Responsibility
  • Apply MBSE methodologies using SysML and supporting MBSE tools to capture, maintain, and visualize complete system solutions within a unified digital model, enabling clear design communication and automated integration across engineering domains
  • Develop models of complex system architectures using standards-based languages (SysML)
  • Support architecture evaluations through both qualitative and quantitative analysis methods
  • Identify and characterize uncertainties within system architectures and define associated risks and opportunities
  • Contribute to requirements management, interface management, and architecture change control processes
  • Convert analytical findings into clear, actionable recommendations for U.S. Government stakeholders
What we offer
  • Health, dental, and vision insurance
  • Paid time off and holidays
  • Retirement benefits (including 401(k) matching)
  • Educational reimbursement
  • Parental leave
  • Employee stock purchase plan
  • Tax-saving options
  • Disability and life insurance
  • Pet insurance
  • Full-time

Senior Human Factors Engineer

This role involves collaborating with clinicians, engineers, and cross-functiona...
Location: United States, Batesville
Salary: 96000.00 - 132000.00 USD / Year
Baxter
Expiration Date: Until further notice
Requirements
  • Bachelor's degree or equivalent experience in Human Factors Engineering, Human-Computer Interaction, Psychology, Biomedical Engineering, Industrial Engineering, or a related area
  • 3+ years of human factors/usability engineering experience in medical devices or another regulated industry
  • Experience applying human factors methods, including user research, task analysis, use-related risk analysis, and usability testing
  • Knowledge of IEC 62366-1, FDA Human Factors Guidance, ANSI/AAMI HE75, ISO 14971, and medical device design controls
  • Experience authoring human factors documentation, protocols, and reports to support development controls and regulatory submissions
  • Strong understanding of user-centered design principles, usability evaluation methods, and qualitative data analysis
  • Ability to independently solve complex usability challenges, develop actionable recommendations, and collaborate across cross-functional teams
  • Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time
  • Advanced degree or equivalent experience in a related field along with experience conducting IRB-reviewed studies, FDA submissions, or human factors research within clinical or healthcare settings
Job Responsibility
  • Plan and complete human factors engineering activities, including project plans, timelines, and deliverables
  • Conduct use-related risk assessments, identify potential use errors, and support risk mitigation activities
  • Plan, complete, and detail formative and summative usability studies in compliance with applicable regulations and standards
  • Develop and maintain human factors documentation, including user needs, task analyses, interface specifications, and study reports
  • Communicate human factors progress, findings, and risks to project teams and key collaborators
  • Lead and deliver human factors workstreams for complex programs or multiple concurrent projects with minimal supervision
  • Collaborate with R&D, marketing, clinical, and engineering teams to evaluate concepts, improve usability, and drive user-centered design decisions
  • Apply human factors, usability, and user experience guidelines to support continuous improvement and successful product development
What we offer
  • Support for Parents
  • Continuing Education/ Professional Development
  • Employee Health & Well-Being Benefits
  • Paid Time Off
  • 2 Days a Year to Volunteer
  • Medical and dental coverage that starts on day one
  • Basic life, accident, short-term and long-term disability, and business travel accident insurance
  • Employee Stock Purchase Plan (ESPP), with the ability to purchase company stock at a discount
  • 401(k) Retirement Savings Plan (RSP), with options for employee contributions and company matching
  • Flexible Spending Accounts
  • Full-time