Qualitative Evaluation Engineer Job at Luma AI (Palo Alto)

Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI is building the fastest, most capable open-source-aligned LLMs and i...

Location

United States, San Francisco

Salary:

220000.00 - 270000.00 USD / Year

Together AI

Expiration Date

Until further notice

Requirements

Strong engineering skills with Python, evaluation tooling, and distributed workflows
Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming
Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns
Experience designing experiments, building datasets, and interpreting noisy behavioral signals
Understanding of function calling and structured output formats
Familiarity with GPU or distributed compute environments
Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines
Experience with multi-turn or multi-step reasoning tasks
Familiarity with inference systems, distributed infrastructure, or post-training workflows
Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures

Job Responsibility

Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors
Develop specialized evaluation suites for:
Function calling: argument correctness, schema adherence, tool selection, multi-function planning, and error recovery
Agentic workflows: task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences
Tool-augmented interactions: search, retrieval, code execution, API-driven actions
Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing
Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains
Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements
Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases
Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers

What we offer

competitive compensation
startup equity
health insurance
other benefits

Fulltime

New

AI Technical Product Manager

At HubX we build mobile apps, used and loved by millions all around the world. W...

Location

Turkey, Istanbul

Salary:

Not provided

HubX

Expiration Date

Until further notice

Requirements

BS/MS in Software/Computer/Electronics/Industrial Engineering or related disciplines
Experience using and applying Large Language Models. You should know how to prompt-engineer them, how to evaluate them, and understand how they are tuned
Hands-on experience building or using LLM-powered agentic solutions
A healthy blend of general technical knowledge with a customer-focused mindset and empathy
Great communication skills and background knowledge for customers, stakeholders, and technical team
General knowledge of the ins and outs of development, UI/UX design and flows, agile framework, software development, scrum approach, and IT infrastructure
A desire to work in a fast-paced, results-driven environment

Job Responsibility

Creating business analysis documents, test cases, mockups, and prototypes to assist with development, including leveraging AI tools (e.g., LLMs) to accelerate documentation, generate scenarios, and validate edge cases
Analyzing metrics to understand customer and product performance, incorporating AI-driven insights (e.g., model outputs, user interactions with AI features, prompt success rates) to guide product decisions
Staying in sync with team members using agile approaches, while adapting workflows to support rapid experimentation and iteration of AI-powered features
Attending grooming sessions with the development team to clarify requirements, including defining AI-specific behaviors (e.g., prompts, expected outputs, fallback scenarios) and ensuring the team understands both product and model-related objectives
Partnering closely with technical teams by bringing clear, structured requirements to engineers, and actively contributing to discussions around LLM capabilities, constraints, and implementation trade-offs
Running qualitative and quantitative research to define ambiguous problems, including evaluating where AI/LLMs can provide value and validating hypotheses through data and experimentation
Designing, testing, and iterating on prompt strategies, with an understanding of prompt engineering techniques and how they affect model performance and user experience
Evaluating LLM outputs using structured approaches (e.g., accuracy, consistency, safety, and relevance), and working with teams to improve model performance through fine-tuning or prompt adjustments
Collaborating on or contributing to the development of LLM-powered or agentic solutions, understanding how multi-step AI workflows operate and how they integrate into the product experience

What we offer

Huge impact
Ownership and opportunity to take responsibility from day one
A competitive compensation package
A brand-new MacBook and welcome kit
Private Medical Insurance & HPV Vaccine & Critical Women’s Health Coverage
Gym Reimbursement
A unique and top-notch office
Unlimited coffee from XPresso
Limitless Snacks & breakfast
Continuous education

Fulltime

New

Sr Analysts, Credit Risk Mgmt

Sr Analysts, Credit Risk Management is located in Bellevue, WA will analyze cust...

Location

United States, Bellevue

Salary:

146182.00 - 153800.00 USD / Year

T-Mobile

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Computer Programming, Computer Engineering, Business Administration, or related, and 5 years of relevant work experience in any occupation in which the required experience is gained
Master's degree in Computer Science, Computer Programming, Computer Engineering, Business Administration, or related, and 3 years of experience in any occupation in which the required experience is gained
SQL, Excel VBA or analytical programming language R to manipulate and analyze large-scale datasets, derive critical insights, and translate complex findings into clear, actionable recommendations tailored for Executive Leadership
Snowflake, CV, CUW, or Teradata to extract, transform, and integrate data from multiple sources, with knowledge in managing the full Exploratory Data Analysis (EDA) lifecycle, including advanced querying, feature engineering, and building/managing data tables to ensure accuracy
Lead data-driven initiatives from requirements gathering to analytical framework design, with proficiency in leveraging statistical methods in R or Python including decision trees, regression models, and K-means clustering to enhance customer segmentation and credit risk strategies
Develop executive-level visualizations and performance tracking dashboards using Tableau, Power BI, SQL, Excel/VBA, and Python including pandas, matplotlib, and seaborn by performing ETL/data engineering including Extraction, Transformation, and Loading to deliver insights and monitor key metrics
Develop financial models and forecasts: Customer Lifetime Value prediction, cohort analysis, scenario modeling, using Excel, SQL, Python including NumPy, pandas, and scikit-learn to evaluate strategies and drive growth
Manage credit risk and underwriting by applying knowledge of credit structures and leveraging transactional, payment, and consumer behavioral data to build predictive models and develop optimization models in Excel and Python to design and implement new credit initiatives
At least 18 years of age
Legally authorized to work in the United States

Job Responsibility

Forecast financial trends to support strategic decision-making
Evaluate and optimize the effectiveness of credit policies and outcomes
Develop customer risk segments to improve credit management and performance
Utilize statistical segmentation techniques to identify new opportunities
Performing complex qualitative and quantitative analysis of credit policies to ensure financial goals are being attained
Developing predictive financial and analytical models using the appropriate statistical methodologies, including trend and regression analysis
Participate in and perform the analysis of new data and statistical products from external vendors
Performing loss forecasting analysis
Extracting, processing and transforming data from multiple disparate sources
Analyzing credit bureau data and alternative credit data

What we offer

Annual stock grant
Employee stock purchase plan
401(k)
Free year-round money coaches
Annual bonus or periodic sales incentive or bonus
Medical insurance
Dental insurance
Vision insurance
Flexible spending account
Paid time off

Fulltime

New

Product Managers, Technical

Product Managers, Technical located in Bellevue, WA will utilize statistical seg...

Location

United States, Bellevue

Salary:

134400.00 - 181800.00 USD / Year

T-Mobile

Expiration Date

Until further notice

Requirements

Bachelor's degree in Finance, Economics, Mathematics, Statistics, Computer Engineering, or related field, or its foreign equivalent, and 5 years of relevant work experience
Master's degree in Finance, Economics, Mathematics, Statistics, Computer Engineering, or related field, or its foreign equivalent, and 3 years of relevant work experience
Utilizing SAS, performing Data mining and quantitative analysis
Design and configure SAS scripts for volume batch testing to ensure flawless releases
Performing Business Requirements Elicitation, Process Modeling, Gap analysis, Risk analysis and development of Functional Specifications and Traceability Matrix using JIRA Align, JIRA, SharePoint, and qTest
Designing and Solutioning credit decisioning systems using the FICO platform, with experience in configuring, testing, and supporting roles
Designing, configuring, and executing test strategies and test scenarios, and validating test results
Supporting BAU releases (in-house and vendor based) on web-based vendor platforms and executing business strategies around them
Presenting technical solutions, anticipating the implications and consequences of situations, and taking appropriate action
At least 18 years of age

Job Responsibility

Performing complex qualitative and quantitative analysis of credit policies to ensure financial goals are being attained
Implement credit strategies using FICO platform for credit decisioning
Solution and design the credit decisioning system for T-Mobile
Developing complex predictive financial and analytical models
Evaluating new risk products offered by vendors
Perform trend/regression analysis and forecasting
Extracting data from multiple disparate sources
Responsible for other Duties and Projects as assigned by business management as needed

What we offer

Annual stock grant
Employee stock purchase plan
401(k)
Access to free, year-round money coaches
Annual bonus or periodic sales incentive or bonus
Medical, dental and vision insurance
Flexible spending account
Paid time off and up to 12 paid holidays
Paid parental and family leave
Family building benefits

Fulltime

New

Senior Human Factors Engineer

This is where new knowledge is discovered. Baxter’s Research and Development tea...

Location

United States, Batesville; Cincinnati

Salary:

96000.00 - 132000.00 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree or equivalent experience in Human Factors Engineering, Human-Computer Interaction, Psychology, Biomedical Engineering, Industrial Engineering, or a related area
3+ years of human factors/usability engineering experience in medical devices or another regulated industry
Experience applying human factors methods, including user research, task analysis, use-related risk analysis, and usability testing
Knowledge of IEC 62366-1, FDA Human Factors Guidance, ANSI/AAMI HE75, ISO 14971, and medical device design controls
Experience authoring human factors documentation, protocols, and reports to support development controls and regulatory submissions
Strong understanding of user-centered design principles, usability evaluation methods, and qualitative data analysis
Ability to independently solve complex usability challenges, develop actionable recommendations, and collaborate across cross-functional teams
Applicants must be authorized to work for any employer in the U.S.

Job Responsibility

Plan and complete human factors engineering activities, including project plans, timelines, and deliverables
Conduct use-related risk assessments, identify potential use errors, and support risk mitigation activities
Plan, complete, and detail formative and summative usability studies in compliance with applicable regulations and standards
Develop and maintain human factors documentation, including user needs, task analyses, interface specifications, and study reports
Communicate human factors progress, findings, and risks to project teams and key collaborators
Lead and deliver human factors workstreams for complex programs or multiple concurrent projects with minimal supervision
Collaborate with R&D, marketing, clinical, and engineering teams to evaluate concepts, improve usability, and drive user-centered design decisions
Apply human factors, usability, and user experience guidelines to support continuous improvement and successful product development

What we offer

Support for Parents
Continuing Education / Professional Development
Employee Health & Well-Being Benefits
Paid Time Off
2 Days a Year to Volunteer
Medical and dental coverage that starts on day one
Basic life, accident, short-term and long-term disability, and business travel accident insurance
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan (RSP)
Flexible Spending Accounts

New

Product Manager - Treasury

We are looking for an ambitious, driven Product Manager to join our team. At Sok...

Location

Serbia, Belgrade

Salary:

Not provided

Sokin

Expiration Date

Until further notice

Requirements

Excited by, and have experience in, building products that customers love within the fintech space
Driven to make a difference and want to fully own your product roadmap
Someone that enjoys getting into the details and understanding the mechanics of your product and the data behind it
Comfortable working closely with financial, risk, and legal stakeholders, ensuring products are designed in a user-centric way while remaining compliant
Enjoy working in cross-functional teams alongside engineers, designers, and other functions to deliver value to users
Have a good understanding of technical concepts and can work closely with engineers to find the best solution to a problem
Love solving problems in innovative and creative ways
Will have the right to work in the jurisdiction that they are looking to work in

Job Responsibility

Own the strategy, roadmap, and performance of Sokin's Treasury product, focused on solving the complexity businesses face when managing cash across multiple currencies, banks, and entities
Define how we evolve from a transactional payments platform into a unified treasury solution, identifying the highest-impact opportunities to improve liquidity visibility, reduce operational friction, and enable customers to manage their cash more effectively
Deeply understand customer workflows, defining clear problem statements, and prioritising the initiatives that deliver the greatest value for both customers and the business
Shape and deliver a 'single pane of glass' experience for treasury
Work closely with design and engineering to simplify how customers view balances, move funds, execute FX, and manage liquidity across accounts
Break down complex treasury problems into intuitive product experiences, from real-time cash visibility and reporting, to automation such as sweeping rules and intelligent fund movements
Evaluate dependencies across banking partners, payment rails, and data integrations, ensuring we build a scalable and reliable platform that works seamlessly across regions
Deliver measurable improvements in how customers adopt and use treasury capabilities, increasing wallet activity, FX volumes, and overall capital efficiency, while reducing manual processes and time spent managing cash
Define and own key metrics, continuously test and iterate on product improvements, and partner cross-functionally with operations, compliance, and commercial teams to ensure we balance usability, control, and regulatory requirements
Play a key role in positioning Sokin as a strategic financial platform, not just a payments provider, driving long-term customer value and revenue growth

Fulltime

New

Mid-Level Model Based Systems Engineer

Location

United States, Crane

Salary:

110000.00 - 180000.00 USD / Year

Amentum

Expiration Date

Until further notice

Requirements

6-10 years of Systems Engineering experience
3-5 years of experience creating SysML models, analyses, and simulation using Cameo Systems Modeler
Able to generate CDRL documents from the MBSE models such as Requirements, Architecture, Interface Documents
Must have an active US Government Top Secret clearance with the ability to obtain and maintain SCI eligibility. Please note US citizenship is required to obtain a Secret/TS/SCI clearance.
Bachelor's degree from ABET-accredited engineering program, or computer science major, or mathematics major
Bachelor's degree with 8+ years of SE and/or MBSE experience, master's degree with 5+ years of MBSE experience

Job Responsibility

Apply MBSE methodologies using SysML and supporting MBSE tools to capture, maintain, and visualize complete system solutions within a unified digital model, enabling clear design communication and automated integration across engineering domains
Develop models of complex system architectures using standards-based languages (SysML)
Support architecture evaluations through both qualitative and quantitative analysis methods
Identify and characterize uncertainties within system architectures and define associated risks and opportunities
Contribute to requirements management, interface management, and architecture change control processes
Convert analytical findings into clear, actionable recommendations for U.S. Government stakeholders

What we offer

Health, dental, and vision insurance
Paid time off and holidays
Retirement benefits (including 401(k) matching)
Educational reimbursement
Parental leave
Employee stock purchase plan
Tax-saving options
Disability and life insurance
Pet insurance

Fulltime

New

Senior Human Factors Engineer

This role involves collaborating with clinicians, engineers, and cross-functiona...

Location

United States, Batesville

Salary:

96000.00 - 132000.00 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree or equivalent experience in Human Factors Engineering, Human-Computer Interaction, Psychology, Biomedical Engineering, Industrial Engineering, or a related area
3+ years of human factors/usability engineering experience in medical devices or another regulated industry
Experience applying human factors methods, including user research, task analysis, use-related risk analysis, and usability testing
Knowledge of IEC 62366-1, FDA Human Factors Guidance, ANSI/AAMI HE75, ISO 14971, and medical device design controls
Experience authoring human factors documentation, protocols, and reports to support development controls and regulatory submissions
Strong understanding of user-centered design principles, usability evaluation methods, and qualitative data analysis
Ability to independently solve complex usability challenges, develop actionable recommendations, and collaborate across cross-functional teams
Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time
Advanced degree or equivalent experience in a related field along with experience conducting IRB-reviewed studies, FDA submissions, or human factors research within clinical or healthcare settings

Job Responsibility

Plan and complete human factors engineering activities, including project plans, timelines, and deliverables
Conduct use-related risk assessments, identify potential use errors, and support risk mitigation activities
Plan, complete, and detail formative and summative usability studies in compliance with applicable regulations and standards
Develop and maintain human factors documentation, including user needs, task analyses, interface specifications, and study reports
Communicate human factors progress, findings, and risks to project teams and key collaborators
Lead and deliver human factors workstreams for complex programs or multiple concurrent projects with minimal supervision
Collaborate with R&D, marketing, clinical, and engineering teams to evaluate concepts, improve usability, and drive user-centered design decisions
Apply human factors, usability, and user experience guidelines to support continuous improvement and successful product development

What we offer

Support for Parents
Continuing Education / Professional Development
Employee Health & Well-Being Benefits
Paid Time Off
2 Days a Year to Volunteer
Medical and dental coverage that starts on day one
Insurance coverage for basic life, accident, short-term and long-term disability, and business travel accident
Employee Stock Purchase Plan (ESPP), with the ability to purchase company stock at a discount
401(k) Retirement Savings Plan (RSP), with options for employee contributions and company matching
Flexible Spending Accounts

Fulltime
