Model Evaluation QA Lead

United States · 180000.00 - 230000.00 USD / Year · Job Posted February 18, 2026

Job Description

As Model Evaluation QA Lead, you’ll be the technical owner of model quality assurance across Deepgram’s AI pipeline—from pre-training data validation and provenance through post-deployment monitoring. Reporting to the QA Engineering Manager, you will partner directly with our Active Learning and Data Ops teams to build and operate the evaluation infrastructure that ensures every model Deepgram ships meets objective quality bars across languages, domains, and deployment contexts. This is a hands-on, high-impact role at the intersection of QA engineering and ML operations. You will design automated evaluation frameworks, integrate model quality gates into release pipelines, and drive industry-standard benchmarking—ensuring Deepgram maintains its position as the accuracy and latency leader in voice AI.

Job Responsibility

  • Model Evaluation Automation: Design, build, and maintain automated model evaluation pipelines that run against every candidate model before release
  • Release Gate Integration: Embed model quality checkpoints into CI/CD and release pipelines
  • Agent & Model Evaluation Frameworks: Stand up and operate evaluation tooling for end-to-end voice agent testing
  • Active Learning & Data Ingestion Testing: Partner with the Active Learning team to validate data ingestion infrastructure, annotation pipelines, and retraining automation
  • Industry Benchmark Automation: Automate execution and reporting of industry-standard benchmarks
  • Language & Domain Validation: Build and maintain test suites for multi-language and domain-specific model validation
  • Retraining Automation Support: Validate the end-to-end retraining pipeline across all data sources
  • Manual Test Feedback Loop: Design and operate human-in-the-loop evaluation workflows for subjective quality assessment

Requirements

  • 4–7 years of experience in QA engineering, ML evaluation, or a related technical role with a focus on predictive and generative model and data quality
  • Hands-on experience building automated test/evaluation pipelines for ML models and connecting software features
  • Strong programming skills in Python, with experience in ML evaluation libraries, data processing frameworks (Pandas, NumPy), and scripting for pipeline automation
  • Familiarity with speech/audio ML concepts: WER, SER, MOS, acoustic models, language models, or similar evaluation metrics
  • Experience with CI/CD integration for ML workflows (e.g., GitHub Actions, Jenkins, Argo, MLflow, or equivalent)
  • Ability to design and maintain reproducible benchmark environments across multiple model versions and configurations
  • Strong communication skills—you can translate model quality metrics into actionable insights for engineering, research, and product stakeholders
  • Detail-oriented and systematic, with a bias toward automation over manual process

Nice to have

  • Experience with model evaluation platforms (Coval, Braintrust, Weights & Biases, or custom evaluation harnesses)
  • Background in speech recognition, NLP, or audio processing domains
  • Experience with distributed evaluation at scale—running evals across GPU clusters or large dataset partitions
  • Familiarity with human-in-the-loop evaluation design and annotation pipeline tooling
  • Experience with multi-language model evaluation and localization quality assurance
  • Prior work in a company where ML model quality directly impacted revenue or customer SLAs

What we offer

  • Medical, dental, vision benefits
  • Annual wellness stipend
  • Mental health support
  • Life, STD, LTD Income Insurance Plans
  • Unlimited PTO
  • Generous paid parental leave
  • Flexible schedule
  • 12 Paid US company holidays
  • Quarterly personal productivity stipend
  • One-time stipend for home office upgrades
  • 401(k) plan with company match
  • Tax Savings Programs
  • Learning / Education stipend
  • Participation in talks and conferences
  • Employee Resource Groups
  • AI enablement workshops / sessions
  • Equity
  • 10% annual bonus

Similar Jobs for Model Evaluation QA Lead

8 matching positions

SPD Technician QA Lead

The SPD Technician Quality Assurance Lead is accountable for precepting and over...
Location: United States, Madera
Salary: 26.65 - 37.90 USD / Hour
Company: Valley Children's Healthcare
Expiration Date: Until further notice
Requirements
  • High School Diploma/G.E.D. (required)
  • CIS - Certified Instrument Specialist - HSPA (required)
  • CBSPDT - Certified Sterile Processing and Distribution Technician - Certified Board of Sterile Processing and Distribution (required), or CRCST - Certified Registered Central Service Technician - HSPA (required)
  • Minimum five (5) years of progressively responsible experience, including specialty instrumentation, devices, and equipment (required)
  • Excellent verbal and written communication skills
  • Demonstrated proficiency in computer skills
Job Responsibility
  • Precepting and overseeing the training of new hires and preceptors, re-evaluation, and documentation in collaboration with the SPD Educator
  • Conducting daily audits of sterilization and cleaning equipment to meet requirements (i.e., for CDPH and The Joint Commission), along with collaboration on process improvement with SPD Leadership
  • Performing audits of instrumentation, medical devices, and processes
  • Performing audits on patient care equipment in CED in collaboration with the CED Supervisor
  • Printing various reports in an instrument tracking system
  • Serving as a role model in Sterile Processing standards and regulations for staff
Employment type: Full-time

QA Lead – AI/ML Systems

We are looking for an experienced QA Lead – AI Systems to lead the validation an...
Location: India, Pune
Salary: Not provided
Company: Codvo AI
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, Biomedical Engineering, or related field
  • 8–12 years of total QA experience, with 2–3 years directly in AI/ML or GenAI QA
  • Proven experience in AI Model Validation, LLM/RAG Testing, and AI Evaluation Metrics
  • Strong knowledge of MLOps concepts and cloud platforms (Azure ML, AWS Sagemaker, Vertex AI)
  • Understanding of FDA AI/ML Compliance, SaMD testing, and regulated software QA
  • Hands-on expertise in Python automation and testing AI/ML pipelines
  • Excellent documentation and communication skills with ability to produce traceable validation artifacts
Job Responsibility
  • Own and drive QA strategy for AI systems across the model, data, and product lifecycle
  • Lead AI model validation efforts, covering LLM/RAG testing, bias analysis, and performance evaluation
  • Define and implement AI Evaluation Metrics (accuracy, fairness, drift, explainability) aligned with business and regulatory expectations
  • Establish frameworks for Explainability Testing (SHAP, LIME, XAI) and ensure interpretability of AI outcomes
  • Collaborate with Data Science and MLOps teams to validate models within cloud environments (Azure ML, AWS Sagemaker, Vertex AI)
  • Drive verification and validation (V&V) for AI models and applications under FDA and SaMD compliance frameworks
  • Ensure test traceability, documentation, and audit readiness in line with ISO 13485, IEC 62304, and ISO 14971
  • Develop Python-based automation for AI testing, data validation, and model evaluation pipelines
  • Provide technical leadership to QA teams, ensuring alignment with AI/ML development best practices
  • Collaborate with cross-functional teams (Product, Data, Regulatory, Engineering) to identify risks, gaps, and opportunities for test optimization
Employment type: Full-time

QA Engineering Lead, AI Native

Meta is seeking a QA Engineering Lead with expertise in AI product and model tes...
Location: United States, Menlo Park
Salary: 138000.00 - 191000.00 USD / Year
Company: Meta
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 5+ years of experience in quality assurance, test engineering, and test automation
  • 1+ years of hands-on experience testing AI-powered products (web, iOS, and/or Android) that generate or transform text, images, and/or voice, including end-to-end feature validation and user experience quality
  • 1+ years of hands-on experience testing, debugging, and evaluating LLM/multimodal model behavior, including defining and applying quality standards for accuracy, relevance, grounding, safety/policy compliance, and cultural/locale sensitivity, and driving model-quality regressions to resolution
  • Experience effectively utilizing AI technologies and tools (e.g., large language models, agents, etc.) to enhance QA workflows
  • Experience collaborating cross-functionally and contributing to technical decisions through influence, communication, and execution
  • Experience adapting quickly and effectively to changing priorities in a fast-moving product development cycle
Job Responsibility
  • Build and foster a quality-driven engineering environment that enables rapid, confident product releases, ensuring that quality is embedded throughout the development lifecycle
  • Develop and implement robust evaluation processes for AI models, including prompt engineering, scenario-based, and adversarial testing for text, image, and voice AI systems
  • Drive the quality for products and features, assess risks, and ensure features ship with a high quality bar, balancing speed and experience
  • Plan, develop, and execute comprehensive test strategies across core Meta products and platforms, leveraging both manual and automated approaches
  • Lead quality assurance efforts that align with product objectives, developing scalable solutions to support rapid product iteration and deployment
  • Solve cross-platform engineering challenges and contribute impactful ideas to improve quality, reliability, and user experience across diverse product surfaces
  • Implement and evolve QA processes to obtain effective test signals and scale testing efforts across multiple products, ensuring continuous improvement
  • Define quality metrics and implement measurements to determine test effectiveness, testing efficiency, and overall product quality, using data-driven insights to guide decisions
  • Partner with engineering and infrastructure teams to leverage automation for scalable solutions, preventing regressions and ensuring the reliability of products and AI models
  • Apply Responsible AI practices including safety, ethics, alignment, and explainability by building safeguards and quality controls to validate AI outputs, ensuring transparency, and compliance with ethical standards
What we offer
  • bonus
  • equity
  • benefits
Employment type: Full-time

Sr. Staff Machine Learning Engineer

As a Principal Software Engineer, you will provide technical leadership in desig...
Location: United States, Santa Clara
Salary: 141000.00 - 228075.00 USD / Year
Company: Palo Alto Networks Italia
Expiration Date: Until further notice
Requirements
  • Strong background in machine learning and ML frameworks (e.g., TensorFlow, PyTorch)
  • Experience with Infrastructure-as-Code (IaC) tools like Terraform or CloudFormation
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 10+ years of software development experience, with a focus on cloud-native and SaaS applications
  • Proven experience in designing and building large-scale, distributed systems on public cloud platforms (AWS, GCP, Azure)
  • Strong proficiency in at least one modern programming language such as Python, Go, or Java
  • Demonstrated experience with the full machine learning lifecycle, including model deployment and MLOps
Job Responsibility
  • Provide technical leadership for end-to-end solution delivery, collaborating with cross-functional teams (Product, SRE, QA, and Support) to align engineering efforts with business objectives
  • Drive the development of scalable cloud security architecture through a balance of strategic planning and hands-on coding
  • Establish and evangelize best practices for model versioning, reproducibility, auditing, and compliance to ensure code quality and data privacy across the organization
  • Architect and lead the entire ML lifecycle, from initial development and training to production deployment and real-time inference
  • Build and maintain automated, resilient systems for continuous integration, delivery (CI/CD), and monitoring of backend and machine learning components
  • Continuously evaluate and integrate cutting-edge MLOps tools and frameworks to enhance system scalability, reliability, and efficiency
  • Design and implement robust, next-generation cloud security solutions to resolve complex backend infrastructure and ML model challenges
  • Strategically manage and optimize ML infrastructure and pipelines to improve performance, ensure smooth production integration, and reduce operational costs
Employment type: Full-time

Senior Full-Stack Software Machine Learning Engineer

Are you an experienced full-stack software engineer with a passion for building ...
Location: France; Switzerland; United States, Bidart; Rolle; Boston
Salary: Not provided
Company: SOPHiA GENETICS
Expiration Date: Until further notice
Requirements
  • 5-7 years’ experience in the web development field
  • Bachelor’s degree in Computer Science or Engineering or equivalent professional experience
  • Demonstrated experience developing reliable web-based services and a firm grasp of the underlying challenges of releasing web components to production
  • Solid proficiency in front-end frameworks: React, Next.js, Node, JSX
  • Proficiency in Java for back-end development
  • Solid understanding of backend technologies and microservices architecture
  • Experience with databases like MySQL, MariaDB, and DuckDB (design and query optimization a plus)
  • Experience with AI use in SDLC
  • Experience with machine learning engineering in a production environment, including building and evaluating RAG pipelines, LLM evaluation methodologies and general lifecycle management
  • Familiarity with Linux environments and GitLab for version control and CI/CD pipelines
Job Responsibility
  • Lead the development and delivery of web-based solutions that directly support Sophia Genetics’ business objectives in digital healthcare
  • Work closely with product, QA, and cross-functional teams to ensure our platform meets market needs, regulatory standards, and performance goals
  • Deeply understand business needs and translate them into effective technical solutions
  • Develop and optimize web-based services and components, leveraging modern frameworks (React, Next.js, Node.js) and robust backend technologies (Java, microservices)
  • Own and drive the full lifecycle of complex, high-performance software systems
  • Enhance system performance through database optimization (MySQL, MariaDB, DuckDB) and efficient query design
  • Leverage AI and modern tools to enhance SDLC, observability, monitoring, and system intelligence
  • Design, integrate and evolve machine learning components into a production environment, including RAG pipelines and ML model evaluation frameworks
  • Orchestrate cross-functional projects by collaborating with Product, QA, Architecture, and other teams
  • Lead code reviews and mentor junior engineers
What we offer
  • Sickness and Accident coverage through Helsana
  • Meal vouchers at 90 CHF per month with our partner cafeteria
  • A fun and engaging work environment, with Rest & Entertainment space, fully stocked free coffee machine and free fruit/snacks
  • Free parking in an easy-to-access location
  • A strong social committee
Employment type: Full-time

Senior Java Technology Lead - Vice President

The Applications Development Technology Lead Analyst is a senior level position ...
Location: India, Pune
Salary: Not provided
Company: Citi
Expiration Date: Until further notice
Requirements
  • Bachelor's degree/University degree or equivalent experience, Master's degree preferred
  • 12+ years of relevant Technology experience in Application Development with at least 3+ years in technical leadership
  • Extensive experience in system analysis and in programming of software applications
  • Strong experience in implementing and delivering successful projects
  • Subject Matter Expert (SME) in Securities Financing Transactions Processing is a plus
  • Ability to adjust priorities quickly as circumstances dictate
  • Demonstrated strong leadership skills
  • Consistently demonstrates clear and concise written and verbal communication
  • Strong core Java skills, JDBC/JPA, RESTful web services
  • Strong experience with frameworks like Hibernate, JUnit, Spring Boot/Microservice-style application development
Job Responsibility
  • Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements
  • Resolve a variety of high impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
  • Provide expertise in area and advanced knowledge of applications programming and ensure application design adheres to the overall architecture blueprint
  • Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
  • Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
  • Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
  • Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
  • Own end-to-end solution architecture, including application design, system patterns, data models, APIs, and integration strategies aligned to industry-specific requirements
  • Design and build highly scalable, resilient, and secure enterprise applications using modern architectural patterns (microservices, event-driven, cloud-native)
What we offer
  • Discover the top benefits offered to our global workforce, designed to support your well-being, growth and work-life balance
Employment type: Full-time

Senior Business Analyst

Join NLS as a Senior Business Analyst! Lead our Implementation team, mentor BAs,...
Location: United States; Puerto Rico; Honduras, Springfield; San Juan; Tegucigalpa
Salary: Not provided
Company: Next Level Solutions Ltd.
Expiration Date: Until further notice
Requirements
  • Advanced proficiency in business analysis tools (e.g., ADO, JIRA, Confluence, FigJam, Visio, and Lucidchart)
  • Demonstrated experience in process modeling, workflow automation, and requirement documentation
  • Expertise in Excel, SQL, Tableau, Power BI, or similar platforms
  • Strong knowledge of Agile, Scrum, and Waterfall methodologies
  • Deep knowledge of the P&C insurance lifecycle including policy, billing, and claims (Duck Creek experience strongly preferred)
  • Familiarity with machine learning, AI, or predictive analytics
  • Demonstrated ability to mentor and peer-review junior analysts with actionable feedback
  • Strategic problem-solving skills with a data-driven mindset
  • Excellent communication skills
  • Sound decision-making in ambiguous situations
Job Responsibility
  • Act as a strategic partner to stakeholders by clarifying needs, managing expectations, and ensuring solutions align with business goals, while leveraging tools and technology, including AI, to improve team efficiency and decision-making
  • Independently gather, analyze, and document highly complex business, functional, and non-functional requirements
  • Maintain requirements traceability matrices linking business objectives through requirements to test cases
  • Design Epics, Features, and User Stories in ADO or JIRA
  • Define measurable acceptance criteria using Gherkin (Given-When-Then) methodology
  • Maintain and manage requirements repository—requirement-level change control including impact analysis, handoff facilitation with Dev, UX and QA, and stakeholder communication
  • Conduct deep-dive analysis to identify opportunities for process improvement, automation, and efficiency
  • Design and implement dashboards for strategic decision-making using tools like SQL, Power BI, and Tableau
  • Develop advanced analytics models to forecast demand, capacity, or customer behavior
  • Collaborate with technology teams to design and implement solutions that enhance business operations
What we offer
  • Competitive compensation package
  • Annual bonus opportunities
  • Comprehensive health, dental, and vision benefits
  • Flexible work arrangements
  • Professional development and continuing education opportunities
  • Career advancement pathways
  • Paid time off and company holidays
  • Retirement savings programs
  • Wellness initiatives and employee support programs
  • Company-sponsored events and team-building activities
Employment type: Full-time

Sr. Demand Generation Specialist

Location: United States
Salary: 52.00 - 62.00 USD / Hour
Company: Addison Group
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Marketing, Communications, Business, or a related discipline
  • 5+ years of experience in B2B demand generation, performance marketing, or digital campaign management, ideally within SaaS, UCaaS, or CCaaS
  • Proven hands-on experience managing CSE/PPL programs and other BOFU digital tactics
  • Strong understanding of B2B funnel stages, lead management, and pipeline measurement
  • Experience with campaign trafficking, QA, reporting, and analytics, including lead-to-pipeline tracking and ROAS measurement
  • Proficiency with CRM systems (Salesforce) and BI/reporting tools (e.g., Tableau)
  • Experience working with digital partners, affiliates, and third-party vendors
  • Data-driven mindset with strong analytical and problem-solving skills
  • Excellent communication and collaboration skills with the ability to work cross-functionally
Job Responsibility
  • CSE & Affiliate Program Management
  • Own and optimize the CSE (Comparison Shopping Engine) channel end-to-end, including partner onboarding, campaign setup, budget pacing, IO/PO management, performance reviews, and ongoing partner relationship management with direct partners
  • Manage CPL/PPL programs, including vendor vetting, campaign trafficking, QA, execution, and optimization to drive incremental lead volume and pipeline contribution
  • Maintain competitive P1–P3 positioning across all core CSE partner sites to maximize visibility, lead volume, and conversion efficiency
  • Vet and onboard new CSE partners (as needed), evaluating traffic quality, cost models (CPL/CPC/CPA), geographic coverage, and competitive positioning
  • Reporting & Analytics
  • Maintain reporting and dashboards (Tableau, Salesforce) to track campaign performance, lead flow, conversion metrics, and revenue impact
  • Reconcile and deliver end-of-month (EOM) partner reports within the first 3–5 business days of each month, confirming billable vs. non-billable leads per partner terms
  • Monitor automated partner reports and proactively flag performance trends, pacing risks, and optimization opportunities
  • Track full-funnel impact — from lead capture through MQL, pipeline contribution, ROAS, and closed-won revenue
What we offer
  • Negotiated high salaries using U.S. Bureau of Labor Statistics
  • Medical, dental, vision insurance benefits
  • 401K
  • Monetary bonuses
  • Potential permanent employment
  • Direct connection with hiring managers from renowned organizations
  • Multiple employment options near home
  • Hiring process advice
  • Resume revision
  • Employment term negotiation