Model Evaluation QA Lead

Deepgram

Location: United States

Contract Type: Not provided

Salary: 180000.00 - 230000.00 USD / Year

Job Description:

As Model Evaluation QA Lead, you’ll be the technical owner of model quality assurance across Deepgram’s AI pipeline—from pre-training data validation and provenance through post-deployment monitoring. Reporting to the QA Engineering Manager, you will partner directly with our Active Learning and Data Ops teams to build and operate the evaluation infrastructure that ensures every model Deepgram ships meets objective quality bars across languages, domains, and deployment contexts. This is a hands-on, high-impact role at the intersection of QA engineering and ML operations. You will design automated evaluation frameworks, integrate model quality gates into release pipelines, and drive industry-standard benchmarking—ensuring Deepgram maintains its position as the accuracy and latency leader in voice AI.
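
For illustration only, and not a description of Deepgram's actual tooling: the sketch below shows the kind of word error rate (WER) check an automated evaluation pipeline might run against a candidate speech model's transcripts. The utterances, function names, and simple per-utterance averaging are assumptions made for this example.

# Minimal WER sketch (illustrative): compares hypothesis transcripts against
# reference transcripts using a word-level edit distance.
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution
            )
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Average WER across a benchmark set (equal weight per utterance)."""
    scores = [word_error_rate(r, h) for r, h in zip(references, hypotheses)]
    return sum(scores) / max(len(scores), 1)


if __name__ == "__main__":
    refs = ["turn the volume down", "book a table for two"]
    hyps = ["turn the volume down", "book a table for too"]
    print(f"corpus WER: {corpus_wer(refs, hyps):.3f}")  # 0.100

In practice an evaluation library such as jiwer or a dedicated harness would replace the hand-rolled edit distance; the point here is only what the metric measures.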

Job Responsibilities:

  • Model Evaluation Automation: Design, build, and maintain automated model evaluation pipelines that run against every candidate model before release
  • Release Gate Integration: Embed model quality checkpoints into CI/CD and release pipelines (see the illustrative gate sketch after this list)
  • Agent & Model Evaluation Frameworks: Stand up and operate evaluation tooling for end-to-end voice agent testing
  • Active Learning & Data Ingestion Testing: Partner with the Active Learning team to validate data ingestion infrastructure, annotation pipelines, and retraining automation
  • Industry Benchmark Automation: Automate execution and reporting of industry-standard benchmarks
  • Language & Domain Validation: Build and maintain test suites for multi-language and domain-specific model validation
  • Retraining Automation Support: Validate the end-to-end retraining pipeline across all data sources
  • Manual Test Feedback Loop: Design and operate human-in-the-loop evaluation workflows for subjective quality assessment
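
To make the release-gate responsibility above concrete, here is a minimal, hypothetical sketch (not an actual Deepgram pipeline): a script that a CI/CD stage could run to compare a candidate model's metrics against the current baseline and fail the build on regression. The file names, metric names, and regression budgets are illustrative assumptions.

# Hypothetical release-gate script: fails the pipeline (non-zero exit) if the
# candidate model regresses beyond an allowed budget on any tracked metric.
import json
import sys

# Maximum allowed degradation per metric (absolute); values chosen arbitrarily.
REGRESSION_BUDGET = {"wer": 0.005, "latency_p95_ms": 25.0}


def load_metrics(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


def main() -> int:
    baseline = load_metrics("baseline_metrics.json")    # e.g. {"wer": 0.081, "latency_p95_ms": 310.0}
    candidate = load_metrics("candidate_metrics.json")

    failures = []
    for metric, budget in REGRESSION_BUDGET.items():
        delta = candidate[metric] - baseline[metric]    # lower is better for both metrics
        if delta > budget:
            failures.append(
                f"{metric}: {baseline[metric]:.4f} -> {candidate[metric]:.4f} "
                f"(regressed by {delta:.4f}, budget {budget})"
            )

    if failures:
        print("Quality gate FAILED:")
        for line in failures:
            print("  " + line)
        return 1  # non-zero exit blocks the release stage in CI

    print("Quality gate passed.")
    return 0


if __name__ == "__main__":
    sys.exit(main())

A CI job (GitHub Actions, Jenkins, Argo, or similar) would run the evaluation suite, write both metrics files, and then invoke a script like this as the gating step.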

Requirements:

  • 4–7 years of experience in QA engineering, ML evaluation, or a related technical role with a focus on predictive and generative model and data quality
  • Hands-on experience building automated test/evaluation pipelines for ML models and connected software features
  • Strong programming skills in Python
  • Experience with ML evaluation libraries, data processing frameworks (Pandas, NumPy), and scripting for pipeline automation
  • Familiarity with speech/audio ML concepts: WER, SER, MOS, acoustic models, language models, or similar evaluation metrics
  • Experience with CI/CD integration for ML workflows (e.g., GitHub Actions, Jenkins, Argo, MLflow, or equivalent)
  • Ability to design and maintain reproducible benchmark environments across multiple model versions and configurations
  • Strong communication skills—you can translate model quality metrics into actionable insights for engineering, research, and product stakeholders
  • Detail-oriented and systematic, with a bias toward automation over manual process

Nice to have:

  • Experience with model evaluation platforms (Coval, Braintrust, Weights & Biases, or custom evaluation harnesses)
  • Background in speech recognition, NLP, or audio processing domains
  • Experience with distributed evaluation at scale—running evals across GPU clusters or large dataset partitions
  • Familiarity with human-in-the-loop evaluation design and annotation pipeline tooling
  • Experience with multi-language model evaluation and localization quality assurance
  • Prior work in a company where ML model quality directly impacted revenue or customer SLAs

What we offer:

  • Medical, dental, vision benefits
  • Annual wellness stipend
  • Mental health support
  • Life, STD, LTD Income Insurance Plans
  • Unlimited PTO
  • Generous paid parental leave
  • Flexible schedule
  • 12 Paid US company holidays
  • Quarterly personal productivity stipend
  • One-time stipend for home office upgrades
  • 401(k) plan with company match
  • Tax Savings Programs
  • Learning / Education stipend
  • Participation in talks and conferences
  • Employee Resource Groups
  • AI enablement workshops / sessions
  • Equity
  • 10% annual bonus

Additional Information:

Job Posted: February 18, 2026
Employment Type: Full-time
Work Type: Remote work

Similar Jobs for Model Evaluation QA Lead


Manager, Support Quality

As the Support Quality Manager at Replit, you’ll build and lead the program that...
Location: United States, Foster City
Salary: 140000.00 - 175000.00 USD / Year
Replit
Expiration Date: Until further notice
Requirements:
  • 5+ years in Support Quality, Support Operations, Technical Support, or similar roles in a technology company
  • 2+ years in a people management or team leadership capacity
  • Experience building or significantly evolving a QA program, framework, or evaluation system
  • Strong understanding of customer support workflows, ticket lifecycle, and escalation patterns
  • Experience working with support platforms (Zendesk or similar) and QA tooling or review workflows
  • Strong analytical mindset with experience using data to identify trends and drive performance improvements
  • Experience working cross-functionally with Support leadership, Operations, and Enablement or Training teams
  • Strong written and verbal communication skills, including delivering structured performance feedback and coaching guidance
  • Experience working in fast-moving product environments with frequent releases and evolving workflows
  • Hands-on experience using AI tools (e.g., Replit, Claude, ChatGPT, or similar) to improve workflows, knowledge creation, training, or support operations
Job Responsibilities:
  • Build and lead the Support QA program, including evaluation frameworks, scoring models, review workflows, and calibration processes
  • Hire, develop, and manage QA specialists or analysts as the program scales
  • Define quality standards across ticket support, technical troubleshooting, and customer communication
  • Establish QA coverage strategy across FTE and vendor support teams
  • Lead calibration programs to ensure consistent quality standards across reviewers, teams, and regions
  • Partner with Learning & Knowledge to turn QA insights into training, onboarding improvements, and coaching strategies
  • Partner with Support Operations to embed quality signals into dashboards, reporting, and performance frameworks
  • Define and evolve quality standards for AI-assisted support, including agent assist usage, automation handoffs, and AI-generated content quality
  • Utilize Replit to build internal tooling and recommend external tooling when necessary to improve QA workflows and program scalability
  • Define and track key quality metrics (QA trends, CSAT correlation, escalation rate, repeat contact rate, policy adherence) and report insights to Support leadership
What we offer:
  • Competitive Salary & Equity
  • 401(k) Program with a 4% match
  • Health, Dental, Vision and Life Insurance
  • Short Term and Long Term Disability
  • Paid Parental, Medical, Caregiver Leave
  • Commuter Benefits
  • Monthly Wellness Stipend
  • Autonomous Work Environment
  • In Office Set-Up Reimbursement
  • Flexible Time Off (FTO) + Holidays
Employment Type: Full-time

Technical Artist

This is a rare opportunity to sit at the intersection of frontier research and c...
Location: United States, Palo Alto
Salary: 125000.00 - 250000.00 USD / Year
Luma AI
Expiration Date: Until further notice
Requirements:
  • Demonstrable experience with Visual GenAI, either as a creator, fine-tuner, researcher, or engineer
  • Strong generalist engineering skills with proficiency in Python
  • Deep experience with open-source generative AI tools such as ComfyUI, Automatic1111, Kohya, or similar frameworks
  • A hobbyist or professional background in art/design leading to a deep understanding of aesthetics and the ability to articulate visual choices using standard artistic language
  • Experience with Low-Rank Adaptation (LoRA) techniques and other methods for efficient model fine-tuning
  • Exceptionally motivated and passionate about building amazing models, with a 'high agency' drive to solve problems independently
Job Responsibilities:
  • Architect and Build Pipelines: Design and implement robust ComfyUI-style pipelines to productionize our models
  • Own Model Fine-Tuning: Develop and implement LoRA-based fine-tuning strategies to enhance model controllability
  • Bridge Research and Design: Collaborate closely with the Research and Design teams to evaluate early model versions
  • Drive Model Improvement: Take ownership of identifying model deficiencies and implementing fine-tuning solutions or data enhancements
  • QA for Aesthetics: Work with the Data team to ensure final models meet exceptionally high aesthetic quality standards
  • Prototyping: Rapidly prototype new creative workflows and controllability features to test new research opportunities
Employment Type: Full-time

Senior Program Manager, Tech - Uber AI Solutions

At Uber AI Solutions, we deliver high-quality scaled programs in operations, tec...
Location: United States, San Francisco
Salary: 167000.00 - 185500.00 USD / Year
Uber
Expiration Date: Until further notice
Requirements:
  • 7+ years of overall experience, with specific familiarity in ML operations, data annotation and AI infrastructure domains
  • Familiarity and experience in leading or managing client interactions (i.e., AI labs, foundation VLM, and robotics companies) for data annotation, training, evaluation and performance benchmarking
  • Experience in client-facing service delivery management, solutioning, governance - with external client stakeholders at senior levels and/or their AI/Science/Research teams
  • Familiarity with strategies for delivery and QA processes in this domain is required
  • Track record of driving innovation and thought leadership in AI training and evaluation services - e.g., via model benchmarking, opportunity identification based on emerging AI industry trends
  • Strong ability to communicate and bring clarity of thought in messaging for senior management as well as broader teams
  • Strong collaboration skills, working across silos and team structures to drive impact effectively
  • Ability to work in a global organization across locations and time zones
Job Responsibilities:
  • Client engagement for presales support - partner with Sales to interact with prospective clients (AI frontier labs, robotics & video generation companies in the physical AI space) to shape the project scope, evangelise our capabilities, design the delivery solution, and governance approach
  • Client engagement for program delivery - represent the service delivery organization (located globally with a predominant India/offshore footprint) and collaborate with them to drive ongoing governance, enable troubleshooting, find up/cross-sell opportunities, bring thought leadership with client teams
  • Program delivery - help to manage the delivery of annotation, training & evaluation of AI models in the physical AI space
  • Innovation and thought leadership - demonstrate deep understanding and expertise in computer vision-related AI training and evaluation, including robotics, video synthesis, VLA, etc., with prospective clients; leverage this expertise to drive talent strategy, tech platform and tooling, and any other relevant new capabilities to advance the maturity of this area
  • Tech platform capability and roadmap inputs - collaborate with our Product and Engineering teams to help develop a roadmap for tech and tooling, and make it best in class
  • Stakeholder management - represent the coding and data AI capabilities at senior leadership level interactions and forums, evangelise our capabilities, drive sponsorship and backing for initiatives
  • Best practices - continually improve ways of working, enhance delivery maturity, and elevate governance and impact
What we offer:
  • Eligibility to participate in Uber's bonus program
  • May be offered an equity award & other types of comp
  • Eligible for various benefits (details at provided link)
Employment Type: Full-time

Delivery Quality Engineer, AI Business

As a Delivery Quality Engineer within Prolific AI Data Services, you will be the...
Location: Mexico, Mexico City
Salary: Not provided
Prolific
Expiration Date: Until further notice
Requirements:
  • 5+ years of experience in quality engineering, data or annotation quality, analytics engineering, trust and integrity, or ML/LLM evaluation operations
  • Strong proficiency in Python and SQL, with comfort applying statistical concepts such as sampling strategies, confidence levels, and agreement metrics
  • A proven track record of turning ambiguous or messy quality problems into clear metrics, automated checks, and durable process improvements
  • Strong quality systems thinking, with the ability to translate complex edge cases into clear rules, tests, rubrics, and governance mechanisms
  • Hands-on experience instrumenting workflows and implementing pragmatic automation that catches quality and integrity issues early
  • Demonstrated ability to influence cross-functional teams (Product, Engineering, Operations, Client teams) and drive change without direct authority
  • Strong customer empathy, with a clear understanding of what “useful, trustworthy data” means for research, AI training, and evaluation use cases
Job Responsibilities:
  • Own end-to-end quality design for Prolific managed service studies, including rubrics, acceptance criteria, defect taxonomies, severity models, and clear definitions of done
  • Define, implement, and maintain quality measurement systems, including sampling plans, golden sets, calibration protocols, agreement targets, adjudication workflows, and drift detection
  • Build and deploy automated quality checks and launch gates using Python and SQL, such as schema and format validation, completeness checks, anomaly detection, consistency testing, and label distribution monitoring
  • Design and run launch readiness processes, including pre-launch checks, pilot calibration, ramp criteria, full-launch thresholds, and pause/rollback mechanisms
  • Partner with Product and Engineering to embed in-study quality controls and authenticity checks into workflows, tooling, and escalation paths
  • Write and continuously improve guidelines and training materials to keep participants, reviewers, and internal teams aligned on evolving quality standards
  • Investigate quality and integrity issues end to end, running root-cause analysis across guidelines, UX, screening, training, and operations, and driving corrective and preventive actions (CAPAs)
  • Build dashboards and operating cadences to track defect rates, rework, throughput versus quality trade-offs, integrity events, and SLA adherence
  • Lead calibration sessions and coach QA leads and reviewers to improve decision consistency, rubric application, and overall quality judgement
  • Translate one-off quality fixes into repeatable, scalable playbooks across customers, programs, and study types
What we offer:
  • competitive salary
  • benefits
  • remote working
  • impactful, mission-driven culture
  • equity
  • opportunity to earn a cash variable element, such as a bonus or commission
Employment Type: Full-time

Data Quality Engineer, AI Business

As a Data Quality Engineer within Prolific AI Data Services, you will be the qua...
Location:
Salary: Not provided
Prolific
Expiration Date: Until further notice
Requirements:
  • 5+ years of experience in quality engineering, data or annotation quality, analytics engineering, trust and integrity, or ML/LLM evaluation operations
  • Strong proficiency in Python and SQL, with comfort applying statistical concepts such as sampling strategies, confidence levels, and agreement metrics
  • A proven track record of turning ambiguous or messy quality problems into clear metrics, automated checks, and durable process improvements
  • Strong quality systems thinking, with the ability to translate complex edge cases into clear rules, tests, rubrics, and governance mechanisms
  • Hands-on experience instrumenting workflows and implementing pragmatic automation that catches quality and integrity issues early
  • Demonstrated ability to influence cross-functional teams (Product, Engineering, Operations, Client teams) and drive change without direct authority
  • Strong customer empathy, with a clear understanding of what “useful, trustworthy data” means for research, AI training, and evaluation use cases
Job Responsibilities:
  • Own end-to-end quality design for Prolific managed service studies, including rubrics, acceptance criteria, defect taxonomies, severity models, and clear definitions of done
  • Define, implement, and maintain quality measurement systems, including sampling plans, golden sets, calibration protocols, agreement targets, adjudication workflows, and drift detection
  • Build and deploy automated quality checks and launch gates using Python and SQL, such as schema and format validation, completeness checks, anomaly detection, consistency testing, and label distribution monitoring
  • Design and run launch readiness processes, including pre-launch checks, pilot calibration, ramp criteria, full-launch thresholds, and pause/rollback mechanisms
  • Partner with Product and Engineering to embed in-study quality controls and authenticity checks into workflows, tooling, and escalation paths
  • Write and continuously improve guidelines and training materials to keep participants, reviewers, and internal teams aligned on evolving quality standards
  • Investigate quality and integrity issues end to end, running root-cause analysis across guidelines, UX, screening, training, and operations, and driving corrective and preventive actions (CAPAs)
  • Build dashboards and operating cadences to track defect rates, rework, throughput versus quality trade-offs, integrity events, and SLA adherence
  • Lead calibration sessions and coach QA leads and reviewers to improve decision consistency, rubric application, and overall quality judgement
  • Translate one-off quality fixes into repeatable, scalable playbooks across customers, programs, and study types

Machine Learning Team Lead

TradingView is the world’s #1 platform for all things investing. 100M+ users tru...
Location: Cyprus; Georgia, Tbilisi
Salary: Not provided
TradingView
Expiration Date: Until further notice
Requirements:
  • 2+ years of experience in managing technical teams with the ability to organize workflows and build effective processes
  • Deep understanding of the ML project lifecycle: from idea and prototype to production and maintenance
  • Strong knowledge of NLP/LLM technologies: text generation and classification, embeddings, RAG, and other modern techniques
  • Excellent communication skills and experience working with various teams (ML, backend, QA, product, analytics)
  • Ability to define and maintain roadmaps and make system-level engineering decisions
  • Experience in prioritization, risk assessment, and managing technical debt
  • Proficiency in Python and modern development tools (Git, CI/CD, Docker, Kubernetes)
  • Experience in operating ML systems in production (monitoring, metrics, A/B testing, incident handling)
Job Responsibilities:
  • Develop and enhance projects related to news processing (sentiment analysis, NER, classification, search, etc.)
  • Perform data analysis and preprocessing, prepare datasets, and build model pipelines
  • Design monitoring systems and evaluate the performance of ML systems
  • Lead a team of ML engineers working on NLP and LLM projects (news, content generation, recommendations, search, and chat systems)
  • Set tasks, prioritize work, manage deadlines, and ensure timely delivery
  • Collaborate with product and analytics teams to align goals and approaches
  • Support the technical growth of the team through mentoring, reviewing solutions, and assisting in system design
  • Improve development and deployment processes for ML solutions in production
  • Contribute to engineering efforts as a senior developer: design and implement key components, perform code reviews, and drive technical improvements
What we offer:
  • Flexible working hours and a hybrid work format
  • Well-equipped offices for focused and collaborative work
  • A global, distributed team of 500+ professionals
  • Learning, mentorship, and long-term career growth
  • Relocation support and private health insurance
  • Performance-based bonuses
  • TradingView Premium access
  • Regular team events and company-wide meetups

Data Science: Team Lead

The team is focused on building and evolving products related to news content pr...
Location: Georgia, Tbilisi
Salary: Not provided
TradingView
Expiration Date: Until further notice
Requirements:
  • 2+ years of experience in managing technical teams with the ability to organize workflows and build effective processes
  • Deep understanding of the ML project lifecycle: from idea and prototype to production and maintenance
  • Strong knowledge of NLP/LLM technologies: text generation and classification, embeddings, RAG, and other modern techniques
  • Excellent communication skills and experience working with various teams (ML, backend, QA, product, analytics)
  • Ability to define and maintain roadmaps and make system-level engineering decisions
  • Experience in prioritization, risk assessment, and managing technical debt
  • Proficiency in Python and modern development tools (Git, CI/CD, Docker, Kubernetes)
  • Experience in operating ML systems in production (monitoring, metrics, A/B testing, incident handling)
Job Responsibilities:
  • Develop and enhance projects related to news processing (sentiment analysis, NER, classification, search, etc.)
  • Perform data analysis and preprocessing, prepare datasets, and build model pipelines
  • Design monitoring systems and evaluate the performance of ML systems
  • Lead a team of ML engineers working on NLP and LLM projects (news, content generation, recommendations, search, and chat systems)
  • Set tasks, prioritize work, manage deadlines, and ensure timely delivery
  • Collaborate with product and analytics teams to align goals and approaches
  • Support the technical growth of the team through mentoring, reviewing solutions, and assisting in system design
  • Improve development and deployment processes for ML solutions in production
  • Contribute to engineering efforts as a senior developer: design and implement key components, perform code reviews, and drive technical improvements
What we offer:
  • Flexible Working Hours
  • Hybrid Work Policy
  • Relocation Package
  • Private Health Insurance
  • Performance Bonus
  • Work alongside experienced professionals and mentors offering ongoing training and growth opportunities
  • Premium TradingView Subscription
  • Annual Team Events
  • A comfortable, well-equipped workspace with exclusive perks like a gym and much more
Employment Type: Full-time

Senior Software Engineer (Laravel), AI Solutions

PactFi is seeking a Senior Software Engineer with a strong focus on building and...
Location: United States, New York
Salary: 150000.00 - 170000.00 USD / Year
Fin Capital
Expiration Date: Until further notice
Requirements:
  • Extensive hands-on experience with Laravel building production-grade backend systems
  • Strong understanding of backend application architecture, data modeling, and scalable system design
  • Practical experience working with AI/LLM tooling, such as: LLM APIs and agent frameworks, Document parsing and extraction, Vector databases and embeddings, RAG pipelines and model orchestration
  • Ability to evaluate architectural trade-offs across performance, cost, security, and business value
  • Strong problem-solving skills and comfort working with ambiguous requirements
  • Excellent communication skills and a collaborative, ownership-driven mindset
Job Responsibilities:
  • Design, build, and maintain robust backend systems using Laravel, with a focus on reliability, scalability, and long-term maintainability
  • Develop and evolve internal services, core platform components, and shared libraries that power PactFi’s products and operations
  • Collaborate with Product Managers, engineering leadership, and QA to shape technical solutions and support high-quality releases through thoughtful design, testing, debugging, and iteration
  • Contribute to architectural discussions, technical standards, and platform-level improvements that support PactFi’s long-term growth
  • Take initiative on internal projects and exploratory work aimed at improving developer productivity, system reliability, and operational efficiency
  • Identify high-value AI use cases across the PactFi platform, focusing on real business impact
  • Design and implement AI-powered backend workflows, including: LLM-driven features and automations, Intelligent agents and decision-support systems, Automated document ingestion and extraction pipelines for large, unstructured, and semi-structured documents (e.g., PDFs, contracts, financial statements)
  • Lead the integration of AI models and platforms such as OpenAI, AWS Bedrock, Azure AI, Google Gemini, or similar
  • Design and implement AI workflows for inference, orchestration, evaluation, and production readiness, with attention to reliability, observability, security, and cost
  • Partner with engineering and data teams to ensure AI solutions integrate cleanly with existing systems, APIs, and data pipelines
What we offer:
  • Equity
  • Healthcare coverage
  • 401(k)
Employment Type: Full-time