QA Engineer, Automation and Evals

ACI Infotech

Location:

Contract Type:
Not provided

Salary:
Not provided

Job Description:

The QA Engineer, Automation and Evals will design, implement, and maintain automated test suites across APIs, UIs, and workflows to ensure product reliability and faster release cycles. This role involves building evaluation pipelines for AI and RAG systems, integrating tests into CI/CD pipelines, and driving higher test coverage in multi-tenant environments. The ideal candidate will combine strong technical testing skills with the ability to communicate risks and quality tradeoffs effectively.

Job Responsibility:

  • Design and maintain automated test suites across services and UI
  • Build evaluation pipelines for ML and RAG systems
  • Integrate quality gates into CI pipelines and monitor flake rate
  • Create and manage test data strategies for multi-tenant environments
  • Drive high automation coverage on APIs and UI
  • Reduce regression rates and improve release confidence

Requirements:

  • 3+ years of QA automation experience in SaaS environments
  • Strong skills with testing frameworks such as Playwright, Cypress, or PyTest
  • Familiarity with CI/CD test integration
  • Proven ability to design test strategies for APIs, UIs, and workflows
  • Experience building reliable test harnesses and CI integrations
  • Strong communication skills for articulating risk and quality tradeoffs
  • Ability to analyze logs and traces to triage failures quickly

Nice to have:

  • Knowledge of evals for AI and LLM applications
  • Experience in security or load testing
  • Background in regression testing for RAG pipelines
  • Test data management in multi-tenant environments

Additional Information:

Job Posted:
December 14, 2025

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for QA Engineer, Automation and Evals

Sr. SDET

As a Sr. SDET in Agentic QA, you will own the test automation and quality framew...
Location
Canada, Vancouver
Salary:
150500.00 - 175250.00 CAD / Year
Dialpad
Expiration Date
Until further notice
Requirements
  • 5+ years of experience in software engineering or SDET roles with an emphasis on software development
  • Strong programming skills in Python (preferred), Java, or JavaScript
  • Experience testing distributed, cloud-native SaaS systems and APIs
  • Demonstrated proficiency in coding with AI agents to accelerate development and improve code quality
  • Hands-on exposure to LLMs or AI/ML systems (e.g., OpenAI, Claude, Gemini, or similar platforms)
  • Understanding of non-deterministic systems and probabilistic testing approaches
  • Experience building test frameworks and scalable automation systems
  • Familiarity with AI evaluation techniques (benchmarking, golden datasets, human-in-the-loop validation)
  • Experience with CI/CD pipelines (e.g., Jenkins, GitHub Actions)
  • Strong collaboration skills with the ability to work across distributed teams and time zones
Job Responsibility
  • Own end-to-end quality for agentic features and workflows, including strategy, development, execution, and release qualification
  • Design and build automation tooling and frameworks for AI/LLM-driven systems, including prompt flows, agent orchestration, and tool integrations
  • Develop and maintain evaluation frameworks (evals) to measure response quality, accuracy, and hallucination rates
  • Drive automation coverage (80%+ for critical AI workflows) using deterministic + probabilistic validation approaches
  • Integrate AI quality checks into CI/CD pipelines with fast feedback cycles (<15 minutes for PR validation)
  • Build tooling for LLM observability and debugging, including prompt tracing and response analysis
  • Partner with Applied AI teams on prompt engineering, model selection, and evaluation strategies
  • Design and execute performance and load tests for AI services (latency, throughput, cost efficiency)
  • Identify and mitigate risks related to hallucinations, bias, safety, and edge cases
  • Define and track AI quality KPIs (task success rates, precision/recall, latency, etc.)
What we offer
  • Competitive salary, comprehensive benefits, and real opportunities for growth
  • Fulltime

Voice Ai Technical Lead

Whitehall Resources are looking for a Voice AI Technical Lead. This role is hybr...
Location
United Kingdom, Windsor
Salary:
Not provided
Whitehall Resources Ltd
Expiration Date
Until further notice
Requirements
  • 15 years of domain experience and 5-6 years Voice AI / IVA / voice
  • Proven track record of delivering Voice AI / IVA / voice bot solutions into production at meaningful scale – not PoCs or demos, but real services handling real customer calls
  • Strong hands-on technical background, comfortable reviewing architectures, reading code, challenging latency budgets and prototyping when needed
  • Direct experience with LLM-based voice platforms such as Amazon Nova Sonic, ElevenLabs, OpenAI Realtime, Google Gemini Live or equivalents, and a clear view on the tradeoffs between them
  • Experience integrating conversational AI with contact-centre infrastructure – IVR, CTI, telephony, CRM, billing and knowledge systems – and delivering clean, context-aware handoffs to human agents
  • Demonstrable ability to estimate, scope and size features accurately in an Agile delivery environment, and to explain and defend those estimates to stakeholders
  • Experience coaching and upskilling multidisciplinary teams – engineers, designers, BAs and QA – through pairing, mentoring, code review and written guidance, without formal line-management authority
  • Comfort working in regulated / high-compliance environments, including GDPR, PII handling, PEN testing and security governance
Job Responsibility
  • Lead the Voice AI engineering team as a hands-on technical lead: setting direction, reviewing designs and code, pairing with engineers, and writing production code yourself
  • Drive accurate sizing and estimation by establishing the engineering building blocks, reference implementations and reusable components that let the team break new features down into well-understood units of work
  • Raise the engineering bar across the team – introduce and enforce best practice for prompt engineering, evals, regression testing, latency budgeting, observability, CI/CD and release management for LLM-driven systems – through code review, pairing, internal guilds, brown-bags and written playbooks
  • Build the evaluation and test harness that every voice agent is measured against – automated scenario coverage, regression suites, latency and load testing, live call replay – so we know objectively whether each release is better than the last
  • Integrate voice agents cleanly with our contact-centre platform, CRM, billing, knowledge and identity systems, and design the handoff patterns that let us escalate to a human agent with full context
  • Partner with Data Security, InfoSec and our governance forums to streamline the engineering path to production – resolving incident runbooks, ownership models and PEN test blockers – and shorten our cycle time for every future release
  • Confirm and communicate the capability ceiling of our current stack, identify where different tooling is needed, and feed this back into the roadmap so we scope future packages based on engineering reality

Senior AI Software Developer

The Senior AI Engineer owns end-to-end delivery of AI features—from design to pr...
Location
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor's or master’s degree in computer science, engineering, data science, machine learning, artificial intelligence, or closely related quantitative discipline
  • Typically, 7-10 years’ experience
  • LLMs & Agents: Prompt engineering, function/tool calling, orchestration frameworks, RAG
  • ML/DS: Evaluation metrics (precision/recall, BLEU/ROUGE where relevant), error analysis
  • Data/RAG: Embeddings, similarity (cosine/IP), chunking, rerankers, vector DB operations
  • Backend: Python (FastAPI/Flask), microservices patterns
  • MLOps/Infra: Docker, Kubernetes, CI/CD, artifact management, GPU scheduling
  • Observability: Metrics/logging/tracing, dashboards, automated evaluation pipelines
  • Frameworks: PyTorch/TensorFlow, Hugging Face, LangChain/LlamaIndex
  • Data: Pandas, SQL/NoSQL, Parquet/Arrow, Kafka/queues
Job Responsibility
  • Translate high-level designs into clear component contracts, APIs, and service boundaries
  • Implement LLM integrations, RAG pipelines, agents, tool/function calling, and prompt strategies
  • Own feature delivery for sprints/releases; maintain high code quality and documentation
  • Fine-tune models when needed; design evaluation harnesses and metrics
  • Build A/B testing setups; track accuracy, latency, robustness, and task success rates
  • Conduct error analysis; iterate using feedback efficacy loops and prompt refinement
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion

Manager, Content Engineering — AI Content Understanding

Product Content Engineering is a horizontal function supporting initiatives acro...
Location
United States, Menlo Park
Salary:
162000.00 - 227000.00 USD / Year
Meta
Expiration Date
Until further notice
Requirements
  • 8+ years of experience in content strategy, content operations, AI evaluation, or a related field
  • 2+ years of people management experience, including hiring, developing, and performance-managing direct reports
  • Experience managing cross-functional programs with engineering, data science, and product partners in fast-paced environments
  • 1+ years working with generative AI products, AI evaluation, prompt engineering, annotation, and/or content labeling and analysis
  • Experience designing or operationalizing evaluation frameworks, annotation guidelines, or quality rubrics for AI/ML systems
  • Demonstrated ability to manage multiple concurrent workstreams with competing priorities and tight deadlines
  • Proven analytical skills with experience interpreting evaluation data and communicating findings to technical and non-technical audiences
  • Track record of building team operational processes and quality standards from the ground up or during periods of significant change
Job Responsibility
  • Manage and develop a team of Content Engineers and contingent workers, setting clear goals, providing regular feedback, and supporting career growth
  • Own the execution of continuous CU model evaluations — coordinating sprint planning, reviewer assignments, QA processes, and delivery timelines across multiple concurrent workstreams
  • Drive the creation and maintenance of golden datasets that serve as ground truth for model benchmarking and auto-eval calibration
  • Partner with engineering, data science, and product teams to translate evaluation insights into actionable recommendations for model improvement and prompt optimization
  • Lead the team's contribution to LLM-as-a-Judge (auto-eval) initiatives — ensuring human evaluation data is used to calibrate, validate, and improve automated evaluation systems
  • Define and maintain evaluation guidelines, rubrics, and quality standards in partnership with Lead Content Engineers, ensuring consistency across reviewers and use cases
  • Build repeatable operational processes for evaluation sprints, including reviewer training, calibration sessions, and escalation workflows
  • Manage CW workforce planning — hiring, onboarding, allocation across workstreams, and performance management
  • Synthesize evaluation results into structured reports and present findings to cross-functional leadership, including engineering leads and lead product stakeholders
  • Identify and mitigate operational risks — staffing gaps, timeline conflicts, quality regressions — before they impact delivery
What we offer
  • bonus
  • equity
  • benefits
  • Fulltime

LLM Inference Performance & Evals Engineer

Join the inference model team dedicated to bring up the state-of-the-art models,...
Location
Canada, Toronto
Salary:
Not provided
Cerebras Systems
Expiration Date
Until further notice
Requirements
  • 3+ years building high-performance ML or systems software
  • Solid grounding in Transformer math—attention scaling, KV-cache, quantisation—or clear evidence you learn this material rapidly
  • Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
  • Strong debugging skills across performance, numerical accuracy, and runtime integration
  • Prior experience in modeling, compilers or crafting benchmarks or performance studies, not just black-box QA tests
  • Strong passion to leverage AI agents or workflow orchestration tools to boost personal productivity
Job Responsibility
  • Prototype and benchmark cutting-edge ideas: new attentions, MoE, speculative decoding, and many more innovations as they emerge
  • Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull-requests
  • Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
  • Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities
What we offer
  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source their cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Our simple, non-corporate work culture that respects individual beliefs

Provider Operations & Support

OpenRouter is an LLM marketplace that lets developers use frontier models in one...
Location
United States
Salary:
Not provided
OpenRouter
Expiration Date
Until further notice
Requirements
  • 2-3 years in a startup, solutions engineering, product ops, or similar
  • Comfortable building demos and scripts; can read API docs and troubleshoot with logs/cURL/Postman
  • Experience with TypeScript/JavaScript and/or Python
  • Git-literate
  • CS/Engineering degree is a plus, but not required given demonstrated technical aptitude
Job Responsibility
  • Run end-to-end launch playbooks: scoping, test plans, latency/quality checks, pricing & quotas, docs, and announcement assets
  • Coordinate with model providers to integrate, QA, and hit ship dates
  • Maintain clear versioning and release notes; manage deprecations and migrations
  • Build internal tools to help speed up the model onboarding process
  • Build internal evals to evaluate models and endpoints quickly
  • Document internal processes and prime them for automation
What we offer
  • Competitive compensation, benefits, and equity
  • Fulltime

Software Engineer II

Dynamics 365 is Microsoft’s suite of enterprise software that powers many of the...
Location
United States, Redmond
Salary:
102100.00 - 202200.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Job Responsibility
  • Design and develop highly usable, scalable application capabilities, integrating AI models and enhancing existing features to meet evolving customer needs
  • Write clean, efficient, and well-documented code using industry best practices and coding standards
  • Tackle ambiguity, create clarity for the team, generate energy in execution, identify and communicate risks throughout all stages of the software development lifecycle
  • Collaborate with product managers, architects, designers, and engineers from diverse engineering teams to solve challenging problems
  • Be accountable for the quality, usability, and performance of the design, implementation, schedule, and delivery of your team and services
  • Build and debug production-grade code in distributed systems
  • Troubleshoot live site issues as part of both product development and live site support rotations, ensuring rapid resolution and learning
  • Ensure high reliability and performance of applications and services through intelligent monitoring, alerting, and proactive failover strategies
What we offer
  • Eligible for benefits and other compensation
  • Fulltime

Marketing Growth Manager

As our Growth Marketing Manager, you’ll play a pivotal role in helping more mort...
Location
United Kingdom, Derby
Salary:
Not provided
360 Resourcing Solutions
Expiration Date
Until further notice
Requirements
  • B2B lead generation/demand generation
  • Running multi-channel campaigns and ongoing optimisation
  • KPI tracking, reporting, and insight-driven action
  • Briefing and coordinating agencies, freelancers, and stakeholders
  • Website/landing page optimisation and conversion best practice
  • Working closely with Sales teams on lead follow-up and nurture
Job Responsibility
  • Demand generation planning
  • Content coordination
  • Campaign performance
  • Website conversion
  • Sales enablement
  • Tools & processes
  • Reporting & insight
  • Compliance & QA
  • Line management
What we offer
  • Competitive salary + performance-related bonus
  • 25 days holiday + your birthday off + extra days for length of service
  • Hybrid working - from Derby City Centre and home
  • Free breakfast, lunch, snacks, and drinks in the office
  • Company pension with optional matched contributions
  • 2 volunteering days + 1 team volunteering day
  • 2 half-days of paid "me time" for wellbeing
  • Employee recognition schemes and retail discounts
  • Salary sacrifice options (home tech, cycle to work, pension)
  • Gym subsidy, company sick pay, EAP support
  • Fulltime