QA LLM Engineer

Job Posted January 02, 2026

Job Description

We are looking for a QA Automation Engineer with strong experience in LLMs and GenAI who can ensure the accuracy, stability, and performance of AI-driven applications.

Job Responsibility

  • Design and execute QA strategies for LLM-based and search-driven products
  • Validate data pipelines involving indexing, chunking, embeddings, cosine similarity and keyword search
  • Evaluate retrieval-augmented generation (RAG) and recommendation system quality using precision, recall, and relevance metrics
  • Develop prompt test suites to measure accuracy, consistency, and bias
  • Monitor LLM observability metrics such as latency, token usage, hallucination rate and cost performance
  • Automate end-to-end test scenarios using Playwright and integrate with CI/CD pipelines
  • Collaborate with ML engineers and developers to improve model responses and user experience
  • Contribute to test frameworks and datasets for LLM regression and benchmark testing

Requirements

  • BE/BTech in Computer Science, Data Engineering, or a related field from a top institute (like IIT, NIT, BITS, etc.)
  • 3.5 to 5.5 years of experience in QA engineering
  • At least 1 year of experience in GenAI or LLM-based systems
  • Strong understanding of indexing, chunking, embeddings, similarity search, and retrieval workflows
  • Experience with prompt engineering, LLM evaluation, and output validation techniques
  • Proficiency with Playwright, API automation, and modern QA frameworks
  • Knowledge of observability tools for LLMs
  • Solid scripting experience in Python
  • Knowledge of different LLM providers (OpenAI, Gemini, Anthropic, Mistral, etc.)
  • Exposure to RAG pipelines, recommendation systems, or model performance benchmarking
  • Strong analytical and debugging skills, with a detail-oriented mindset

What we offer

  • A culture of innovation
  • Endless learning opportunities
  • Talented peers
  • Work-life balance
  • Flexible schedules
  • Remote work options
  • A great culture
  • Recognition & rewards

Similar Jobs for QA LLM Engineer

8 matching positions

Senior QA Engineer (AI-based)

We are seeking a detail-oriented Senior QA Engineer to keep the quality assuranc...
Location: United Arab Emirates, Dubai
Salary: Not provided
Company: Parser Limited (parserdigital.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in QA engineering, with proven experience in testing data systems or AI applications
  • Strong analytical skills and attention to detail, capable of spotting subtle inconsistencies in data and agent behavior
  • Proficiency in Python and SQL for data validation, automation scripting, and test preparation
  • Experience with API testing (REST/GraphQL automation and validation)
  • Familiarity with chatbot frameworks, LLMs, or conversational testing
  • (Desirable) Experience or strong interest in using LLM-based assistants (e.g., GitHub Copilot, ChatGPT) in test and execution
  • (Desirable) Exposure to QA automation tools (e.g., PyTest, Postman, dbt tests)
Job Responsibility
  • Expertise in quality assurance efforts for data pipelines, APIs, and conversational agents
  • Design and execute generative AI testing scenarios, including exploratory testing for chatbot interactions, intent recognition, tone, and edge-case behavior
  • Experience in LLM test designs, including prompt and grounding validation, golden sets, and non-deterministic assertions
  • Perform crucial safety and quality checks, such as testing for hallucination, bias, toxicity, and PII leakage
  • Validate structured and unstructured data outputs, ensuring consistency, accuracy, and compliance
  • Establish and drive real-time testing approaches, including streaming data validation and API monitoring
  • Collaborate with ML and NLP teams to define comprehensive evaluation metrics and criteria for agent performance
  • Integrate AI-driven tools (like Copilot) into the QA lifecycle to accelerate test design, documentation, and defect analysis
Employment type: Full-time

AI QA Engineer (Agents)

An AI QA Engineer (Agents) is responsible for ensuring the quality, reliability,...
Location: Ireland, Cork
Salary: Not provided
Company: Marriott Bonvoy (https://www.marriott.com)
Expiration Date: Until further notice
Requirements
  • 4+ years' total experience, including 1+ year testing AI/ML applications, LLM integrations, or conversational interfaces
  • Hands-on experience with end-to-end testing and automation for AI/agentic products
  • 3+ years of experience in software quality assurance or testing
  • 1+ years of experience testing AI/ML applications, LLM integrations, or conversational interfaces
  • Strong understanding of software testing principles, methodologies, and best practices
  • Experience writing and maintaining automated tests (unit, integration, or end‑to‑end)
  • Proficiency in at least one programming language (Python, TypeScript, JavaScript, Java, etc.)
  • Experience with API testing tools (Postman, REST Assured, etc.) or frameworks
  • Strong analytical and problem‑solving skills
  • Excellent attention to detail and ability to identify edge cases
Job Responsibility
  • Design and execute test plans for AI agents and agentic experiences
  • Write and maintain automated test suites for agent functionality (unit tests, evals, integration tests, end‑to‑end tests)
  • Perform (minimal) manual testing of agent interactions, workflows, and business logic
  • Test agent responses, accuracy, and behavior across various scenarios and edge cases
  • Identify, document, and track bugs through resolution
  • Collaborate with engineers, product managers, and business stakeholders to understand requirements and acceptance criteria
  • Participate in test planning, test case design, and test strategy discussions
  • Create and maintain test data, test scenarios, and test environments for agents
  • Participate in feature design sessions, highlighting key testing scenarios and fault zones
  • Execute performance and load testing to ensure agent scalability and response times
Employment type: Full-time

Lead AI Red Teaming & QA Engineer

We are seeking a Lead AI Red Teaming & QA Engineer to design and execute automat...
Location: United Kingdom, City of London
Salary: 500.00 - 600.00 GBP / Day
Company: Randstad (https://www.randstad.com)
Expiration Date: June 16, 2026
Requirements
  • Proven experience testing software within FCA, DORA, or EU AI Act frameworks
  • Hands-on experience configuring, testing, and bypassing Bedrock Guardrails, Agents, and Knowledge Bases (RAG)
  • Solid understanding of Foundation Models, tool use (function calling), OWASP LLM Top 10, and NIST AI RMF
  • Strong Python development skills, experience with AI eval tools (Garak, Pyrit, Ragas), and building complex CI/CD test pipelines
Job Responsibility
  • Build and integrate automated red teaming suites into CI/CD pipelines using frameworks like Garak, Pyrit, and AgentDojo to enforce strict safety release gates
  • Develop metrics and continuous testing for core AI risks, including hallucinations, memorisation, algorithmic bias, uncertainty, and model drift
  • Map threat models (OWASP LLM Top 10, Agentic threats) to automated test cases and produce technical testing evidence required by EU AI Act Article 15, DORA, and FCA Operational Resilience guidelines
  • Own the enterprise AI Bill of Materials (AI-BOM), tracking model lineages, dataset versions, and signed artifacts as a centralized evaluation service
Employment type: Full-time

Junior Strong QA Engineer

AI-powered platform that helps sales teams improve customer interactions, provid...
Location: Ukraine
Salary: Not provided
Company: StartupSoft (startupsoft.com)
Expiration Date: Until further notice
Requirements
  • 2+ years of experience in QA
  • English level B2 is a must
  • Basic understanding of AI / LLM concepts
  • Knowledge of LLM QA / evaluation (core skill)
  • Understanding of regression testing, especially in the context of prompts and AI systems
  • Strong analytical and critical thinking skills
  • Experience or understanding of APIs and integrations
  • Ability to evaluate text quality beyond pass/fail and handle ambiguity
  • Experience or ability to build test datasets and to assess correctness, completeness, tone, and hallucinations in model outputs
Job Responsibility
  • Work with and build understanding of AI / LLM systems
  • Perform LLM QA and evaluation as a core part of daily work
  • Apply regression testing on a regular basis with a strong focus on prompt changes and their impact
  • Analyze and evaluate text quality beyond simple pass/fail criteria, including correctness of content, completeness (e.g., all action items included), detection of hallucinations, and appropriateness of tone
  • Form and apply your own evaluation criteria when working with ambiguous or uncertain cases
  • Work with APIs and integrations
  • Build and maintain test datasets for evaluation purposes
What we offer
  • Global collaboration opportunities
  • Core team membership
  • Equity and ownership potential
  • Premier workspaces
  • Competitive compensation package
  • Cutting-edge technology environment
  • Impactful project contributions
  • Collaborative company culture
Employment type: Full-time

Senior QA Engineer – AI & Conversational Products

We're looking for a Senior QA Engineer who's passionate about quality, thrives i...
Location: Israel
Salary: Not provided
Company: Radancy (radancy.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of QA engineering experience, with at least 2 years in a senior or lead capacity
  • Hands-on experience testing AI/ML features, LLM-based products, or conversational systems (a plus)
  • Strong proficiency with test automation frameworks (e.g., Playwright, Selenium, Cypress, Pytest)
  • Solid understanding of API testing (REST/GraphQL) and tools like Postman or similar
  • Experience with CI/CD pipelines (GitHub Actions, Jenkins, or equivalent)
  • Ability to think like a user and an engineer — identifying failure modes others miss
  • Excellent communication skills and a collaborative approach to cross-functional teamwork
Job Responsibility
  • Own the end-to-end quality assurance process for features across web, API, and backend services within Radancy's talent platform
  • Design and implement test strategies for LLM-based features and AI agents — ensuring accuracy, consistency, and robustness at scale
  • Build and maintain automated test frameworks covering UI, API, performance, and regression testing
  • Develop comprehensive test plans and test cases that address both functional requirements and edge scenarios unique to AI-driven hiring experiences
  • Partner closely with Product, Engineering, and Data teams to surface quality gaps early in the development cycle
  • Analyze and report bugs with precision, advocating for high-impact fixes that protect candidate and employer experience
  • Participate in release processes, CI/CD pipeline validation, and production monitoring

Senior QA Engineer with AI experience

N-iX is looking for a Senior QA Engineer with AI experience to join our team. We...
Location: Ukraine
Salary: Not provided
Company: N-iX (n-ix.com)
Expiration Date: Until further notice
Requirements
  • 4+ years of experience in manual QA, with exposure to frontend, backend and AI testing
  • Solid experience with API testing using tools such as Postman, REST Client, or similar
  • Experience testing web UIs and understanding of cross-browser/cross-environment considerations
  • Ability to read and interpret API documentation, data schemas, and system architecture descriptions
  • Hands-on experience working on projects that included AI, ML, or NLP components — particularly validating outputs that are probabilistic or context-dependent
  • Familiarity with the concept of RAG (Retrieval-Augmented Generation) or LLM-based systems
  • Strong analytical thinking — especially the ability to assess whether an AI response is contextually correct, not just technically non-null
  • Good understanding of test documentation practices: test plans, test cases, bug reports, traceability
  • English level at least Upper-Intermediate
Job Responsibility
  • Follow a phased QA approach — begin with CMS and backend testing to establish a reliable baseline of expected system behavior, then apply those insights to validate AI agent outputs effectively
  • Design and execute test cases for REST APIs, covering functional correctness, edge cases, error handling, authentication, and data integrity
  • Perform UI testing across core user journeys, validating layout, behavior, and integration with backend services
  • Transition into AI output validation once the deterministic layers are stable — using your knowledge of business rules to identify inconsistencies, hallucinations, or degraded outputs in agent responses
  • Document and maintain test cases, test plans, and bug reports in a structured and traceable way
  • Participate in requirement reviews and technical discussions to identify testability gaps early
  • Collaborate with the Lead Big Data/AI Engineer and AI team to understand RAG pipeline behavior, document ingestion flows, and output quality expectations
  • Contribute to building reusable test assets and QA processes as the project scales
What we offer
  • Flexible working format - remote, office-based or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

QA Automation Engineer with German

Our client is a leading provider of digital solutions for tax, accounting, and b...
Location: Romania, Brasov
Salary: Not provided
Company: NTT DATA (nttdata.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Informatics or a similar field of study, or equivalent working experience, is required
  • Minimum 7 years demonstrable experience in a Test Engineer Automation role
  • Strong understanding of testing methodologies, tools, and best practices
  • Experience with test automation frameworks, particularly Playwright
  • Proficiency in programming languages such as JavaScript or TypeScript
  • Experience with test management and bug tracking tools (e.g., Jira, Jira Xray, TestRail)
  • Hands-on experience in manual testing and the ability to translate complex requirements into executable test cases
  • Strong communication and collaboration skills to work effectively with cross-functional teams
  • Experience in organizing and conducting User Acceptance Testing
  • Ability to mentor and guide team members, particularly offshore testers
Job Responsibility
  • Analyze product documentation, including Use Cases, User Stories, and UML Diagrams, to develop detailed test cases
  • Create and maintain comprehensive test suites to ensure thorough coverage of all functionalities
  • Execute manual testing to validate the functionality and performance of applications
  • Experience with test automation, particularly with tools such as Playwright and Cypress
  • Organize and facilitate User Acceptance Testing (UAT) sessions, ensuring stakeholder involvement and feedback
  • Develop and maintain a robust test strategy that aligns with project objectives and timelines
  • Provide guidance and instruction to offshore testers, ensuring consistency and quality in testing practices
  • Collaborate and align with the TOP Overall Test Lead to ensure cohesive testing efforts across the organization
  • Evaluate LLM output quality using defined benchmarks, including feedback analysis and continuous improvement of evaluation frameworks
  • Contribute to test automation and data-driven validation processes for AI-powered features
What we offer
  • Smooth integration and a supportive mentor
  • Pick your working style: choose from Remote, Hybrid or Office work opportunities
  • Early bird or night owl? Our projects have different working hours to suit your needs
  • Sponsored certifications, trainings and top e-learning platforms
  • Private Health Insurance
  • Individual coaching sessions or accredited Coaching School
  • Parties or themed events
Employment type: Full-time

Middle QA Automation Engineer

We are seeking a motivated QA Automation Engineer to join our team and contribut...
Salary: Not provided
Company: Mad Devs (maddevs.io)
Expiration Date: Until further notice
Requirements
  • 2+ years of hands-on manual testing experience
  • 2+ years of proven experience as a QA engineer with strong skills in Python-based test automation
  • Proficient in testing web applications and AI-powered platforms on both real devices and emulators
  • Familiar with CI/CD workflows, monitoring tools, and version control systems such as Git
  • Comfortable working in fast-paced, distributed startup environments, demonstrating the ability to work independently without micromanagement
  • Clear and effective communicator, capable of collaborating across teams and time zones while maintaining thorough documentation
  • Language skills: English proficiency at B2-C1 level and Russian at B2 level
Job Responsibility
  • Test AI/LLM-based client-server applications, focusing on functionality, performance, and reliability
  • Develop and maintain automated test scripts in Python; perform manual testing when necessary
  • Utilize QA tools such as Playwright, Selenium, Postman, and TestRail to ensure thorough testing coverage
  • Guarantee quality across infrastructure components, CI/CD pipelines, and integrations
  • Prepare and update detailed test documentation, test scenarios, and defect reports
  • Collaborate closely with developers, DevOps engineers, and product managers to align priorities and deliver high-quality software releases
What we offer
  • Flexible working hours
  • Remote-first culture
  • Long-term projects
  • Salary in dollars
  • Professional communities
  • Onsite business trips
  • Training budget
  • Paid conferences