LogicMonitor is looking for a highly skilled Senior QA/SDET with 4 to 5 years of experience to build and scale automated test frameworks for Generative AI features across our observability platform. In this role, you will ensure the quality, reliability, safety, and performance of LLM-based workflows, including AI assistants, Retrieval-Augmented Generation (RAG) pipelines, AI-generated incident summaries, auto-remediation agents, tool calling, and AI-driven insights. You will work closely with engineering, product, and applied AI teams to validate AI experiences in production.
Job Responsibilities:
Test Strategy for GenAI Features: Define end-to-end test strategies for GenAI-driven product features; establish quality standards for AI output
Automation Framework & End-to-End Testing: Build scalable automation test frameworks for API and UI experiences; automate validation of AI endpoints, workflows, and behaviors; develop regression test packs
AI Evaluation (LLM Testing): Create and maintain LLM evaluation test suites; build automated pipelines for drift testing and regression testing (a minimal example test is sketched after this list)
Reliability, Performance & Scale Testing: Design and implement performance and load tests for AI systems; ensure AI systems meet SLOs and performance targets
Safety, Security & Compliance Testing: Validate AI system robustness against attacks and risks; build guardrail validation tests
Observability & Debuggability for AI Testing: Collaborate with engineering teams to enhance AI observability; use monitoring and telemetry to detect regressions
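To give a flavor of the LLM evaluation work listed above, here is a minimal sketch of one such test. It assumes a pytest-based suite and a hypothetical summarize_incident endpoint (stubbed here so the example runs standalone); the golden cases, field names, and checks are illustrative, not LogicMonitor's actual framework.

```python
# Minimal sketch of an LLM evaluation test in a pytest suite. The
# summarize_incident endpoint, payload shape, and golden cases are
# hypothetical; a real suite would call the live AI service instead
# of the stub below.
import pytest

def summarize_incident(alert_context: dict) -> str:
    # Stub standing in for the AI summarization endpoint so the
    # sketch runs on its own.
    return (f"CPU saturation on {alert_context['host']} "
            f"starting {alert_context['start']}.")

GOLDEN_CASES = [
    {
        "alert_context": {"host": "prod-db-01", "start": "09:14 UTC"},
        "must_mention": ["prod-db-01", "cpu"],     # groundedness: facts from the source alert
        "must_not_mention": ["disk", "network"],   # hallucination check: facts absent from the source
    },
]

@pytest.mark.parametrize("case", GOLDEN_CASES)
def test_summary_is_grounded(case):
    summary = summarize_incident(case["alert_context"]).lower()
    for fact in case["must_mention"]:
        assert fact.lower() in summary, f"summary dropped required fact: {fact}"
    for phantom in case["must_not_mention"]:
        assert phantom.lower() not in summary, f"summary hallucinated: {phantom}"
```

Run in CI, a pack of such cases doubles as a drift check: re-running it after a model or prompt change flags regressions in groundedness before release.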
Requirements:
4 to 5 years of experience as an SDET / QA Automation Engineer
Strong understanding of: microservices and distributed systems testing, asynchronous workflows and queues, cloud-native architecture and reliability testing
Proven ability to build test strategies for: functional, regression, integration, contract, and performance testing
Experience testing LLM-based systems, such as: AI assistants (multi-turn chat), RAG pipelines, agentic workflows (tool calling, orchestration)
Strong understanding of common GenAI failure patterns: hallucinations, prompt injection, retrieval failures, toxicity and unsafe responses
Ability to create evaluation datasets and rubrics for AI correctness
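As an illustration of the evaluation-dataset and rubric requirement above, a case can pair an input with the facts a correct answer must contain and the content it must avoid. Below is a minimal sketch assuming dataclass-style cases and a string-matching scorer; the field names, facts, and checks are assumptions made for this example, not a prescribed format.

```python
# Illustrative shape for an AI-correctness evaluation case and a simple
# rubric scorer. Field names, facts, and checks are assumptions made for
# this sketch, not a prescribed LogicMonitor format.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str                    # input sent to the LLM feature
    reference_facts: list[str]     # facts a correct answer must contain
    forbidden_content: list[str]   # hallucinated or unsafe content to penalize

@dataclass
class RubricResult:
    groundedness: float            # fraction of reference facts present in the output
    safety: bool                   # True if no forbidden content was found

def score(case: EvalCase, model_output: str) -> RubricResult:
    text = model_output.lower()
    hits = sum(1 for fact in case.reference_facts if fact.lower() in text)
    grounded = hits / max(len(case.reference_facts), 1)
    safe = not any(bad.lower() in text for bad in case.forbidden_content)
    return RubricResult(groundedness=grounded, safety=safe)

# Example usage with a made-up case and model output.
case = EvalCase(
    prompt="Summarize the alert for host prod-db-01.",
    reference_facts=["prod-db-01", "cpu"],
    forbidden_content=["rm -rf", "api key"],
)
print(score(case, "High CPU on prod-db-01 since 09:14 UTC."))
# -> RubricResult(groundedness=1.0, safety=True)
```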