We are seeking a Lead AI Red Teaming & QA Engineer to design and execute automated adversarial testing for our enterprise Agentic AI platforms. You will move beyond traditional software QA to build continuous safety pipelines, ensuring our non-deterministic LLM agents, RAG systems, and tool integrations are secure, resilient, and compliant before production release.
Job Responsibilities:
Automated Adversarial Testing: Build and integrate automated red teaming suites into CI/CD pipelines using frameworks like Garak, PyRIT, and AgentDojo to enforce strict safety release gates
AI Evaluation Frameworks: Develop metrics and continuous testing for core AI risks, including hallucinations, memorisation, algorithmic bias, uncertainty, and model drift
Regulatory Compliance Evidence: Map threat models (OWASP LLM Top 10, Agentic threats) to automated test cases. Produce the technical testing evidence required by EU AI Act Article 15, DORA, and FCA Operational Resilience guidelines
Centralised AI-BOM Platform: Own the enterprise AI Bill of Materials (AI-BOM) as a centralised evaluation service, tracking model lineages, dataset versions, and signed artifacts
Requirements:
Regulated Finance: Proven experience testing software within FCA, DORA, or EU AI Act frameworks
AWS Bedrock Ecosystem: Hands-on experience configuring, testing, and bypassing Bedrock Guardrails, Agents, and Knowledge Bases (RAG)
AI Security & Fundamentals: Solid understanding of Foundation Models, tool use (function calling), OWASP LLM Top 10, and NIST AI RMF
Automation Stack: Strong Python development skills, experience with AI eval tools (Garak, PyRIT, Ragas), and experience building complex CI/CD test pipelines