Job Responsibilities
Design and develop testing frameworks for AI/ML models and LLM applications
Build automated pipelines for model validation, regression testing, and benchmarking
Create evaluation datasets, synthetic data, and test scenarios for edge cases
Implement metrics to assess accuracy, robustness, latency, and safety
Develop tools for prompt testing, output validation, and hallucination detection
Collaborate with engineers and product teams to define test strategies
Monitor model performance in production and build alerting systems
Ensure compliance with ethical AI standards, including fairness and bias testing
Debug model behavior and identify root causes of failures
Requirements
Bachelor's or Master's degree in Computer Science, AI, Machine Learning, or a related field
8+ years of experience in software engineering, QA automation, or ML engineering
Strong programming skills in Python (preferred) or similar languages
Experience with testing frameworks (e.g., PyTest, unittest)
Familiarity with machine learning concepts and model evaluation techniques
Experience working with APIs, distributed systems, and CI/CD pipelines
Knowledge of data structures, algorithms, and software design principles
Nice to have
Experience with LLMs and prompt engineering
Familiarity with evaluation tools like LangChain, OpenAI Evals, or similar frameworks
Knowledge of AI safety, bias detection, and adversarial testing
Experience with cloud platforms (AWS, GCP, or Azure)
Understanding of observability tools and monitoring systems
Exposure to synthetic data generation and simulation environments