Adyen is building a world-class AI team to redefine what intelligent systems can do in financial technology. As a Senior AI Research Engineer, you will take on some of the most technically demanding work in applied AI: designing agents that reason over complex, multi-step tasks; building the evaluation infrastructure that makes those systems trustworthy in production; and shaping how humans and AI collaborate at scale within a global payments company. This is not a narrow research role. You will take full ownership of your work, from early research through deployed production systems, influence the team's technical direction, and act as a force multiplier for the broader AI organization. That includes contributing to custom model development for structured financial data and working toward our longer-term ambition of defining how humans and AI collaborate across the company.
Job Responsibilities
Design and Deploy AI Agents for Complex Tasks: Lead the research, design, and deployment of AI agents built for long-horizon, multi-step tasks in real-world financial contexts
Own Evaluation and Benchmarking: Define and lead the evaluation strategy for the agentic systems and LLMs your team builds and deploys
Provide AI Expertise Across the Organization: Serve as a technical resource for AI initiatives across Adyen
Raise the Bar: Set engineering standards for the team and company, and provide mentorship through problem decomposition, research methodology, and code review
Requirements
6+ years of hands-on experience in applied AI/ML research or engineering, with a clear track record of shipping AI systems, including agentic or LLM-powered systems, in production environments
Deep expertise in language models and generative AI, with hands-on depth across several of the following: architecture, post-training (fine-tuning, RLHF), inference optimization, context engineering, and failure modes at scale
Proven experience designing and operating agentic systems at scale, including multi-agent orchestration, tool use, memory and context management, state handling for long-running workflows, and human-in-the-loop design
A rigorous, systematic approach to evaluation, with experience designing evaluation frameworks or internal benchmarks that go beyond standard metrics
Strong foundation in classical machine learning: supervised learning, ensemble methods, optimization, probabilistic modeling, and statistics