You are a platform or DevOps engineer with strong experience running complex systems in production, ideally with exposure to AI/ML infrastructure and large-scale environments. You understand how to operate workloads reliably at scale and are comfortable working with modern tooling across cloud, Kubernetes, and automation. You are focused on infrastructure and platform engineering rather than data science, with a strong emphasis on reliability, performance, and operational excellence.
Job Responsibilities:
Design, build, and maintain a scalable platform for serving LLM workloads in production
Deploy and manage containerised workloads on Kubernetes, including GPU-based infrastructure
Implement and optimise model serving solutions (e.g. vLLM, Triton, TGI)
Set up monitoring and observability using tools such as Prometheus and Grafana
Build and improve CI/CD pipelines and automate infrastructure using Python and Infrastructure as Code
Requirements:
Strong experience with Kubernetes in production and solid Linux systems knowledge
Hands-on experience with GPU infrastructure (e.g. NVIDIA A100/H100) and LLM/ML model serving
Experience with CI/CD tools (Azure DevOps, GitLab CI, Jenkins) and Python scripting
Familiarity with monitoring tools (Prometheus, Grafana) and infrastructure automation (Terraform, Ansible)
Demonstrated, hands-on experience with large language models and GPU-based inference at scale is required
Nice to have:
Experience in regulated environments or cost optimisation for high-performance workloads is a plus