This is a chance to join a fast-growing tech unicorn building a next-generation AI platform for developers. The company originally made its name by helping engineering teams dramatically optimise cloud spend. That same mindset is now being applied to AI infrastructure: helping developers build, deploy, and scale AI-powered features faster, more efficiently, and at significantly lower cost. You’ll join a brand-new team early, working on a genuine 0→1 product build where engineering quality and performance really matter. This isn’t about academic benchmarks or theoretical work; it’s about solving real production problems at scale.
Job Responsibilities:
Deep inference optimisation work using tools such as vLLM, Triton, SGLang, and TensorRT
Pushing performance in the real world: reducing latency, improving model initialisation times, and building distributed systems that make high-performance AI both accessible and cost-efficient
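To give a flavour of the latency work described above, here is a minimal, illustrative sketch of how per-request inference latencies might be summarised into percentile metrics using only the Python standard library. The function name and sample data are hypothetical, not part of the role or any specific engine's API:

```python
import statistics

def latency_percentiles(latencies_ms):
    """Summarise a list of per-request latencies (in milliseconds)
    into the percentile metrics commonly tracked for inference
    services (p50 / p90 / p99)."""
    # statistics.quantiles with n=100 returns 99 cut points;
    # index 89 is the 90th percentile, index 98 the 99th.
    qs = statistics.quantiles(latencies_ms, n=100)
    return {
        "p50": statistics.median(latencies_ms),
        "p90": qs[89],
        "p99": qs[98],
    }

# Hypothetical sample: one request at each of 1..100 ms
samples = [float(i) for i in range(1, 101)]
print(latency_percentiles(samples))
```

In production inference work the raw timings would come from instrumentation around the serving engine rather than a synthetic list, but the tail-percentile view (p99 rather than the mean) is the standard way to judge latency improvements.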
Requirements:
Expert-level Python skills, with proven experience in ML inference optimisation
Hands-on experience tuning inference engines and working with production-scale systems using tools such as vLLM, Triton, SGLang, and TensorRT