A year ago, reliable agentic systems and sub-second multimodal inference at scale barely existed. Nobody has a decade of experience here, so we're not screening for a resume template. We're looking for strong people from varied backgrounds who learn fast, thrive in ambiguity, and can show us what they've built, broken, and understood.
Job Responsibilities:
We hand you unclear problems and expect you to make them clear
We value engineers who say 'I don't know yet' and then design the benchmark or prototype that finds out
We treat performance, latency, and reliability as first-class product features, not a box to check before launch
Impact comes before everything else, though we support sharing work and open-source contributions that move the field forward
Your work should be visible
Requirements:
Inference Optimization. Deep understanding of modern serving frameworks such as vLLM or TRT-LLM and the techniques behind them
Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding
High-Performance Systems. Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs
Distributed Systems & Scaling. Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections
Public work. Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups
Full-cycle ownership. You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production
Background. PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems