Join the inference model team dedicated to bringing up state-of-the-art models, numerically validating them, and accelerating new model ideas on wafer-scale hardware. You will prototype architectural tweaks, build performance-eval pipelines, and turn hard numbers into changes that land in production.
Job Responsibility:
Prototype and benchmark cutting-edge ideas: new attention mechanisms, MoE, speculative decoding, and other innovations as they emerge
Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull-requests
Work closely with compiler, runtime, and silicon teams: unique opportunity to experience the full stack of software/hardware innovation
Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities
Requirements:
3+ years building high-performance ML or systems software
Solid grounding in Transformer math—attention scaling, KV-cache, quantisation—or clear evidence you learn this material rapidly
Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
Strong debugging skills across performance, numerical accuracy, and runtime integration
Prior experience in modeling, compilers, or crafting benchmarks or performance studies, not just black-box QA tests
Strong passion for leveraging AI agents or workflow orchestration tools to boost personal productivity
Nice to have:
Hands-on experience with flash-attention, Triton kernels, linear-attention, or sparsity research
Performance-tuning experience on custom silicon, GPUs, or FPGAs
Proficiency in C/C++ programming and experience with low-level optimization
Proven experience in compiler development, particularly with LLVM and/or MLIR
Publications, repos, or blog posts dissecting model speed-ups
Contributions to open-source agent frameworks
What we offer:
Build a breakthrough AI platform beyond the constraints of the GPU
Publish and open-source your cutting-edge AI research
Work on one of the fastest AI supercomputers in the world
Enjoy job stability with startup vitality
Enjoy a simple, non-corporate work culture that respects individual beliefs