As a Software Engineer on the ML Infrastructure team, you will design and build platforms for scalable, reliable, and efficient serving of LLMs. Our platform powers cutting-edge research and production systems, supporting both internal and external use cases across various environments. The ideal candidate combines strong ML fundamentals with deep expertise in backend system design. You’ll work in a highly collaborative environment, bridging research and engineering to deliver seamless experiences to our customers and accelerate innovation across the company.
Job Responsibilities:
Build and maintain fault-tolerant, high-performance systems for serving LLM workloads at scale
Build an internal platform that enables discovery of LLM capabilities
Collaborate with researchers and engineers to integrate and optimize models for production and research use cases
Conduct architecture and design reviews to uphold best practices in system design and scalability
Develop monitoring and observability solutions to ensure system health and performance
Lead projects end-to-end, from requirements gathering to implementation, in a cross-functional environment
Requirements:
4+ years of experience building large-scale, high-performance backend systems
Strong programming skills in one or more languages (e.g., Python, Go, Rust, C++)
Experience with LLM serving and routing fundamentals (e.g., rate limiting, token streaming, load balancing, and budgets)
Experience with LLM capabilities and concepts such as reasoning, tool calling, and prompt templates
Experience with containers and orchestration tools (e.g., Docker, Kubernetes)
Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform)
Proven ability to solve complex problems and work independently in fast-moving environments
Nice to have:
Experience with modern LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or text-generation-inference