Large Language Models (LLMs) continue to expand what AI systems can do, but inference remains the bottleneck. The Model Efficiency team is responsible for pushing the limits of LLM inference efficiency across our foundation models. We explore and ship breakthroughs across the model execution stack, including: model architecture and MoE routing optimization; decoding and inference-time algorithmic improvements; software/hardware co-design for GPU acceleration; and performance optimization without compromising model quality.
Job Responsibilities:
Develop, prototype, and deploy techniques that materially improve how fast and efficiently our models run in production
Requirements:
A PhD in Machine Learning or a related field
A deep understanding of LLM architectures and how to optimize LLM inference under resource constraints
Significant experience with one or more techniques that improve model efficiency
Strong software engineering skills
An appetite for a fast-paced, high-ambiguity start-up environment
Publications at top-tier venues (e.g., ICLR, ACL, NeurIPS)
A passion for mentoring others
What we offer:
An open and inclusive culture and work environment
Close collaboration with a team at the cutting edge of AI research
Weekly lunch stipend, in-office lunches & snacks
Full health and dental benefits, including a separate budget to take care of your mental health
100% Parental Leave top-up for up to 6 months
Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
Remote-flexible, with offices in Toronto, New York, San Francisco, London, and Paris, as well as a co-working stipend