Together AI is building the AI Inference & Model Shaping Platform that brings the most advanced generative AI models to the world. Our platform powers multi-tenant serverless workloads and dedicated endpoints, enabling developers, enterprises, and researchers to harness the latest LLMs, multimodal, image, audio, video, and reasoning models at scale. We are looking for an exceptional MLOps Engineering Lead to partner closely with our cross-functional engineering, infrastructure, research, and sales teams to ensure the excellence of our ML API offerings. Your primary focus will be on delivering world-class inference and fine-tuning through our public APIs and customer deployments by building automation and operations processes.
Job Responsibilities:
Own availability and performance SLAs for production inference and fine-tuning services across serverless and dedicated deployments
Own & improve testing, deployment, configuration management, and monitoring practices for multi-cluster ML infrastructure – partnering closely with Infra SREs
Build self-serve tooling and automation that reduce operational toil, support internal users (MLOps, customer experience), and power self-serve offerings
Define and enforce configuration best practices for inference engines (vLLM, TRT-LLM, Pulsar) to prevent runtime issues
Lead incident response, conduct postmortems, and drive reliability improvements
Hire, mentor, and grow an MLOps engineering team
Partner with infrastructure and ML engineering teams to improve system reliability and cost efficiency
Requirements:
5+ years operating production ML inference or training systems at scale
2+ years leading engineering teams, with experience building teams from scratch
Deep expertise with Kubernetes, multi-cluster orchestration, and ML serving frameworks
Strong track record owning production SLAs (e.g. availability, TTFT, TPS)
Experience with LLM inference serving systems (vLLM, TRT-LLM, or similar)
Ability to influence cross-functional teams and make deployment/architecture decisions
Nice to have:
Experience building internal developer platforms or self-serve tooling
Background in cost optimization for GPU infrastructure
Contributions to open-source ML infrastructure projects