Together AI is seeking a Machine Learning Engineer to join our Inference Engine team, focusing on optimizing and enhancing the performance of our AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently and effectively at scale. If you are passionate about AI inference, PyTorch, and developing high-performance systems, we want to hear from you. This position offers the chance to collaborate closely with AI researchers and engineers to create cutting-edge AI solutions. Join us in shaping the future at Together AI!
Job Responsibilities:
Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale
Develop and optimize runtime inference services for large-scale AI applications
Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world
Conduct design and code reviews to ensure high standards of quality
Create services, tools, and developer documentation to support the inference engine
Implement robust and fault-tolerant systems for data ingestion and processing
Requirements:
3+ years of experience writing high-performance, well-tested, production-quality code
Proficiency with Python and PyTorch
Demonstrated experience building high-performance libraries and tooling
Excellent understanding of low-level operating-system concepts, including multi-threading, memory management, networking, storage, performance, and scale
Nice to have:
Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, and Optimum
Knowledge of AI inference techniques such as speculative decoding