As an ML Infra Engineer, you’ll play a key role in building the inference and training frameworks that make it possible to deliver results at scale. You’ll collaborate closely with our ML and Platform teams to scale training across nodes, make serving faster and more efficient, and build observability across the stack. This is a high-impact role where you’ll help define what high-performance ML training and inference look like at Reducto.
Job Responsibilities:
Build and maintain our training and inference stack, with an emphasis on fast iteration and flexibility to explore new methods in training, and on high performance in inference
Develop benchmarks for both stacks to identify bottlenecks
Explore SOTA advances in training and inference and work to apply them
Design systems for scaling model training across multi-node, multi-GPU environments with strong reliability and observability
Scale distributed training and inference workloads across large GPU clusters while improving utilization, reliability, and cost efficiency
Build the tooling, abstractions, and observability that help ML engineers move faster from experiment to production
Requirements:
Hold yourself to a high bar for quality and precision
Enjoy solving complex problems and building from first principles
Strong Python skills + a background in systems engineering
Comfortable with Kubernetes and distributed training frameworks
Love getting your hands dirty with real-world implementation challenges
Operate well in fast-changing, high-growth environments
Collaborate effectively across technical and non-technical teams
Take full ownership from strategy through execution
Nice to have:
Experience at an early-stage or high-growth startup
Have contributed meaningfully to open source training/inference stacks
Excited to set up distributed inference across hundreds to thousands of GPUs
Care deeply about combining technical excellence with business impact