Boston Dynamics’ mission is to imagine and create robots that enrich people’s lives. Our mobile robots operate in the most challenging and remote environments imaginable, from industrial sites to disaster zones. We are a passionate team of innovators, thinkers, and builders dedicated to creating products that our users love. To bolster our mission, we're looking for a talented ML Ops Engineer to join the Central Software (CSW) Machine Learning Platform team. In this role, you will actively implement, scale, and extend the tools, infrastructure, and pipelines that help unify how our product and research teams leverage ML at Boston Dynamics. This is your chance to work closely with ML/RL engineers and researchers from across the company, supporting advanced product and research activities at the forefront of robotics innovation.
Job Responsibilities:
ML Operations: Evolve/scale/optimize fielded solutions and enable orchestration of ML training workloads on GPU clusters
ML Infrastructure Support: Work closely with others to implement, deploy, and maintain ML infrastructure
New Capabilities: Transform proofs of concepts into scalable solutions, helping deliver new robot capabilities to customers
Engagement: Work with stakeholders across BD to understand requirements, ensuring deployed solutions meet end-user needs
Ownership: Own the end-to-end lifecycle, spanning implementation, testing, deployment, and monitoring
Coordination: Participate in the agile development process, work with others, identify challenges, and regularly communicate progress
Mentorship: Use your experience to mentor/upskill peers and other contributors across the organization
Requirements:
7+ years of experience as an ML Platform engineer
Demonstrated expert-level proficiency in Python (mandatory) and system programming (e.g., Go, C++, Rust)
Demonstrated proficiency managing and configuring cloud resources with Infrastructure as Code (e.g., Terraform, Ansible)
Expert in scalable ML deployments via Kubernetes
Hands-on knowledge of designing pipelines to automate code deployment, model training, and validation using Argo CD, CI, etc.
Experience configuring and using observability metrics to drive improvements in system behavior (e.g., CPU/GPU utilization, latency, performance)
Experience managing/operating in hybrid hosted/on-prem compute environments
Experience working collaboratively in a cross-functional team using Agile, Scrum, or another lean approach
Bachelor's degree in Engineering, Computer Science, or another technical area
Nice to have:
Experience with data processing, data augmentation, and data cleaning techniques
Knowledge of distributed computing and big data technologies for large datasets (e.g., Spark, Scala) and of building data pipelines
Knowledge of ML-related frameworks and libraries (PyTorch, TensorFlow, Pandas, NumPy)
Knowledge of Deep Learning methodologies specific to Computer Vision, such as YOLO
Experience with annotation tools using SAM, Co-Tracker, etc.