The Model Deployment & Inference Solutions team in GM AV deploys machine learning models from training frameworks (e.g., PyTorch) onto autonomous vehicle hardware. Our mission is twofold: build the ML deployment platform that makes model rollouts fast and predictable, and optimize models so they meet the real-time latency and memory budgets required to run on-vehicle. Our work is on the critical path of GM's publicly committed launch of eyes-off (hands-free, eyes-free) autonomous driving in 2028, debuting on the Cadillac Escalade IQ and building on Super Cruise's billion-plus hands-free miles.
Job Responsibilities:
Design, build, and operate the ML deployment platform that automates the path from trained model to on-vehicle inference
Drive cross-organization model deployments to the autonomous vehicle stack, partnering with model development teams to take high-value models from training to production on-vehicle
Build agentic tools that diagnose and fix deployment-blocking issues, automating workflows currently performed manually by engineers
Build the developer experience that ML model development teams use day to day: tooling, dashboards, automation, and observability
Drive shift-left validation that surfaces deployment risk (compile, runtime, parity, latency) early in the model development cycle
Build platform tools that integrate the work of our sister teams (kernels, compiler, reduced precision and parity) so their optimization wins land directly in the deployment workflow
Partner with the team's Performance pillar and model development teams across the AV organization
Requirements:
BS, MS, or PhD in Computer Science or a related technical field
3+ years of relevant industry experience
Strong fundamentals and excellent coding ability in Python
Experience building or operating production platform or infrastructure systems where reliability, observability, and extensibility matter
Experience with ML model deployment, inference integration, model optimization workflows, or model serving infrastructure, including at least one role or project where you owned the path from a trained model to a running inference workload
Experience using coding agents (Cursor, Claude Code, GitHub Copilot, or equivalent) as part of your engineering workflow
Experience designing clean, well-tested software with clear interfaces and good abstractions
Strong cross-team collaboration skills
Nice to have:
Experience building agentic or LLM-powered developer tooling
Experience with ML or workflow orchestration frameworks (Airflow, Temporal, Flyte, Ray, Kubeflow, or equivalent)
Familiarity with the NVIDIA GPU stack at the integration level (CUDA-aware Python, TensorRT, Triton Inference Server, torch.compile, ONNX)
Experience with inference-serving frameworks (Triton, TorchServe, Ray Serve, vLLM) or edge-deployment toolchains
Experience with low-latency or real-time systems
Experience in autonomous vehicles, robotics, or other safety-critical ML deployment domains
Open-source contributions to PyTorch, Ray, Airflow, Temporal, vLLM, TensorRT, or related projects