The ML Inference Platform is part of the AI Compute Platforms organization within Infrastructure Platforms. Our team owns the cloud-agnostic, reliable, and cost-efficient platform that powers GM’s AI efforts. We’re proud to serve as the AI infrastructure platform for teams developing autonomous vehicles (L3/L4/L5), as well as other groups building AI-driven products for GM and its customers. We enable rapid innovation and feature development by optimizing for high-priority, ML-centric use cases. Our platform supports the serving of state-of-the-art (SOTA) machine learning models for experimental and bulk inference, with a focus on performance, availability, concurrency, and scalability. We’re committed to maximizing GPU utilization across platforms (B200, H100, A100, and more) while maintaining reliability and cost efficiency.
Job Responsibilities:
Design and implement core platform backend software components
Collaborate with ML engineers and researchers to understand critical workflows, translate them into platform requirements, and deliver incremental value
Lead technical decision-making on model serving strategies, orchestration, caching, model versioning, and auto-scaling mechanisms
Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization of inference services
Proactively research and integrate state-of-the-art model serving frameworks, hardware accelerators, and distributed computing techniques
Lead large-scale technical initiatives across GM’s ML ecosystem
Raise the engineering bar through technical leadership and by establishing best practices
Contribute to open source projects and represent GM in relevant communities
Requirements:
8+ years of industry experience, with a focus on machine learning systems or high-performance backend services
Expertise in Go, Python, C++, or another relevant programming language
Expertise in ML inference and model serving frameworks (Triton, Ray Serve, vLLM, etc.)
Strong communication skills and a proven ability to drive cross-functional initiatives
Experience working with cloud platforms such as GCP, Azure, or AWS
Ability to thrive in a dynamic, multi-tasking environment with ever-evolving priorities
Nice to have:
Hands-on experience building ML infrastructure platforms for model serving/inference
Experience working with or designing interfaces, APIs, and clients for ML workflows
Experience with the Ray framework and/or vLLM
Experience with distributed systems and large-scale data processing
Familiarity with telemetry and other feedback loops to inform product improvements
Familiarity with hardware acceleration (GPUs) and optimizations for inference workloads
Contributions to open-source ML serving frameworks