CrawlJobs Logo

Staff ML Engineer, Inference Platform

United States, Sunnyvale 185500.00 - 270000.00 USD / Year · Job Posted March 19, 2026
Apply Position
Job Link Share

Job Description

The ML Inference Platform is part of the AI Compute Platforms organization within Infrastructure Platforms. Our team owns the cloud-agnostic, reliable, and cost-efficient platform that powers GM’s AI efforts. We’re proud to serve as the AI infrastructure platform for teams developing autonomous vehicles (L3/L4/L5), as well as other groups building AI-driven products for GM and its customers. We enable rapid innovation and feature development by optimizing for high-priority, ML-centric use cases. Our platform supports the serving of state-of-the-art (SOTA) machine learning models for experimental and bulk inference, with a focus on performance, availability, concurrency, and scalability. We’re committed to maximizing GPU utilization across platforms (B200, H100, A100, and more) while maintaining reliability and cost efficiency.

Job Responsibility

  • Design and implement core platform backend software components
  • Collaborate with ML engineers and researchers to understand critical workflows, parse them to platform requirements, and deliver incremental value
  • Lead technical decision-making on model serving strategies, orchestration, caching, model versioning, and auto-scaling mechanisms
  • Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization of inference services
  • Proactively research and integrate state-of-the-art model serving frameworks, hardware accelerators, and distributed computing techniques
  • Lead large-scale technical initiatives across GM’s ML ecosystem
  • Raise the engineering bar through technical leadership, establishing best practices
  • Contribute to open source projects
  • represent GM in relevant communities

Requirements

  • 8+ years of industry experience, with focus on machine learning systems or high performance backend services
  • Expertise in either Go, Python, C++ or other relevant coding languages
  • Expertise in ML inference, model serving frameworks (triton, rayserve, vLLM etc)
  • Strong communication skills and a proven ability to drive cross-functional initiatives
  • Experience working with cloud platforms such as GCP, Azure, or AWS
  • Ability to thrive in a dynamic, multi-tasking environment with ever-evolving priorities

Nice to have

  • Hands-on experience building ML infrastructure platforms for model serving/inference
  • Experience working with or designing interfaces, apis and clients for ML workflows
  • Experience with Ray framework, and/or vLLM
  • Experience with distributed systems, and handling large-scale data processing
  • Familiarity with telemetry, and other feedback loops to inform product improvements
  • Familiarity with hardware acceleration (GPUs) and optimizations for inference workloads
  • Contributions to open-source ML serving frameworks

What we offer

  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • employee assistance program
  • GM vehicle discounts
  • company vehicle evaluation program

Looking for more opportunities?

Search for other job offers that match your skills and interests.

Similar Jobs for

Staff ML Engineer, Inference Platform

8 matching positions

Member of Technical Staff - Platform Engineer

Platform Engineer to join our team building backend infrastructure for new ML-po...
Location
Location
United States , Palo Alto
Salary
Salary:
175000.00 - 350000.00 USD / Year
inflection.ai Logo
Inflection AI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Backend engineering experience with Python, TypeScript, or Node.js
  • Hands-on experience working with production PyTorch models, model checkpoints, and inference logic
  • Strong knowledge of building APIs and services that are scalable, stable, and secure
  • Passion for bridging backend engineering and ML systems, especially at the infrastructure layer
  • Familiarity with tools such as FastAPI, Postgres, Redis, Kubernetes, and React
  • Desire to be hands-on and contribute to shaping the foundation of a new enterprise ML product
  • Have a bachelor’s degree or equivalent in a related field to the offered position requirements
Job Responsibility
Job Responsibility
  • Build and maintain backend services to support LLM integration, inference orchestration, and data flow
  • Write clean, reliable Python code for experimentation, model integration, and production systems
  • Collaborate closely with ML researchers to rapidly iterate on product ideas and deploy features
  • Design and implement infrastructure to handle scalable inference workloads and enterprise-level use cases
  • Own system components and ensure reliability, observability, and maintainability from day one
What we offer
What we offer
  • Diverse medical, dental and vision options
  • 401k matching program
  • Unlimited paid time off
  • Parental leave and flexibility for all parents and caregivers
  • Support of country-specific visa needs for international employees living in the Bay Area
  • Competitive stock options
Read More
Arrow Right

Staff Embedded ML Engineer, Edge AI

We are seeking a highly motivated and experienced Embedded Machine Learning Engi...
Location
Location
United States , Boston
Salary
Salary:
183300.00 - 268800.00 USD / Year
simplisafe.com Logo
SimpliSafe
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8+ years of experience in embedded systems and/or performance engineering, with experience shipping production software on constrained devices
  • Strong C/C++ expertise with deep knowledge of low-level performance topics: CPU architecture, memory hierarchy, concurrency, and real-time considerations
  • Demonstrated experience optimizing ML inference on embedded targets, including operator/kernel tuning and end-to-end pipeline optimization
  • Familiarity with modern vision model families (transformer-based detectors such as DEIM/DFINE/RT-DETR series and CNN-based detectors such as YOLO family or similar) sufficient to optimize their execution characteristics (tensor shapes, attention/conv patterns, post-processing)
  • Experience with on-device inference runtimes and deployment workflows (e.g., TFLite, ONNX Runtime, TensorRT or vendor runtimes), including operator support constraints and graph-level transformations
  • Strong debugging and profiling skills (perf, flame graphs, hardware counters, tracing) and ability to drive performance investigations to closure
  • Ability to lead cross-functionally across ML, firmware, and hardware teams
  • comfortable defining benchmarks/KPIs and making tradeoffs
Job Responsibility
Job Responsibility
  • Own the embedded deployment and performance of on-device ML inference for outdoor monitoring workloads (real-time video/event pipelines)
  • Optimize end-to-end inference performance across CPU/DSP/NPU/GPU (as applicable): latency, throughput (FPS), memory footprint, power, thermals, startup time, and stability
  • Perform kernel/operator-level optimization: vectorization (e.g., SIMD/NEON), tiling, cache-friendly memory layouts
  • reducing bandwidth and memory copies, optimizing post-processing
  • fusing ops, minimizing synchronization/overhead, thread scheduling
  • Integrate and maintain ML models within embedded pipelines: model import/export validation, operator compatibility, graph transforms
  • runtime integration in C/C++ (including pre/post-processing)
  • robust error handling, watchdogs, and safe fallback behavior
  • Drive quantization and deployment readiness from an embedded perspective: validate INT8/FP16 paths, calibration flows, numerical accuracy checks
  • debug quantization edge cases and operator mismatches on target runtimes
What we offer
What we offer
  • A mission- and values-driven culture and a safe, inclusive environment where you can build, grow and thrive
  • A comprehensive total rewards package that supports your wellness and provides security for SimpliSafers and their families
  • Free SimpliSafe system and professional monitoring for your home
  • Employee Resource Groups (ERGs) that bring people together, give opportunities to network, mentor and develop, and advocate for change
  • Participation in our annual bonus program, equity, and other forms of compensation, in addition to a full range of medical, retirement, and lifestyle benefits
  • Fulltime
Read More
Arrow Right

Staff Software Engineer - AI/ML Platform

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location
Location
United States , Chevy Chase; New York City; Palo Alto
Salary
Salary:
115000.00 USD / Year
geico.com Logo
Geico
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in computer science, Engineering, or related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python
  • strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility
Job Responsibility
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation
  • a 401K savings plan vested from day one that offers a 6% match
  • performance and recognition-based incentives
  • and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility- We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime
Read More
Arrow Right

Staff Software Engineer, AI Agent Platform

The Geico AI Agent Platform team is seeking an exceptional Staff Software Engine...
Location
Location
United States , Chevy Chase; New York City
Salary
Salary:
115000.00 - 260000.00 USD / Year
geico.com Logo
Geico
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in computer science, Engineering, Mathematics, or a related field
  • an advanced degree (master’s or Ph.D.) is highly desirable
  • 6+ years of hands-on experience in designing, implementing, and maintaining multi-tenant AIML systems and platforms in production environments
  • 6+ years of experience working with cloud platforms such as Azure and AWS
  • Extensive expertise in designing and deploying large-scale data pipelines and real-time inference systems and managing the end-to-end AI Agent and/or AIML system development lifecycles, including configuration, evaluation, monitoring, observability and AuthN/AuthR considerations
  • 6+ years of experience working with common backend systems & tools (e.g, Kubernetes, Temporal, OpenSearch, PostgreSQL, Redis, Neo4J, etc.)
  • Deep understanding of Docker, container optimization, and multi-stage builds
  • Experience with Prometheus, Grafana, Open Telemetry and distributed tracing
  • 3+ years of experience building front-end web applications using frameworks such as React and/or Next.JS
  • Deep proficiency in programming languages such as Python, Java, Go, etc., with a strong emphasis on coding excellence
Job Responsibility
Job Responsibility
  • Architect and implement scalable multi-tenant backend systems for building AI agent workflows, including agent configuration, offline evaluation, synthetic data generation, workflow simulation, agent marketplace, etc. using Azure Kubernetes Service (AKS), FastAPI, etc., ensuring economy of scale and control cost of maintenance
  • Collaborate with Design team to architect and implement frontend experiences and workflows for onboarding both technical and non-technical stakeholders, maximizing user adoption and successful AI agent development
  • Develop observability frameworks to ensure 99.9%+ uptime for AI agent platforms through robust monitoring, alerting, and incident response procedures
  • Evaluate and (if desirable) integrate cutting-edge GenAI frameworks, libraries and vendors to maintain a state-of-the-art technology stack, including hybrid cloud solutions with AWS/GCP as backup or specialized use cases
  • Architect and implement scalable, high-performance machine learning platforms and systems capable of processing large data volumes and supporting real-time decision making and workflows
  • Oversee the end-to-end lifecycle of AI agent applications, ensuring robust testing, deployment, and ongoing monitoring
  • Ensure adherence to company production readiness standards, security protocols, and regulatory compliance throughout the development lifecycle
  • Continuously optimize platform performance, reducing latency and improving throughput for AI agent workloads
  • Design and implement backup, recovery, and business continuity plans for hosted platform applications & services
  • Design and maintain robust CI/CD pipelines for ML model deployment using Azure DevOps, GitHub Actions, and MLOps tools
What we offer
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation
  • a 401K savings plan vested from day one that offers a 6% match
  • performance and recognition-based incentives
  • and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility- We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime
Read More
Arrow Right

Sr. Staff Engineer (Conversational/Voice AI)

Uber’s Customer Obsession team builds the platform and AI that powers world‑clas...
Location
Location
United States , Sunnyvale, California; San Francisco, California
Salary
Salary:
267000.00 - 297000.00 USD / Year
uber.com Logo
Uber
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years building production ML/AI systems
  • 4+ years leading complex ML initiatives end‑to‑end
  • Deep expertise in LLM‑driven systems (inference optimization, prompt/program design, fine‑tuning, distillation/LoRA, safety/guardrails, evals)
  • Strong software engineering in Python plus one of Go/Java/C++
  • hands‑on with microservices, gRPC/HTTP, cloud infra, containers, CI/CD, and real‑time telemetry/observability
  • Demonstrated ownership of high‑availability services (SLO/SLA design, incident response, on‑call leadership, postmortems)
  • Track record of shipping customer‑facing intelligent experiences with measurable impact (A/B testing, metrics literacy)
Job Responsibility
Job Responsibility
  • Own the end‑to‑end agent architecture: agentic planning and execution loops, long-term memory, persona/voice, knowledge routing, and policy enforcement for compliant, on‑brand conversations
  • Ship production systems that handle millions of conversations with rigorous SLOs, fallbacks, and canaries
  • design graceful degradation (e.g., human handoff) and safety guardrails (prompt‑injection, jailbreak, PII redaction)
  • Lead voice agent initiatives: Drive the development of Uber’s voice support agent—covering real-time speech recognition (ASR), text-to-speech, natural turn-taking (barge-in and endpointing), and reliable telephony/WebRTC integration
  • Advance retrieval & reasoning: Build next-generation retrieval and reasoning pipelines, where the agent can search across different knowledge sources, apply policy-driven tools, and call structured workflows and ensure that responses are consistently grounded
  • Establish evals that matter: offline rubrics, simulated scenarios, safety tests, cost/latency tradeoff suites, and LLM‑as‑judge (with calibrated human review) wired into CI/CD and experiment platforms
  • Drive automation at scale: partner with Product/Design/Operations on coverage, policy alignment, localization, and rollout strategy to better customer experience and reduce cost per contact
  • Mentor/principal‑lead multiple pods
  • set technical strategy and quality bars
  • coach senior engineers on agentic patterns, reliability, and experiment velocity
What we offer
What we offer
  • Eligible to participate in Uber's bonus program
  • may be offered an equity award & other types of comp
  • eligible for various benefits (details at https://www.uber.com/careers/benefits)
  • Fulltime
Read More
Arrow Right

Staff Software Engineer - Backend Gen Ai

The Media Platform team builds Uber's unified, scalable infrastructure for inges...
Location
Location
United States , Sunnyvale
Salary
Salary:
232000.00 - 258000.00 USD / Year
uber.com Logo
Uber
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years of backend engineering experience, with deep expertise in distributed systems and large-scale service architecture
  • Strong backend engineering experience (Go, Java, C++, or similar) with expertise in system design, performance optimization, and reliability
  • Experience building high-throughput, low-latency services handling large data volumes (streaming, storage, or media systems)
Job Responsibility
Job Responsibility
  • Architect and scale distributed backend systems that support media ingestion, processing, intelligence, and delivery across global regions
  • Improve performance, reliability, and cost efficiency of high-throughput media pipelines
  • Design infrastructure that enables efficient integration and execution of ML inference workloads within media systems
  • Drive technical strategy and long-term architectural decisions across the Media Platform
  • Mentor engineers and raise the bar for engineering excellence, operational rigor, and system design
What we offer
What we offer
  • Bonus program
  • Equity award
  • 401(k) plan
  • Various benefits
  • Fulltime
Read More
Arrow Right

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location
Location
United States , Chevy Chase; New York City; Palo Alto
Salary
Salary:
115000.00 USD / Year
geico.com Logo
Geico
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in computer science, Engineering, or related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python
  • strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility
Job Responsibility
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation
  • a 401K savings plan vested from day one that offers a 6% match
  • performance and recognition-based incentives
  • and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility- We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime
Read More
Arrow Right

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location
Location
United States , Palo Alto
Salary
Salary:
90000.00 USD / Year
geico.com Logo
Geico
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s degree in computer science, Engineering, or related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python
  • strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility
Job Responsibility
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation
  • a 401K savings plan vested from day one that offers a 6% match
  • performance and recognition-based incentives
  • and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility- We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime
Read More
Arrow Right