Staff ML Engineer, Inference Platform Job at General Motors (Sunnyvale)

Member of Technical Staff - Platform Engineer

Platform Engineer to join our team building backend infrastructure for new ML-po...

Location

United States, Palo Alto

Salary:

175000.00 - 350000.00 USD / Year

Inflection AI

Expiration Date

Until further notice

Requirements

Backend engineering experience with Python, TypeScript, or Node.js
Hands-on experience working with production PyTorch models, model checkpoints, and inference logic
Strong knowledge of building APIs and services that are scalable, stable, and secure
Passion for bridging backend engineering and ML systems, especially at the infrastructure layer
Familiarity with tools such as FastAPI, Postgres, Redis, Kubernetes, and React
Desire to be hands-on and contribute to shaping the foundation of a new enterprise ML product
Bachelor’s degree or equivalent in a field related to the position

Job Responsibility

Build and maintain backend services to support LLM integration, inference orchestration, and data flow
Write clean, reliable Python code for experimentation, model integration, and production systems
Collaborate closely with ML researchers to rapidly iterate on product ideas and deploy features
Design and implement infrastructure to handle scalable inference workloads and enterprise-level use cases
Own system components and ensure reliability, observability, and maintainability from day one

What we offer

Diverse medical, dental and vision options
401k matching program
Unlimited paid time off
Parental leave and flexibility for all parents and caregivers
Support of country-specific visa needs for international employees living in the Bay Area
Competitive stock options

Staff Embedded ML Engineer, Edge AI

We are seeking a highly motivated and experienced Embedded Machine Learning Engi...

Location

United States, Boston

Salary:

183300.00 - 268800.00 USD / Year

SimpliSafe

Expiration Date

Until further notice

Requirements

8+ years of experience in embedded systems and/or performance engineering, with experience shipping production software on constrained devices
Strong C/C++ expertise with deep knowledge of low-level performance topics: CPU architecture, memory hierarchy, concurrency, and real-time considerations
Demonstrated experience optimizing ML inference on embedded targets, including operator/kernel tuning and end-to-end pipeline optimization
Familiarity with modern vision model families (transformer-based detectors such as DEIM/DFINE/RT-DETR series and CNN-based detectors such as YOLO family or similar) sufficient to optimize their execution characteristics (tensor shapes, attention/conv patterns, post-processing)
Experience with on-device inference runtimes and deployment workflows (e.g., TFLite, ONNX Runtime, TensorRT or vendor runtimes), including operator support constraints and graph-level transformations
Strong debugging and profiling skills (perf, flame graphs, hardware counters, tracing) and ability to drive performance investigations to closure
Ability to lead cross-functionally across ML, firmware, and hardware teams; comfortable defining benchmarks/KPIs and making tradeoffs

Job Responsibility

Own the embedded deployment and performance of on-device ML inference for outdoor monitoring workloads (real-time video/event pipelines)
Optimize end-to-end inference performance across CPU/DSP/NPU/GPU (as applicable): latency, throughput (FPS), memory footprint, power, thermals, startup time, and stability
Perform kernel/operator-level optimization: vectorization (e.g., SIMD/NEON), tiling, cache-friendly memory layouts; reducing bandwidth and memory copies, optimizing post-processing; fusing ops, minimizing synchronization/overhead, thread scheduling
Integrate and maintain ML models within embedded pipelines: model import/export validation, operator compatibility, graph transforms; runtime integration in C/C++ (including pre/post-processing); robust error handling, watchdogs, and safe fallback behavior
Drive quantization and deployment readiness from an embedded perspective: validate INT8/FP16 paths, calibration flows, numerical accuracy checks; debug quantization edge cases and operator mismatches on target runtimes

What we offer

A mission- and values-driven culture and a safe, inclusive environment where you can build, grow and thrive
A comprehensive total rewards package that supports your wellness and provides security for SimpliSafers and their families
Free SimpliSafe system and professional monitoring for your home
Employee Resource Groups (ERGs) that bring people together, give opportunities to network, mentor and develop, and advocate for change
Participation in our annual bonus program, equity, and other forms of compensation, in addition to a full range of medical, retirement, and lifestyle benefits

Fulltime

Staff Software Engineer - AI/ML Platform

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Chevy Chase; New York City; Palo Alto

Salary:

115000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Staff Software Engineer, AI Agent Platform

The Geico AI Agent Platform team is seeking an exceptional Staff Software Engine...

Location

United States, Chevy Chase; New York City

Salary:

115000.00 - 260000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related field; an advanced degree (master’s or Ph.D.) is highly desirable
6+ years of hands-on experience in designing, implementing, and maintaining multi-tenant AI/ML systems and platforms in production environments
6+ years of experience working with cloud platforms such as Azure and AWS
Extensive expertise in designing and deploying large-scale data pipelines and real-time inference systems and managing the end-to-end AI Agent and/or AI/ML system development lifecycles, including configuration, evaluation, monitoring, observability, and AuthN/AuthZ considerations
6+ years of experience working with common backend systems and tools (e.g., Kubernetes, Temporal, OpenSearch, PostgreSQL, Redis, Neo4j)
Deep understanding of Docker, container optimization, and multi-stage builds
Experience with Prometheus, Grafana, OpenTelemetry, and distributed tracing
3+ years of experience building front-end web applications using frameworks such as React and/or Next.js
Deep proficiency in programming languages such as Python, Java, Go, etc., with a strong emphasis on coding excellence

Job Responsibility

Architect and implement scalable multi-tenant backend systems for building AI agent workflows (agent configuration, offline evaluation, synthetic data generation, workflow simulation, agent marketplace, etc.) using Azure Kubernetes Service (AKS), FastAPI, and similar tools, ensuring economies of scale and controlling maintenance costs
Collaborate with Design team to architect and implement frontend experiences and workflows for onboarding both technical and non-technical stakeholders, maximizing user adoption and successful AI agent development
Develop observability frameworks to ensure 99.9%+ uptime for AI agent platforms through robust monitoring, alerting, and incident response procedures
Evaluate and (if desirable) integrate cutting-edge GenAI frameworks, libraries and vendors to maintain a state-of-the-art technology stack, including hybrid cloud solutions with AWS/GCP as backup or specialized use cases
Architect and implement scalable, high-performance machine learning platforms and systems capable of processing large data volumes and supporting real-time decision making and workflows
Oversee the end-to-end lifecycle of AI agent applications, ensuring robust testing, deployment, and ongoing monitoring
Ensure adherence to company production readiness standards, security protocols, and regulatory compliance throughout the development lifecycle
Continuously optimize platform performance, reducing latency and improving throughput for AI agent workloads
Design and implement backup, recovery, and business continuity plans for hosted platform applications & services
Design and maintain robust CI/CD pipelines for ML model deployment using Azure DevOps, GitHub Actions, and MLOps tools

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Sr. Staff Engineer (Conversational/Voice AI)

Uber’s Customer Obsession team builds the platform and AI that powers world‑clas...

Location

United States, Sunnyvale, California; San Francisco, California

Salary:

267000.00 - 297000.00 USD / Year

Uber

Expiration Date

Until further notice

Requirements

10+ years building production ML/AI systems
4+ years leading complex ML initiatives end‑to‑end
Deep expertise in LLM‑driven systems (inference optimization, prompt/program design, fine‑tuning, distillation/LoRA, safety/guardrails, evals)
Strong software engineering in Python plus one of Go/Java/C++; hands-on with microservices, gRPC/HTTP, cloud infra, containers, CI/CD, and real-time telemetry/observability
Demonstrated ownership of high‑availability services (SLO/SLA design, incident response, on‑call leadership, postmortems)
Track record of shipping customer‑facing intelligent experiences with measurable impact (A/B testing, metrics literacy)

Job Responsibility

Own the end‑to‑end agent architecture: agentic planning and execution loops, long-term memory, persona/voice, knowledge routing, and policy enforcement for compliant, on‑brand conversations
Ship production systems that handle millions of conversations with rigorous SLOs, fallbacks, and canaries; design graceful degradation (e.g., human handoff) and safety guardrails (prompt injection, jailbreak, PII redaction)
Lead voice agent initiatives: Drive the development of Uber’s voice support agent—covering real-time speech recognition (ASR), text-to-speech, natural turn-taking (barge-in and endpointing), and reliable telephony/WebRTC integration
Advance retrieval & reasoning: Build next-generation retrieval and reasoning pipelines in which the agent can search across different knowledge sources, apply policy-driven tools, and call structured workflows, ensuring that responses are consistently grounded
Establish evals that matter: offline rubrics, simulated scenarios, safety tests, cost/latency tradeoff suites, and LLM‑as‑judge (with calibrated human review) wired into CI/CD and experiment platforms
Drive automation at scale: partner with Product/Design/Operations on coverage, policy alignment, localization, and rollout strategy to improve customer experience and reduce cost per contact
Mentor/principal-lead multiple pods; set technical strategy and quality bars; coach senior engineers on agentic patterns, reliability, and experiment velocity

What we offer

Eligible to participate in Uber's bonus program; may be offered an equity award & other types of comp; eligible for various benefits (details at https://www.uber.com/careers/benefits)

Fulltime

Staff Software Engineer - Backend Gen AI

The Media Platform team builds Uber's unified, scalable infrastructure for inges...

Location

United States, Sunnyvale

Salary:

232000.00 - 258000.00 USD / Year

Uber

Expiration Date

Until further notice

Requirements

10+ years of backend engineering experience, with deep expertise in distributed systems and large-scale service architecture
Strong backend engineering experience (Go, Java, C++, or similar) with expertise in system design, performance optimization, and reliability
Experience building high-throughput, low-latency services handling large data volumes (streaming, storage, or media systems)

Job Responsibility

Architect and scale distributed backend systems that support media ingestion, processing, intelligence, and delivery across global regions
Improve performance, reliability, and cost efficiency of high-throughput media pipelines
Design infrastructure that enables efficient integration and execution of ML inference workloads within media systems
Drive technical strategy and long-term architectural decisions across the Media Platform
Mentor engineers and raise the bar for engineering excellence, operational rigor, and system design

What we offer

Bonus program
Equity award
401(k) plan
Various benefits

Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Chevy Chase; New York City; Palo Alto

Salary:

115000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Palo Alto

Salary:

90000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime
