CrawlJobs

Senior ML Inference Engineer - Platform

United States, Austin · Employment contract · 128700.00 - 261300.00 USD / Year · Job Posted May 14, 2026

Job Description

The Model Deployment & Inference Solutions team in GM AV deploys machine learning models from training frameworks (e.g., PyTorch) onto autonomous vehicle hardware. Our mission is twofold: build the ML deployment platform that makes model rollouts fast and predictable, and optimize models so they meet the real-time latency and memory budgets required to run on-vehicle. Our work is on the critical path of GM's publicly committed 2028 launch of eyes-off (hands-free, eyes-free) autonomous driving, which debuts on the Cadillac Escalade IQ and builds on Super Cruise's billion-plus hands-free miles.

Job Responsibility

  • Design, build, and operate the ML deployment platform that automates the path from trained model to on-vehicle inference
  • Drive cross-organization model deployments to the autonomous vehicle stack, partnering with model development teams to take high-value models from training to production on-vehicle
  • Build agentic tools that diagnose and fix deployment-blocking issues, automating workflows currently performed manually by engineers
  • Build the developer experience that ML model development teams use day to day: tooling, dashboards, automation, and observability
  • Drive shift-left validation that surfaces deployment risk (compile, runtime, parity, latency) early in the model development cycle
  • Build platform tools that integrate the work of our sister teams (kernels, compiler, reduced precision and parity) so their optimization wins land directly in the deployment workflow
  • Partner with the team's Performance pillar and model development teams across the AV organization

Requirements

  • BS, MS, or PhD in Computer Science or a related technical field
  • 3+ years of relevant industry experience
  • Strong fundamentals and excellent coding ability in Python
  • Experience building or operating production platform or infrastructure systems where reliability, observability, and extensibility matter
  • Experience with ML model deployment, inference integration, model optimization workflows, or model serving infrastructure, with at least one prior context where you owned the path from a trained model to a running inference workload
  • Experience using coding agents (Cursor, Claude Code, GitHub Copilot, or equivalent) as part of your engineering workflow
  • Experience designing clean, well-tested software with clear interfaces and good abstractions
  • Strong cross-team collaboration skills

Nice to have

  • Experience building agentic or LLM-powered developer tooling
  • Experience with ML or workflow orchestration frameworks (Airflow, Temporal, Flyte, Ray, Kubeflow, or equivalent)
  • Familiarity with the NVIDIA GPU stack at the integration level (CUDA-aware Python, TensorRT, Triton Inference Server, torch.compile, ONNX)
  • Experience with inference-serving frameworks (Triton, TorchServe, Ray Serve, vLLM) or edge-deployment toolchains
  • Experience with low-latency or real-time systems
  • Experience in autonomous vehicles, robotics, or other safety-critical ML deployment domains
  • Open-source contributions to PyTorch, Ray, Airflow, Temporal, vLLM, TensorRT, or related projects

What we offer

  • Medical
  • Dental
  • Vision
  • Health Savings Account
  • Flexible Spending Accounts
  • Retirement savings plan
  • Sickness and accident benefits
  • Life insurance
  • Paid vacation & holidays
  • Tuition assistance programs
  • Employee assistance program
  • GM vehicle discounts

Similar Jobs for Senior ML Inference Engineer - Platform

8 matching positions

Senior ML Infrastructure Engineer, Inference Platform

About the Team: The ML Inference Platform is part of the AV ML Infrastructure or...
Location: United States, Austin, Texas; Mountain View, California; Sunnyvale, California
Salary: 155420.00 USD / Year
General Motors
Expiration Date: Until further notice
Requirements
  • 5+ years of industry experience, with focus on machine learning systems or high performance backend services
  • Expertise in Python, C++, or another relevant programming language
  • Expertise in ML inference and model-serving frameworks (Triton, Ray Serve, vLLM, etc.)
  • Strong communication skills and a proven ability to drive cross-functional initiatives
  • Ability to thrive in a dynamic, multi-tasking environment with ever-evolving priorities
Job Responsibility
  • Design and implement core platform backend software components
  • Collaborate with ML engineers and researchers to understand critical workflows, translate them into platform requirements, and deliver incremental value
  • Lead technical decision-making on model serving strategies, orchestration, caching, model versioning, and auto-scaling mechanisms for highly optimized use of accelerators
  • Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization of inference services
  • Proactively research and integrate state-of-the-art model serving frameworks, hardware accelerators, and distributed computing techniques
  • Lead technical initiatives across GM’s ML ecosystem
  • Raise the engineering bar through technical leadership and by establishing best practices
  • Contribute to open-source projects
  • Represent GM in relevant communities
What we offer
  • Medical
  • Dental
  • Vision
  • Health Savings Account
  • Flexible Spending Accounts
  • Retirement savings plan
  • Sickness and accident benefits
  • Life insurance
  • Paid vacation & holidays
  • Tuition assistance programs
  • Full-time

Senior ML Platform Engineer, AI Platform

We are seeking a skilled and passionate ML Platform Engineer to join our team an...
Location: Singapore, Singapore
Salary: Not provided
Airwallex
Expiration Date: Until further notice
Requirements
  • 5+ years in backend software development
  • At least 2 years focused on AI/ML platform or MLOps infrastructure
  • Deep expertise in MLOps practices, including automated deployment pipelines, model optimization, and production lifecycle management
  • Proven experience designing and implementing low-latency model serving solutions
  • Proficiency in Python
  • Skill in writing high-quality, maintainable code
  • Experience designing and developing large-scale, distributed, high-concurrency, high-availability, low-latency inference systems
  • Excellent communication and mentoring abilities
  • A relevant degree in Computer Science, Mathematics, or a related field
Job Responsibility
  • Platform Development: Design, build, and maintain the end-to-end MLOps platform using Kubernetes and Cloud Services
  • Infrastructure as Code (IaC): Use Terraform or similar tools to manage, provision, and scale all ML-related infrastructure securely and efficiently
  • Pipeline Automation: Implement and optimize CI/CD/CT (Continuous Integration, Delivery, Training) pipelines to automate model training, testing, packaging, and deployment using tools like Argo and Kubeflow Pipelines
  • Serving Infrastructure: Build highly available, low-latency, and high-throughput model serving infrastructure
  • Observability: Implement robust monitoring, alerting, and logging solutions to track infrastructure health, model performance, and data/model drift
  • Tooling & Support: Evaluate, integrate, and support ML tools such as Feature Stores and distributed model training pipelines
  • Security & Compliance: Ensure platform security, implement RBAC (Role-Based Access Control), and manage secrets for sensitive data and production environments
  • Collaboration: Work closely with Data Scientists and ML Engineers to understand their needs and provide technical guidance on best practices for scaling their models
  • Full-time

Senior ML Platform Engineer

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location: United States, Boston
Salary: 150000.00 - 210000.00 USD / Year
Whoop
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years of experience in software engineering with a focus on ML infrastructure, cloud platforms, or MLOps
  • Strong programming skills in Python, with experience in building distributed systems and REST/gRPC APIs
  • Deep knowledge of cloud-native services and infrastructure-as-code (e.g., AWS CDK, Terraform, CloudFormation)
  • Hands-on experience with model deployment platforms such as AWS SageMaker, Vertex AI, or Kubernetes-based serving stacks
  • Proficiency in ML lifecycle tools (MLflow, Weights & Biases, BentoML) and containerization strategies (Docker, Kubernetes)
  • Understanding of data engineering and ingestion pipelines, with ability to interface with data lakes, feature stores, and streaming systems
  • Proven ability to work cross-functionally with Data Science, Data Platform, and Software Engineering teams, influencing decisions and driving alignment
  • Passion for AI and automation to solve real-world problems and improve operational workflows
Job Responsibility
  • Architect, build, own, and operate scalable ML infrastructure in cloud environments (e.g., AWS), optimizing for speed, observability, cost, and reproducibility
  • Create, support, and maintain core MLOps infrastructure (e.g., MLflow, feature store, experiment tracking, model registry), ensuring reliability, scalability, and long-term sustainability
  • Develop, evolve, and operate MLOps platforms and frameworks that standardize model deployment, versioning, drift detection, and lifecycle management at scale
  • Implement and continuously maintain end-to-end CI/CD pipelines for ML models using orchestration tools (e.g., Prefect, Airflow, Argo Workflows), ensuring robust testing, reproducibility, and traceability
  • Partner closely with Data Science, Sensor Intelligence, and Data Platform teams to operationalize and support model development, deployment, and monitoring workflows
  • Build, manage, and maintain both real-time and batch inference infrastructure, supporting diverse use cases from physiological analytics to personalized feedback loops for WHOOP members
  • Design, implement, and own automated observability tooling (e.g., for model latency, data drift, accuracy degradation), integrating metrics, logging, and alerting with existing platforms
  • Leverage AI-powered tools and automation to reduce operational overhead, enhance developer productivity, and accelerate model release cycles
  • Contribute to and maintain internal platform documentation, SDKs, and training materials, enabling self-service capabilities for model deployment and experimentation
  • Continuously evaluate and integrate emerging technologies and deployment strategies, influencing WHOOP’s roadmap for AI-driven platform efficiency, reliability, and scale
What we offer
  • Equity
  • Benefits
  • Full-time

Senior Software Engineer, ML Platform

We’re looking for a software engineer to join Parafin’s Infrastructure team and ...
Location: United States, San Francisco
Salary: 230000.00 - 265000.00 USD / Year
Parafin
Expiration Date: Until further notice
Requirements
  • 5+ years of software engineering experience, including experience on ML platform/MLOps systems (training, deployment, and/or feature pipelines)
  • Strong Python skills
  • Solid software design and testing fundamentals
  • Proficiency with SQL
  • Hands-on Spark/PySpark experience
  • Knowledge of ML fundamentals—probability & statistics, supervised vs. unsupervised learning, bias/variance & regularization, feature engineering, model evaluation metrics, validation strategies, and production concerns like drift, stability, and monitoring
  • Expertise with modern data/ML stacks—AWS, Databricks (workflows, lakehouse, MLflow/registry, Model Serving), and Airflow (or equivalent orchestration)
  • Experience building real-time systems (service design, caching, rate limiting, backpressure) and batch pipelines at scale
  • Practical knowledge of feature-store concepts (offline/online stores, backfills, point-in-time correctness), model registries, experiment tracking, and evaluation frameworks
  • Strong problem-solving skills and a proactive attitude toward ownership and platform health
Job Responsibility
  • Turn notebooks into software: Decompose data scientist training/inference notebooks into reusable, tested components (libraries, pipelines, templates) with clear interfaces and documentation
  • Create developer-friendly ML abstractions: Build SDKs, CLIs, and templates that make it simple to define features, train/evaluate models, and deploy to batch or real-time targets with minimal boilerplate
  • Build our real-time ML inference platform: Stand up and scale low-latency model serving
  • Expand batch ML inference: Improve scheduling, parallelism, cost controls, observability, and failure/rollback for large-scale batch scoring and post-processing
  • Own and expand the feature store: Design offline/online feature definitions, support high read/write throughput, and maintain consistent offline/online semantics
What we offer
  • Equity grant
  • Medical, dental & vision insurance
  • Work from home flexibility
  • Unlimited PTO
  • Commuter benefits
  • Free lunches
  • Paid parental leave
  • 401(k)
  • Employee assistance program
  • Full-time

Senior Backend Engineer, Inference Platform

Together AI is building the Inference Platform that brings the most advanced gen...
Location: United States, San Francisco
Salary: 160000.00 - 250000.00 USD / Year
Together AI
Expiration Date: Until further notice
Requirements
  • 5+ years of demonstrated experience building large-scale, fault-tolerant, distributed systems and API microservices
  • Strong background in designing, analyzing, and improving efficiency, scalability, and stability of complex systems
  • Excellent understanding of low-level OS concepts: multi-threading, memory management, networking, and storage performance
  • Expert-level programming in one or more of: Rust, Go, Python, or TypeScript
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
Job Responsibility
  • Build and optimize global and local request routing, ensuring low-latency load balancing across data centers and model engine pods
  • Develop auto-scaling systems to dynamically allocate resources and meet strict SLOs across dozens of data centers
  • Design systems for multi-tenant traffic shaping, tuning both resource allocation and request handling — including smart rate limiting and regulation — to ensure fairness and consistent experience across all users
  • Engineer trade-offs between latency and throughput to serve diverse workloads efficiently
  • Optimize prefix caching to reduce model compute and speed up responses
  • Collaborate with ML researchers to bring new model architectures into production at scale
  • Continuously profile and analyze system-level performance to identify bottlenecks and implement optimizations
What we offer
  • Competitive compensation
  • Equity
  • Health insurance
  • Other competitive benefits
  • Full-time

Senior ML Engineer

We are seeking an experienced Senior ML to join our team and engage in a diverse...
Location: United Kingdom, Bath
Salary: Not provided
BMT
Expiration Date: Until further notice
Requirements
  • Be a UK sole national
  • Have held no other nationality at any time
  • Have continuously resided in the United Kingdom for the past five years
  • Be able to obtain and maintain full UK security clearance in accordance with government vetting standards
  • Provide satisfactory evidence of identity, nationality, and residency as part of the clearance process
  • Capability to design and implement end‑to‑end ML pipelines
  • Ability to select, train, and tune models (classical ML and deep learning) using frameworks such as PyTorch, TensorFlow, or scikit‑learn
  • Experience containerising and deploying models (e.g., Docker) and implementing CI/CD, monitoring, drift detection, and automated retraining on Azure/AWS/GCP as appropriate
  • Demonstrated capability to work with data engineers to ensure high‑quality datasets, versioning, lineage, and governance
  • Capable of pairing with data scientists and software engineers, reviewing code, and sharing best practices
Job Responsibility
  • Design, build, and deploy machine‑learning systems, applying robust software engineering practices and an in‑depth understanding of model behaviour, performance, and limitations
  • Select, prepare, and pipeline data for model training and inference. Implement, train, evaluate, and optimise machine‑learning models, continually improving them through iterative experimentation and additional data
  • Create scalable and automated ML pipelines, including feature extraction, model training, validation, packaging, deployment, and monitoring
  • Design and implement dashboards, diagnostics, and evaluation tooling to ensure transparency, performance tracking, and operational reliability across the ML lifecycle
  • Within defined delivery goals, refine prototype models into production‑ready components, contributing to development, optimisation, demonstration, and integration activities
  • Apply standardised engineering and evaluation methods, producing clear technical documentation and communicating design choices, performance outcomes, and limitations
  • Contribute to internal knowledge bases and participate in professional ML engineering communities
  • Ensure responsible handling of data throughout the ML lifecycle, including secure storage, access control, data lineage, versioning, and quality checks
  • Evaluate data integrity and suitability for ML workflows, and advise on transformations, feature representation, and schemas needed for efficient training and inference
  • Implement metadata standards, reproducible data pipelines, and automated validation procedures to maintain trustworthy data assets
What we offer
  • Private Medical (family coverage)
  • Enhanced Pension
  • 18 weeks enhanced maternity pay (after a qualifying period of 1 year)
  • Family friendly policies
  • Committed to an inclusive culture
  • Wellbeing Fund – an annual fund for personal hobbies or interests
  • 26 Days Annual Leave (plus bank holidays)
  • Holiday Trading
  • Retail Vouchers
  • Professional Subscriptions
  • Full-time

Senior AI / ML Engineer

We are seeking an experienced Senior ML to join our team and engage in a diverse...
Location: United Kingdom, London
Salary: Not provided
BMT
Expiration Date: Until further notice
Requirements
  • Be a UK sole national
  • Have held no other nationality at any time
  • Have continuously resided in the United Kingdom for the past five years
  • Be able to obtain and maintain full UK security clearance in accordance with government vetting standards
  • Provide satisfactory evidence of identity, nationality, and residency as part of the clearance process
  • Ability to select, train, and tune models (classical ML and deep learning) using frameworks such as PyTorch, TensorFlow, or scikit-learn
  • Perform robust validation and error analysis
  • Experience containerising and deploying models (e.g., Docker) and implementing CI/CD, monitoring, drift detection, and automated retraining on Azure/AWS/GCP as appropriate
  • Strong engineering skills in Python (typing, testing, packaging)
  • Experience with version control (Git) and code review workflows
Job Responsibility
  • Designing, building, testing, and deploying machine-learning systems, applying robust software engineering practices and an in-depth understanding of model behaviour, performance, and limitations
  • Selecting and preparing data pipelines for model training and inference
  • Implementing, training, evaluating, and optimising machine-learning models, continually improving them through iterative experimentation and additional data
  • Creating scalable and automated ML pipelines, including feature extraction, model training, validation, packaging, deployment, and monitoring
  • Applying standardised engineering and evaluation methods, producing clear technical documentation and communicating design choices, performance outcomes, and limitations
  • Evaluating data integrity and suitability for ML workflows, and advising on transformations, feature representation, and schemas needed for efficient training and inference
  • Applying engineering-focused data modelling and system design techniques to create, modify, or maintain ML-relevant data structures, feature stores, and associated components
  • Supporting alignment of data structures, model interfaces, and infrastructure components to ensure efficient and scalable ML system operation
What we offer
  • Private Medical (family coverage)
  • Enhanced Pension
  • 18 weeks enhanced maternity pay (after a qualifying period of 1 year)
  • Family friendly policies
  • Committed to an inclusive culture
  • Wellbeing Fund – an annual fund for personal hobbies or interests
  • 26 Days Annual Leave (plus bank holidays)
  • Holiday Trading
  • Retail Vouchers
  • Professional Subscriptions
  • Full-time

Senior CVML Platform Engineer

We are seeking a Senior CVML Platform Engineer to help design, build, and evolve...
Location: United States
Salary: 160000.00 - 287000.00 USD / Year
Blue River Technology
Expiration Date: Until further notice
Requirements
  • 5+ years of professional engineering experience, with a focus on platform, infrastructure, or systems engineering
  • Strong technical judgment, balancing the evolution of legacy platforms with the design and delivery of new, greenfield components shared across multiple teams and workloads
  • Excellent Python skills, used in production systems, tooling, and platform components
  • Solid understanding of ML systems and the end-to-end model development lifecycle, from experimentation to deployment and iteration
  • Hands-on experience or strong familiarity with cloud platforms (AWS preferred) and container orchestration and workload scheduling systems such as Kubernetes and Slurm
  • Ability to partner effectively with ML engineers, infra teams, and product stakeholders to translate requirements into platform capabilities
  • Ability to quickly ramp up on new domains, tools, and complex existing systems
Job Responsibility
  • Design, build, and evolve platform capabilities that support ML training, batch inference, and model deployment workflows at scale
  • Own and improve core platform components (e.g., compute orchestration, data pipelines, inference systems) used by multiple teams across Blue River and John Deere
  • Continuously enhance platform reliability, scalability, and performance, with a focus on real-world ML workloads
  • Enable ML engineers to move faster by building intuitive, well-documented platform tools and workflows across the model lifecycle (experimentation, deployment, and iteration)
  • Improve model inference performance and throughput while balancing trade-offs among cost, latency, and reliability
  • Support and scale distributed training and inference systems, including frameworks such as Ray and related tooling
  • Develop and optimize hybrid compute environments (cloud + on-prem/GPU infrastructure) to support large-scale ML workloads
  • Build and maintain infrastructure leveraging Kubernetes, Slurm, and cloud platforms (AWS preferred)
  • Identify and resolve bottlenecks in compute, storage, and data movement pipelines
  • Evaluate existing platform systems and make thoughtful decisions on when to extend, refactor, or rebuild components
What we offer
  • Bonus and benefit programs
  • Full-time