CrawlJobs Logo

Senior ML Infrastructure Engineer, Inference Platform

gm.com Logo

General Motors

Location Icon

Location:
United States , Austin, Texas

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

155420.00 - 395900.00 USD / Year

Job Description:

About the Team: The ML Inference Platform is part of the AV ML Infrastructure organization. Our team owns the cloud-agnostic, reliable, and cost-efficient platform that powers GM’s AI efforts. We’re proud to serve teams developing autonomous vehicles (L3/L4/L5), as well as other groups building AI-driven products for GM and its customers. We enable rapid innovation and feature development by optimizing for high-priority, ML-centric use cases. Our platform supports the serving of state-of-the-art (SOTA) machine learning models for experimental, online and bulk inference, with a focus on performance, availability, concurrency, and scalability. We’re committed to maximizing GPU utilization across platforms (B200, H100, A100, and more) while maintaining reliability and cost efficiency. About the Role: We are seeking a Senior ML Infrastructure engineer to help build and scale robust platforms for ML Inference workflows. In this role, you’ll work closely with ML engineers and researchers to ensure efficient model serving and inference in production, for workflows such as data mining, labeling, model distillation, evaluations, simulations and more. This is a high-impact opportunity to influence the future of AI infrastructure at GM. You will play a key role in shaping the architecture, roadmap and user-experience of a robust ML inference service supporting real-time, batch, and experimental inference needs. The ideal candidate brings experience in designing distributed systems for ML, strong problem-solving skills, and a product mindset focused on platform usability and reliability.

Job Responsibility:

  • Design and implement core platform backend software components
  • Collaborate with ML engineers and researchers to understand critical workflows, parse them to platform requirements, and deliver incremental value
  • Lead technical decision-making on model serving strategies, orchestration, caching, model versioning, and auto-scaling mechanisms for highly optimized use of accelerators
  • Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization of inference services
  • Proactively research and integrate state-of-the-art model serving frameworks, hardware accelerators, and distributed computing techniques
  • Lead technical initiatives across GM’s ML ecosystem
  • Raise the engineering bar through technical leadership, establishing best practices
  • Contribute to open source projects
  • represent GM in relevant communities

Requirements:

  • 5+ years of industry experience, with focus on machine learning systems or high performance backend services
  • Expertise in either Python, C++ or other relevant coding languages
  • Expertise in ML inference, model serving frameworks (triton, rayserve, vLLM etc)
  • Strong communication skills and a proven ability to drive cross-functional initiatives
  • Ability to thrive in a dynamic, multi-tasking environment with ever-evolving priorities

Nice to have:

  • Deep expertise building zero-to-one ML infrastructure platforms
  • Experience working with or designing interfaces, apis and clients for ML workflows
  • Experience with Ray framework, and/or vLLM
  • Experience with distributed systems, and handling large-scale data processing
  • Familiarity with telemetry, and other feedback loops to inform product improvements
  • Familiarity with hardware acceleration (GPUs) and optimizations for inference workloads
What we offer:
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • employee assistance program
  • GM vehicle discounts
  • relocation benefits

Additional Information:

Job Posted:
March 03, 2026

Employment Type:
Fulltime
Work Type:
Remote work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Senior ML Infrastructure Engineer, Inference Platform

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location
Location
United States , San Francisco
Salary
Salary:
180000.00 - 270000.00 USD / Year
plaid.com Logo
Plaid
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility
Job Responsibility
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
What we offer
What we offer
  • medical, dental, vision, and 401(k)
  • Fulltime
Read More
Arrow Right

Senior ML Platform Engineer

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location
Location
United States , Boston
Salary
Salary:
150000.00 - 210000.00 USD / Year
whoop.com Logo
Whoop
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or Master’s Degree in Computer Science, Engineering, or a related field
  • or equivalent practical experience
  • 5+ years of experience in software engineering with a focus on ML infrastructure, cloud platforms, or MLOps
  • Strong programming skills in Python, with experience in building distributed systems and REST/gRPC APIs
  • Deep knowledge of cloud-native services and infrastructure-as-code (e.g., AWS CDK, Terraform, CloudFormation)
  • Hands-on experience with model deployment platforms such as AWS SageMaker, Vertex AI, or Kubernetes-based serving stacks
  • Proficiency in ML lifecycle tools (MLflow, Weights & Biases, BentoML) and containerization strategies (Docker, Kubernetes)
  • Understanding of data engineering and ingestion pipelines, with ability to interface with data lakes, feature stores, and streaming systems
  • Proven ability to work cross-functionally with Data Science, Data Platform, and Software Engineering teams, influencing decisions and driving alignment
  • Passion for AI and automation to solve real-world problems and improve operational workflows
Job Responsibility
Job Responsibility
  • Architect, build, own, and operate scalable ML infrastructure in cloud environments (e.g., AWS), optimizing for speed, observability, cost, and reproducibility
  • Create, support, and maintain core MLOps infrastructure (e.g., MLflow, feature store, experiment tracking, model registry), ensuring reliability, scalability, and long-term sustainability
  • Develop, evolve, and operate MLOps platforms and frameworks that standardize model deployment, versioning, drift detection, and lifecycle management at scale
  • Implement and continuously maintain end-to-end CI/CD pipelines for ML models using orchestration tools (e.g., Prefect, Airflow, Argo Workflows), ensuring robust testing, reproducibility, and traceability
  • Partner closely with Data Science, Sensor Intelligence, and Data Platform teams to operationalize and support model development, deployment, and monitoring workflows
  • Build, manage, and maintain both real-time and batch inference infrastructure, supporting diverse use cases from physiological analytics to personalized feedback loops for WHOOP members
  • Design, implement, and own automated observability tooling (e.g., for model latency, data drift, accuracy degradation), integrating metrics, logging, and alerting with existing platforms
  • Leverage AI-powered tools and automation to reduce operational overhead, enhance developer productivity, and accelerate model release cycles
  • Contribute to and maintain internal platform documentation, SDKs, and training materials, enabling self-service capabilities for model deployment and experimentation
  • Continuously evaluate and integrate emerging technologies and deployment strategies, influencing WHOOP’s roadmap for AI-driven platform efficiency, reliability, and scale
What we offer
What we offer
  • equity
  • benefits
  • Fulltime
Read More
Arrow Right

Senior Machine Learning Infrastructure Engineer

As a Senior ML Infrastructure Engineer at Plus, you will design scalable archite...
Location
Location
United States , Santa Clara
Salary
Salary:
160000.00 - 200000.00 USD / Year
plus.ai Logo
PlusAI
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Phd or MS in Computer Science, Electrical Engineering, or related field
  • Good oral and written communication skills
  • Phd new grad or Masters with 3+ years of software engineering experience with a focus on ML infrastructure or distributed systems
  • Proficiency in in Python, C++, SQL
  • Deep understanding of containerization, orchestration technologies, distributed ML workload, and experiment tracking tools (e.g., Docker, Kubernetes, multiprocessing, Kubeflow, and mlflow)
  • Deploy and manage resources across multiple cloud platforms (AWS, GCP, or on-prem environments)
  • Proficiency in at least one deep learning framework, such as PyTorch and data pipeline tools (e.g., Apache Airflow, Prefect)
  • Strong knowledge of distributed systems, databases, and storage solutions
  • Extensive software design and development skills
  • Ability to learn and adapt to new technologies and contribute in a productive environment
Job Responsibility
Job Responsibility
  • Design and develop scalable, high-performance systems for training, inference, deploying, and monitoring ML models at scale
  • Build and maintain efficient data pipelines, model versioning systems, and experiment tracking frameworks
  • Collaborate with cross-functional teams, including ML researchers and engineers, to identify bottlenecks and improve platform usability
  • Implement distributed systems and storage solutions optimized for machine learning workloadsDrive improvements in CI/CD workflows for ML models and infrastructure
  • Ensure high availability and reliability of the ML platform by implementing robust monitoring, logging, and alerting systems
  • Stay current with industry trends and integrate relevant tools and frameworks to enhance the platform
  • Mentor junior engineers and contribute to a culture of technical excellence
  • Ensure that your work is performed in accordance with the company’s Quality Management System (QMS) requirements and contribute to continuous improvement efforts
  • Ensure team compliance with QMS, monitor quality, and drive process improvements
What we offer
What we offer
  • Work, learn and grow in a highly future-oriented, innovative and dynamic field
  • Wide range of opportunities for personal and professional development
  • Catered free lunch, unlimited snacks and beverages
  • Highly competitive salary and benefits package, including 401(k) plan
  • Fulltime
Read More
Arrow Right

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location
Location
United States , San Francisco
Salary
Salary:
180000.00 - 270000.00 USD / Year
plaid.com Logo
Plaid
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility
Job Responsibility
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer
What we offer
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Fulltime
Read More
Arrow Right

Senior Mlops Engineer

Location
Location
Japan , Tokyo
Salary
Salary:
9000000.00 - 15000000.00 JPY / Year
https://www.randstad.com Logo
Randstad
Expiration Date
March 24, 2026
Flip Icon
Requirements
Requirements
  • Must have 5+ years of experience in MLOps, DevOps, or related fields
  • Must have proven experience with tools like Hugging Face, MLFlow, and containerization technologies (Docker, Kubernetes)
  • Must have strong experience with cloud platforms (AWS, Azure, GCP) and infrastructure as code (Terraform)
  • Must have hands-on experience in reducing inference latency and optimizing AI infrastructure
  • Must have proficiency in Python, with experience in ML libraries such as TensorFlow, PyTorch, and related frameworks
  • Must have expertise in CI/CD pipelines, version control (Git), and orchestration tools
  • Must have familiarity with Generative AI, prompt engineering, and deploying models at scale
  • Must have excellent problem-solving skills with the ability to tackle complex challenges independently
What we offer
What we offer
  • Health insurance
  • Employee pension insurance
  • Unemployment insurance
  • Fulltime
Read More
Arrow Right

Senior Staff Machine Learning Engineer

Join the Affirm team as a Senior Staff Machine Learning Engineer and become a pi...
Location
Location
United States
Salary
Salary:
232000.00 - 310000.00 USD / Year
affirm.com Logo
Affirm
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years of experience researching, designing, deploying, and operating large-scale, real-time machine learning systems
  • Experience leading end-to-end ML system design, from data architecture and feature pipelines to model training, evaluation, and production deployment
  • Proficient in Python and ML frameworks, including PyTorch and XGBoost
  • Strong understanding of representation learning and embedding-based modeling
  • Deep expertise in neural network-based sequence modeling, including architectures such as Transformers, recurrent, or attention-based models, and multi-task learning systems
  • Deep hands-on experience with large-scale distributed ML infrastructure, including streaming or batch data ingestion, feature stores, feature engineering, training pipelines, model serving and inference infrastructure, monitoring, and automated retraining
  • Strong technical leadership: defining long-term strategy, guiding research direction, and aligning work across teams
  • Exceptional judgment, collaboration, and communication skills
  • Strong verbal and written communication skills that support effective collaboration across our global engineering organization
  • Equivalent practical experience or a Bachelor’s degree in a related field
Job Responsibility
Job Responsibility
  • Define and drive multi-year, multi-team technical strategy for machine learning across Affirm
  • Lead the design, implementation, and scaling of advanced ML systems
  • Partner deeply with ML Platform, product, engineering, and risk leadership to shape long-term modeling capabilities
  • Provide broad technical leadership across the ML organization, mentoring senior engineers
  • Drive clarity and alignment on ambiguous, high-stakes technical decisions
  • Champion operational and system excellence at the area level
What we offer
What we offer
  • Equity rewards
  • Monthly stipends for health, wellness and tech spending
  • 100% subsidized medical coverage, dental and vision for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Competitive vacation and holiday schedules
  • Employee stock purchase plan enabling you to buy shares of Affirm at a discount
  • Fulltime
Read More
Arrow Right

Senior Staff Machine Learning Engineer

Join the Affirm team as a Senior Staff Machine Learning Engineer and become a pi...
Location
Location
Canada
Salary
Salary:
206000.00 - 256000.00 CAD / Year
affirm.com Logo
Affirm
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years of experience researching, designing, deploying, and operating large-scale, real-time machine learning systems
  • Experience leading end-to-end ML system design, from data architecture and feature pipelines to model training, evaluation, and production deployment
  • Proficiency in Python and ML frameworks, including PyTorch and XGBoost
  • Experience with ML tooling for training orchestration, experimentation, and model monitoring, such as Kubeflow, MLflow, or equivalent
  • Strong understanding of representation learning and embedding-based modeling
  • Deep expertise in neural network-based sequence modeling, including architectures such as Transformers, recurrent, or attention-based models, and multi-task learning systems
  • Deep hands-on experience with large-scale distributed ML infrastructure, including streaming or batch data ingestion, feature stores, feature engineering, training pipelines, model serving and inference infrastructure, monitoring, and automated retraining
  • Strong technical leadership: defining long-term strategy, guiding research direction, and aligning work across teams
  • Exceptional judgment, collaboration, and communication skills
  • Strong verbal and written communication skills
Job Responsibility
Job Responsibility
  • Define and drive multi-year, multi-team technical strategy for machine learning across Affirm
  • Lead the design, implementation, and scaling of advanced ML systems
  • Partner deeply with ML Platform, product, engineering, and risk leadership to shape long-term modeling capabilities
  • Provide broad technical leadership across the ML organization
  • Drive clarity and alignment on ambiguous, high-stakes technical decisions
  • Champion operational and system excellence at the area level
What we offer
What we offer
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount
  • Fulltime
Read More
Arrow Right

Senior Machine Learning Engineer, Personalization and Recommendations

As a Senior Machine Learning Engineer on the Personalization & Recommendations t...
Location
Location
United States , San Francisco
Salary
Salary:
183360.00 - 248000.00 USD / Year
edtechjobs.io Logo
EdTech Jobs
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of experience in applied machine learning or ML-heavy software engineering, with a strong focus on personalization, ranking, or recommendation systems
  • Demonstrated impact improving key metrics such as CTR, retention, or engagement through recommender or search systems in production
  • Strong hands-on skills in Python and PyTorch, with expertise in data and feature engineering, distributed training and inference on GPUs, and familiarity with modern MLOps practices — including model registries, feature stores, monitoring, and drift detection
  • Deep understanding of retrieval and ranking architectures, such as Two-Tower models, deep cross networks, Transformers, or MMoE, and the ability to apply them to real-world problems
  • Experience with large-scale embedding models and vector search, including FAISS, ScaNN, or similar systems
  • Proficiency in experiment design and evaluation, connecting offline metrics (AUC, NDCG, calibration) with online A/B test outcomes to drive product decisions
  • Clear, effective communication, collaborating well with product managers, data scientists, engineers, and cross-functional partners
  • A growth and mentorship mindset, helping elevate team quality in modeling, experimentation, and reliability
  • Commitment to responsible and inclusive personalization, ensuring our systems respect learner privacy, fairness, and diverse goals
Job Responsibility
Job Responsibility
  • Design and implement personalization models across candidate retrieval, ranking, and post-ranking layers, leveraging user embeddings, contextual signals and content features
  • Develop scalable retrieval and serving systems using architectures such as Two-Tower models, deep ranking networks, and ANN-based vector search for real-time personalization
  • Build and maintain model training, evaluation, and deployment pipelines, ensuring reliability, training–serving consistency, observability, and robust monitoring
  • Partner with Product and Data Science to translate learner objectives (engagement, retention, mastery) into measurable modeling goals and experiment designs
  • Advance evaluation methodologies, contributing to offline metric design (e.g., NDCG, CTR, calibration) and supporting rigorous A/B testing to measure learner and business impact
  • Collaborate with platform and infrastructure teams to optimize distributed training, inference latency, and serving cost in production environments
  • Stay informed on industry and research trends, evaluating opportunities to meaningfully apply them within Quizlet’s ecosystem
  • Mentor junior and mid-level engineers, supporting technical growth, experimentation rigor, and responsible ML practices
  • Champion collaboration, inclusion, curiosity, and data-driven problem solving, contributing to a healthy and productive team culture
What we offer
What we offer
  • 20 vacation days
  • Competitive health, dental, and vision insurance (100% employee and 75% dependent PPO, Dental, VSP Choice)
  • Employer-sponsored 401k plan with company match
  • Access to LinkedIn Learning and other resources to support professional growth
  • Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits
  • 40 hours of annual paid time off to participate in volunteer programs of choice
  • Fulltime
Read More
Arrow Right