Senior Machine Learning Systems Engineer

United States, San Francisco · 221000.00 - 260000.00 USD / Year · Job Posted January 20, 2026

Job Description

As a Senior Machine Learning Systems Engineer at Abridge, you’ll play a pivotal role in building and optimizing the core infrastructure that powers our machine learning models. Your work will be instrumental in enhancing the scalability, efficiency, and performance of our AI-driven solutions. You will work with our Infrastructure and Research teams to build, deploy, optimize and orchestrate across our AI models.

Job Responsibility

  • Design, deploy and maintain scalable Kubernetes clusters for AI model inference and training
  • Develop, optimize, and maintain ML model serving and training infrastructure, ensuring high performance and low latency
  • Collaborate with ML and product teams to scale backend infrastructure for AI-driven products, focusing on model deployment, throughput optimization, and compute efficiency
  • Optimize compute-heavy workflows and enhance GPU utilization for ML workloads
  • Build a robust model API orchestration system
  • Collaborate with leadership to define and implement strategies for scaling infrastructure as the company grows, ensuring long-term efficiency and performance

Requirements

  • Strong experience in building and deploying machine learning models in production environments
  • Deep understanding of container orchestration and distributed systems architecture
  • Expertise in Kubernetes administration, including custom resource definitions, operators, and cluster management
  • Experience developing APIs and managing distributed systems for both batch and real-time workloads
  • Excellent communication skills, with the ability to interface between research and product engineering

Nice to have

  • Expertise with model serving frameworks such as NVIDIA Triton Server, vLLM, and TRT-LLM
  • Expertise with ML toolchains such as PyTorch, TensorFlow, or distributed training and inference libraries
  • Familiarity with GPU cluster management and CUDA optimization
  • Knowledge of infrastructure as code (Terraform, Ansible) and GitOps practices
  • Experience with container registries, image optimization, and multi-stage builds for ML workloads
  • Experience orchestrating across ASR models or LLM models for building various GenAI applications

What we offer

  • Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees
  • Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families
  • Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA
  • Paid Parental Leave: Generous paid parental leave for all full-time employees
  • Family Forming Benefits: Resources and financial support to help you build your family
  • 401(k) Matching: Contribution matching to help invest in your future
  • Personal Device Allowance: Tax-free funds for personal device usage
  • Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits
  • Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more
  • Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals
  • Sabbatical Leave: Paid Sabbatical Leave after 5 years of employment
  • Compensation and Equity: Competitive compensation and equity grants for full-time employees

Similar Jobs for Senior Machine Learning Systems Engineer

8 matching positions

Senior Machine Learning Systems Engineer

Our organization drives AI innovation across Jira products. We deliver seamless ...
Salary: Not provided
Atlassian
Expiration Date: Until further notice
Requirements
  • Extensive experience building Machine Learning and AI solutions (4+ years)
  • Proven experience developing, deploying, and maintaining end-to-end ML systems, including data engineering, model serving, and monitoring
  • Expert proficiency with GenAI frameworks and tools, including developing and fine-tuning large language models (LLMs) and building retrieval-augmented generation (RAG) systems
  • Expert proficiency in Python and ML frameworks like PyTorch, TensorFlow, or JAX
  • Experience implementing MLOps, CI/CD pipelines, and automation for continuous training, deployment, and monitoring of ML models
Job Responsibility
  • Collaborate with software engineers, data scientists, and product managers to solve complex problems
  • Lead projects from technical design through launch
  • Partner with teams to achieve impactful results
  • Deliver robust ML solutions to build AI features reaching millions, including curating ML datasets, fine-tuning open-source LLMs, or accessing proprietary LLMs
  • Mentor junior members of the team
What we offer
  • Health and wellbeing resources
  • Paid volunteer days

Senior Machine Learning Engineer, Reinforcement Learning

About the Role: We are looking for a Senior Machine Learning Engineer with stron...
Location: United States, Beverly Hills, CA
Salary: 150000.00 - 185000.00 USD / Year
Snail Games
Expiration Date: Until further notice
Requirements
  • Strong foundation in machine learning, with hands-on experience building, training, and iterating on applied ML systems
  • Professional or substantial project experience with reinforcement learning, agent-based systems, sequential decision-making, or closely related areas
  • Strong Python skills and experience with modern ML frameworks such as PyTorch
  • Experience designing experiments, evaluating model behavior, and improving results through systematic iteration
  • T-shaped capability: deep machine learning expertise plus practical range across one or more adjacent areas such as simulation, evaluation, model integration, systems collaboration, or robotics-adjacent machine learning
  • Strong problem-solving ability, sound judgment, and comfort working in ambiguous, fast-changing environments
  • Respectful, low-ego collaborative style and willingness to work beyond a narrow specialty when the work requires it
Job Responsibility
  • Design, train, and iterate on machine learning models for intelligent agents and decision-making systems, with an emphasis on reinforcement learning and related approaches
  • Define and refine state representations, action spaces, reward structures, and evaluation criteria to improve agent behavior
  • Build and improve practical experimentation and training workflows, including data generation, experiment tracking, and reproducibility
  • Analyze results, debug model behavior, and make pragmatic tradeoffs between model performance, iteration speed, and system complexity
  • Work closely with engineers and other partners to help integrate successful ML work into usable product systems
  • Contribute thoughtful technical input on next-step experiments, tooling, and ML direction as Egofold continues to evolve
What we offer
  • True focus on work/life balance
  • Paid company holidays, vacation, and separate sick leave
  • Medical, dental, vision, and Life/LTD
  • 401k with company match
  • Fulltime

Senior Machine Learning Engineer, AI Platform

The AI Platform team is responsible for building the foundational infrastructure...
Location: United States; Canada
Salary: 139000.00 - 218000.00 USD / Year
Mozilla
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or work experience equivalent
  • Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
  • Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
  • Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
  • Hands-on experience working with GPU-based workloads and accelerated computing in production settings
  • Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
  • Ability to independently scope and drive technical initiatives while balancing product and operational priorities
  • Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
  • Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams
Job Responsibility
  • Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
  • Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
  • Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
  • Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
  • Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
  • Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
  • Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
  • Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
  • Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews
What we offer
  • Generous performance-based bonus plans
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting
  • Quarterly all-company wellness days
  • Country specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
  • Fulltime

Senior Machine Learning Engineer

IT AND R&D REMOTE - Senior Machine Learning Engineer - RTB House is a global com...
Location: Poland
Salary: Not provided
RTB House
Expiration Date: Until further notice
Requirements
  • Expertise in designing and implementing complex IT systems
  • Ability to develop user-friendly, versatile tools
  • Proficiency in at least one programming language, such as Python, C++, Java, or Scala, along with expertise in Linux
  • Strong skills in evaluating and optimizing system performance, from initial design through to production troubleshooting
  • Deep understanding of algorithms and data structures
  • Initiative and creativity to improve existing solutions
  • Ability to work effectively both within and across teams
  • C1 level in Polish
Job Responsibility
  • Developing and maintaining the ML training platform and the bidding infrastructure that evaluates ML models in the production environment
  • Identifying performance bottlenecks and optimizing critical, low-level parts of the system
  • Ensuring the reliability and scalability of implementations, and creating performance and correctness tests for new system components
  • Testing and benchmarking open-source Big Data and ML technologies to assess their suitability for the production environment
What we offer
  • Access to the latest technologies, with the opportunity to apply them in a large-scale and fast-paced project
  • Opportunity to work with a team of enthusiasts experienced in Machine Learning, Big Data, and distributed systems
  • Flexible working hours, with the possibility of remote work or work from our office in Warsaw
  • An opportunity to apply your expertise in optimizing algorithms that support hundreds of millions of internet users and billions of ad views per month within the RTB model
  • The ability to see the immediate impact of your work on the company's business outcomes
  • The possibility of publishing your results
  • Fulltime

Senior Machine Learning Engineer – AV Labs

Uber is launching AV Labs to accelerate the autonomous technology ecosystem. We'...
Location: United States, Sunnyvale
Salary: 202000.00 - 224000.00 USD / Year
Uber
Expiration Date: Until further notice
Requirements
  • 4+ years of working experience in the ML/Robotics industry
  • Bachelor's degree in Computer Science, Computer Engineering, or related fields
  • Proficient in Python and Linux environments
  • Familiar with modern AI/ML frameworks (e.g., PyTorch).
Job Responsibility
  • Algorithm Development: Lead the development of autonomy algorithms and foundation models that extract high-fidelity semantic meaning from complex urban edge cases to enrich our L4 data lake
  • Systems Architecture Design: Architect scalable ML systems, including management of upstream sensor dependencies
  • Technical Leadership: Partner with fellow engineers to architect, design, and build scalable solutions for ML technology that can stand the test of scale and availability
  • Dataset Optimization: Deliver high-quality datasets to accelerate ML technologies through advanced sensor data collection, processing, and auto-labeling
  • Cross-Functional Collaboration: Partner with platform, product, and security engineering teams to enable the successful deployment of the latest machine learning techniques into production.
What we offer
  • Bonus program
  • Equity award & other types of comp
  • 401(k) plan
  • Various benefits
  • Fulltime

Senior Machine Learning Engineer, Search Assistant

Roku is changing how the world watches TV. Roku is the #1 TV streaming platform ...
Location: United States, San Jose
Salary: 361300.00 - 510000.00 USD / Year
Roku
Expiration Date: Until further notice
Requirements
  • 8+ years of industry experience (or PhD with 5+ years) applying ML at scale in search, recommendation, ads, personalization, or related domains
  • Strong expertise in ranking systems, recommendation systems, retrieval, personalization, and multi-objective optimization
  • Experience building large-scale ML systems leveraging deep learning, sequence models, LLMs, reinforcement learning, or bandit frameworks
  • Strong product intuition and experience optimizing user engagement, retention, and monetization simultaneously
  • Proficiency in Python, Java, or Scala
  • Experience with distributed systems and ML infrastructure such as Spark, Airflow, streaming systems, feature stores, and cloud platforms
  • Strong technical leadership, system design, communication, and problem-solving skills
  • MS or PhD in Computer Science, Statistics, or a related field
Job Responsibility
  • Lead the technical vision and roadmap for ranking, personalization, and recommendation systems powering Roku’s entertainment assistant
  • Develop and deploy state-of-the-art ML models using deep learning, transformers, LLMs, bandits, reinforcement learning, and causal inference techniques
  • Build multi-objective optimization systems balancing engagement, retention, relevance, and monetization goals
  • Drive innovation in conversational discovery, contextual recommendations, and personalized content experiences across the platform
  • Design, run, and analyze online A/B experiments tied to key product and business KPIs
  • Architect scalable ML systems, feature platforms, and data pipelines supporting rapid experimentation and long-term growth
  • Mentor engineers and provide technical leadership across cross-functional initiatives involving engineering, product, UX, and analytics teams
What we offer
  • Health insurance
  • Equity awards
  • Life insurance
  • Disability benefits
  • Parental leave
  • Wellness benefits
  • Paid time off
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Fulltime

Senior Machine Learning Engineer, Shopping AI

As the engine behind Zillow Group's mission to build a seamless digital real est...
Location: United States
Salary: 163200.00 - 274300.00 USD / Year
Zillow
Expiration Date: Until further notice
Requirements
  • 3-5 years of experience in developing applications in search, personalized ranking, or recommender systems
  • Experience developing and deploying ML models that scale to high-traffic, latency sensitive customer-facing services (100s of millions of requests per day)
  • Strong programming skills in a high-level language such as Python or Java
  • Familiarity with common machine learning libraries like PyTorch, TensorFlow, Catboost, scikit-learn and huggingface (repository)
  • Expertise with large scale distributed data processing systems such as Hive, Spark, Airflow, or Databricks
  • Experience owning the full lifecycle of customer facing machine learning models, from offline experimentation and prototyping to online deployment, A/B testing, and performance monitoring
Job Responsibility
  • Design, build, and ship new production machine learning models that power core product features on the Zillow app, website, and email/push notifications
  • Re-architect our core home ranking and recommendation systems to support advanced neural networks and dramatically accelerate the pace of experimentation across surfaces
  • Own the full lifecycle of your models, from offline experimentation and prototyping with massive datasets to online deployment, A/B testing, and performance monitoring
  • Pioneer the application of cutting-edge deep learning and large language models (LLMs) to improve our home shopping experience
  • Develop new AI components that optimize how we display and when we recommend homes, ensuring we connect shoppers with the right content on the right properties at the right time
  • Collaborate in a cross-functional group of engineers, applied scientists, product managers, and designers to define, execute, and iterate on the team's strategic roadmap
  • Contribute to the team's engineering excellence by improving our machine learning infrastructure, development standards, and shared tooling
  • Act as a key technical voice, mentoring other engineers and helping to shape the long-term vision for artificial intelligence in the home shopping experience
What we offer
  • Equity awards
  • Fulltime

Senior Machine Learning Engineer - ML Training Infrastructure

We are seeking an experienced, technical oriented, impact delivering-driven expe...
Location: United States, Mountain View
Salary: 170000.00 - 240000.00 USD / Year
General Motors
Expiration Date: Until further notice
Requirements
  • Bachelor's degree or higher in Computer Science or equivalent major, or equivalent relevant experience
  • 3+ years professional software engineering experience
  • 2+ years specialized experience in AI/ML infrastructure, e.g., enabling distributed training for scaling large ML models
  • Strong programming skills in Python, with proficiency in frameworks such as PyTorch (preferred), TensorFlow, or similar
  • Experience with distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure)
  • Willingness to travel to Sunnyvale, CA as needed
  • Comfortable working in highly ambiguous and dynamic environments
Job Responsibility
  • Design and develop a scalable, reliable, high-performance ML framework to support model training at scale
  • Analyze and optimize model training performance to scale distributed training workflows, maximize resource utilization across heterogeneous hardware environments, and reduce cost
  • Raise the bar on system observability, debuggability, operational excellence, and user experience
  • Collaborate with cross-functional teams to integrate new features and technologies into the platform
What we offer
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • Fulltime