ML Model Serving Engineer

Sesame

Location:
United States, San Francisco

Contract Type:
Employment contract

Salary:
175000.00 - 280000.00 USD / Year

Job Description:

Sesame believes in a future where computers are lifelike, with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice agents part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.

Job Responsibility:

  • Turbocharge our serving layer, consisting of a variety of LLM, speech, and vision models
  • Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer to power a new consumer product category
  • Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving
  • Work with the training team to identify opportunities to produce faster models without sacrificing quality
  • Use techniques like in-flight batching, caching, and custom kernels to speed up inference
  • Find ways to reduce model initialization times without sacrificing quality

Requirements:

  • Expert in some differentiable array computing framework, preferably PyTorch
  • Expert in optimizing machine learning models for serving reliably at high throughput, with low latency
  • Significant systems programming experience, e.g., work on high-performance server systems; you’d be just as comfortable with the internals of vLLM as with a complex PyTorch codebase
  • Significant performance engineering experience, e.g., bottleneck analysis in high-scale server systems or profiling low-level systems code
  • Always up to date on the latest techniques for model serving optimization

Nice to have:

  • Familiarity with high-performance LLM serving, e.g., experience with vLLM and SGLang deployment and internals
  • Experience with a public cloud platform such as GCP, AWS, or Azure
  • Experience deploying and scaling inference workloads in the cloud using Kubernetes, Ray, etc.
  • You like to ship and have a track record of leading complex multi-month projects without assistance
  • You’re excited to learn new things and work in a multitude of roles

What we offer:

  • 401k matching
  • 100% employer-paid health, vision, and dental benefits
  • Unlimited PTO and sick time
  • Flexible spending account matching (medical FSA)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for ML Model Serving Engineer

ML Engineer (Production-focused)

We are looking for an ML Engineer with hands-on experience bringing models into ...
Location:
France, Paris
Salary:
Not provided
Corsearch
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience as an ML Engineer delivering models into production
  • Strong programming skills in Python (production-level)
  • Hands-on experience with PyTorch (preferred) or TensorFlow
  • Proven experience in model deployment / model serving
  • Experience optimizing inference (latency, resource usage, throughput)
  • Strong understanding of ML pipelines and automated workflows
  • Experience with Docker and containerized ML workloads
  • Ability to demonstrate measurable impact (e.g., uplift, precision improvements, latency reduction, stability gains)
  • Fluent spoken and written English
  • Located within a time zone aligned with CET (CET −2 to CET +4)
Job Responsibility:
  • Build and maintain ML models for large-scale detection, classification, and automation tasks
  • Optimize inference performance (latency, throughput, memory)
  • Develop and maintain end-to-end ML pipelines: data processing, training, validation, deployment, monitoring
  • Integrate ML components into microservice-based architecture
  • Work closely with engineering teams to ensure reliability and performance in production
  • Improve tooling for model versioning, testing, and CI/CD

ML Platform Engineer

We are seeking a Machine Learning Engineer to help build and scale our machine l...
Location:
United States
Salary:
Not provided
Duetto
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience in ML engineering or a similar role building and deploying machine learning models in production
  • Strong experience with AWS ML services (SageMaker, Lambda, EMR, ECR) for training, serving, and orchestrating model workflows
  • Hands-on experience with Kubernetes (e.g., EKS) for container orchestration and job execution at scale
  • Strong proficiency in Python, with exposure to ML/DL libraries such as TensorFlow, PyTorch, scikit-learn
  • Experience working with feature stores, data pipelines, and model versioning tools (e.g., SageMaker Feature Store, Feast, MLflow)
  • Familiarity with infrastructure-as-code and deployment tools such as Terraform, GitHub Actions, or similar CI/CD systems
  • Experience with logging and monitoring stacks such as Prometheus, Grafana, CloudWatch, or similar
  • Experience working in cross-functional teams with data scientists and DevOps engineers to bring models from research to production
  • Strong communication skills and ability to operate effectively in a fast-paced, ambiguous environment with shifting priorities
Job Responsibility:
  • Develop, maintain, and scale machine learning pipelines for training, validation, and batch or real-time inference across thousands of hotel-specific models
  • Build reusable components to support model training, evaluation, deployment, and monitoring within a Kubernetes- and AWS-based environment
  • Partner with data scientists to translate notebooks and prototypes into production-grade, versioned training workflows
  • Implement and maintain feature engineering workflows, integrating with custom feature pipelines and supporting services
  • Collaborate with platform and DevOps teams to manage infrastructure-as-code (Terraform), automate deployment (CI/CD), and ensure reliability and security
  • Integrate model monitoring for performance metrics, drift detection, and alerting (using tools like Prometheus, CloudWatch, or Grafana)
  • Improve retraining, rollback, and model versioning strategies across different deployment contexts
  • Support experimentation infrastructure and A/B testing integrations for ML-based products

Senior AI ML Engineer

We are seeking a highly skilled and experienced Assistant Vice President (AVP), ...
Location:
India, Pune
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, Machine Learning, Statistics, or a related quantitative field
  • Minimum of 6 years of professional experience in Data Science, Machine Learning Engineering, or a similar role, with a strong track record of deploying ML models to production
  • Proven experience in a lead or senior technical role
  • Expert-level proficiency in Python programming, including experience with relevant data science libraries (e.g., Pandas, NumPy, Scikit-learn) and deep learning frameworks (e.g., TensorFlow, PyTorch)
  • Strong hands-on experience designing, developing, and deploying RESTful APIs using FastAPI
  • Solid understanding and practical experience with CI/CD tools and methodologies (e.g., Jenkins, GitLab CI, GitHub Actions, Azure DevOps) for MLOps
  • Experience with MLOps platforms, model monitoring, and model versioning
  • Experience with at least one major cloud provider (e.g., AWS, Azure, GCP) for deploying and managing ML workloads
  • Proficiency in SQL and experience working with relational and/or NoSQL databases
  • Deep understanding of machine learning algorithms, statistical modeling, and data mining techniques
Job Responsibility:
  • Design, develop, and implement advanced machine learning models (e.g., predictive, prescriptive, generative AI) to solve complex business problems, from initial data exploration and feature engineering to model training and evaluation
  • Lead the deployment of AI/ML models into production environments, ensuring scalability, reliability, and performance
  • Build and maintain robust, high-performance APIs (using frameworks like FastAPI) to serve machine learning models and integrate them with existing applications and systems
  • Establish and manage continuous integration and continuous deployment (CI/CD) pipelines for ML code and model deployments, promoting automation and efficiency
  • Collaborate with data engineers to ensure optimal data pipelines and data quality for model development and deployment
  • Conduct rigorous experimentation, A/B testing, and model performance monitoring to continuously improve and optimize AI/ML solutions
  • Promote and enforce best practices in software development, including clean code, unit testing, documentation, and version control
  • Mentor junior team members, contribute to technical discussions, and drive the adoption of new technologies and methodologies within the team
  • Effectively communicate complex technical concepts and model results to both technical and non-technical stakeholders.
What we offer:
  • Not explicitly stated.
Employment Type:
Full-time

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date:
Until further notice
Requirements:
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility:
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer:
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
Employment Type:
Full-time

Senior ML Platform Engineer

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location:
United States, Boston
Salary:
150000.00 - 210000.00 USD / Year
Whoop
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s Degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years of experience in software engineering with a focus on ML infrastructure, cloud platforms, or MLOps
  • Strong programming skills in Python, with experience in building distributed systems and REST/gRPC APIs
  • Deep knowledge of cloud-native services and infrastructure-as-code (e.g., AWS CDK, Terraform, CloudFormation)
  • Hands-on experience with model deployment platforms such as AWS SageMaker, Vertex AI, or Kubernetes-based serving stacks
  • Proficiency in ML lifecycle tools (MLflow, Weights & Biases, BentoML) and containerization strategies (Docker, Kubernetes)
  • Understanding of data engineering and ingestion pipelines, with ability to interface with data lakes, feature stores, and streaming systems
  • Proven ability to work cross-functionally with Data Science, Data Platform, and Software Engineering teams, influencing decisions and driving alignment
  • Passion for AI and automation to solve real-world problems and improve operational workflows
Job Responsibility:
  • Architect, build, own, and operate scalable ML infrastructure in cloud environments (e.g., AWS), optimizing for speed, observability, cost, and reproducibility
  • Create, support, and maintain core MLOps infrastructure (e.g., MLflow, feature store, experiment tracking, model registry), ensuring reliability, scalability, and long-term sustainability
  • Develop, evolve, and operate MLOps platforms and frameworks that standardize model deployment, versioning, drift detection, and lifecycle management at scale
  • Implement and continuously maintain end-to-end CI/CD pipelines for ML models using orchestration tools (e.g., Prefect, Airflow, Argo Workflows), ensuring robust testing, reproducibility, and traceability
  • Partner closely with Data Science, Sensor Intelligence, and Data Platform teams to operationalize and support model development, deployment, and monitoring workflows
  • Build, manage, and maintain both real-time and batch inference infrastructure, supporting diverse use cases from physiological analytics to personalized feedback loops for WHOOP members
  • Design, implement, and own automated observability tooling (e.g., for model latency, data drift, accuracy degradation), integrating metrics, logging, and alerting with existing platforms
  • Leverage AI-powered tools and automation to reduce operational overhead, enhance developer productivity, and accelerate model release cycles
  • Contribute to and maintain internal platform documentation, SDKs, and training materials, enabling self-service capabilities for model deployment and experimentation
  • Continuously evaluate and integrate emerging technologies and deployment strategies, influencing WHOOP’s roadmap for AI-driven platform efficiency, reliability, and scale
What we offer:
  • equity
  • benefits
Employment Type:
Full-time

Principal Engineer - Marketplace

Principal Engineer role in the Marketplace Engineering team to lead breakthrough...
Location:
United States, San Francisco; Sunnyvale
Salary:
302000.00 - 336000.00 USD / Year
Uber
Expiration Date:
Until further notice
Requirements:
  • PhD in Computer Science, Machine Learning, Operations Research, or related quantitative field OR Master’s degree with 12+ years of industry experience
  • 10+ years of experience building and deploying ML models in large-scale production environments
  • Expert-level proficiency in modern ML frameworks (TensorFlow, PyTorch, JAX) and distributed computing platforms (Spark, Ray)
  • Deep expertise across multiple areas including: Deep Learning, Causal Inference, Reinforcement Learning, Multi-objective Optimization, Algorithmic Game Theory, and Large-scale Ads Ranking/Auction Systems
  • Proven track record of leading complex ML projects from research through production with significant measurable business impact
  • Strong programming skills in Python, Java, or Go with experience building production ML systems
  • Experience with feature engineering, model serving, and ML infrastructure at scale (handling millions of predictions per second)
  • Technical leadership experience including mentoring senior engineers and driving cross-team technical initiatives
  • Advanced Deep Learning and Neural Network architectures
  • Scalable ML architecture and distributed model training
Job Responsibility:
  • Lead the design and implementation of advanced ML systems for dynamic pricing algorithms serving millions of drivers across 70+ countries around the world
  • Architect real-time ML infrastructure handling 1M+ pricing decisions per second with sub-50ms latency requirements
  • Drive breakthrough research in causal ML, reinforcement learning, algorithmic game theory, and multi-objective optimization for marketplace optimization with strategic agents
  • Own end-to-end ML model lifecycle from research through production deployment and continuous optimization
  • Develop and enforce best practices in system design, ensuring data integrity, security, and optimal performance
  • Serve as a representative for the Marketplace organization to the broader internal and external technical community
  • Contribute to the eng brand for Marketplace and serve as a talent magnet to help attract and retain talent for the team
  • Stay abreast of industry trends and emerging technologies in software engineering, focused particularly on ML/AI, to enhance our systems and processes continually
  • Build scalable ML architecture and feature management systems supporting Driver Pricing and broader Marketplace teams
  • Design experimentation frameworks enabling rapid testing of pricing algorithms using A/B, Switchback, Synthetic Control, and other experimental methodologies
What we offer:
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award & other types of comp
  • Eligible to participate in a 401(k) plan
  • Eligible for various benefits (details at provided link)
Employment Type:
Full-time

AI Product Manager

We’re scaling AI and machine learning across our products, devices, and operatio...
Location:
United States, Boston
Salary:
121300.00 - 177900.00 USD / Year
SimpliSafe
Expiration Date:
Until further notice
Requirements:
  • 8+ years of product management experience, including significant ownership of AI/ML or data-intensive products
  • Clear track record of shipping production ML systems (not just integrating third-party AI APIs), in close partnership with data science, ML engineering, and MLOps
  • Principal-level impact: leading cross-team initiatives, shaping strategy, and influencing senior stakeholders
  • Strong understanding of core ML concepts and lifecycle: data, labeling, training/validation, evaluation metrics, deployment, monitoring, and retraining
  • ML experience with at least one of following: computer vision or sensor data, LLM-powered applications (prompting, RAG, fine-tuning, evaluation) and/or hardware or edge products (e.g., on-device models, connectivity/latency trade-offs)
  • Familiarity with modern ML infrastructure (cloud platforms, model serving, CI/CD for ML, monitoring/alerting)
  • Comfortable going deep into data, metrics, and model behavior—not just the UX layer
  • Excellent communicator who can make complex AI topics clear to diverse audiences
  • Strong alignment with our values: customer-obsessed, low ego, highly collaborative, comfortable with ambiguity, and biased toward learning and iteration.
Job Responsibility:
  • Define and communicate the multi-year roadmap for key AI/ML capabilities across SimpliSafe
  • Identify and prioritize AI opportunities where models and data can materially improve safety, customer experience, or efficiency—on both devices and cloud services
  • Make build-vs-buy decisions for AI capabilities in partnership with data science and engineering
  • Partner with data scientists, ML engineers, and MLOps to design and deliver end-to-end ML solutions—from problem framing through data, training, evaluation, deployment, and monitoring
  • Work with hardware and embedded teams to shape edge AI/ML experiences (e.g., on-device detection, low-latency decisions, bandwidth-aware designs)
  • Define model-level requirements (metrics, latency, cost, guardrails) and connect them to business outcomes (e.g., false alarm reduction, detection accuracy, handle time, CSAT)
  • Translate product needs into requirements for ML platform capabilities (model serving, observability, experiment tracking, human-in-the-loop tools)
  • Lead product direction for LLM and multimodal use cases (e.g., text, vision, sensor data)
  • Decide when to use prompt engineering, RAG, fine-tuning, or traditional ML—and how to evaluate quality, safety, and hallucinations
  • Design workflows that incorporate human review and escalation where needed
What we offer:
  • A mission- and values-driven culture and a safe, inclusive environment where you can build, grow, and thrive
  • A comprehensive total rewards package that supports your wellness and provides security for SimpliSafers and their families
  • Free SimpliSafe system and professional monitoring for your home
  • Employee Resource Groups (ERGs) that bring people together, give opportunities to network, mentor and develop, and advocate for change
  • Participation in our annual bonus program, equity, and other forms of compensation, in addition to a full range of medical, retirement, and lifestyle benefits.
Employment Type:
Full-time

Senior Staff Machine Learning Engineer - Driver Pricing & Marketplace Optimization

We’re seeking an exceptional Senior Staff ML Engineer to lead breakthrough ML in...
Location:
United States, Sunnyvale, California; San Francisco, California
Salary:
267000.00 - 297000.00 USD / Year
Uber
Expiration Date:
Until further notice
Requirements:
  • PhD in Computer Science, Machine Learning, Operations Research, or related quantitative field OR Master’s degree with 12+ years of industry experience
  • 10+ years of experience building and deploying ML models in large-scale production environments
  • Expert-level proficiency in modern ML frameworks (TensorFlow, PyTorch, JAX) and distributed computing platforms (Spark, Ray)
  • Deep expertise across multiple areas including: Deep Learning, Causal Inference, Reinforcement Learning, Multi-objective Optimization, Algorithmic Game Theory, and Large-scale Ads Ranking/Auction Systems
  • Proven track record of leading complex ML projects from research through production with significant measurable business impact
  • Strong programming skills in Python, Java, or Go with experience building production ML systems
  • Experience with feature engineering, model serving, and ML infrastructure at scale (handling millions of predictions per second)
  • Technical leadership experience including mentoring senior engineers and driving cross-team technical initiatives
Job Responsibility:
  • Lead the design and implementation of advanced ML systems for dynamic pricing algorithms serving millions of drivers across 70+ countries around the world
  • Architect real-time ML infrastructure handling 1M+ pricing decisions per second with sub-50ms latency requirements
  • Drive breakthrough research in causal ML, reinforcement learning, algorithmic game theory, and multi-objective optimization for marketplace optimization with strategic agents
  • Own end-to-end ML model lifecycle from research through production deployment and continuous optimization
  • Build scalable ML architecture and feature management systems supporting Driver Pricing and broader Marketplace teams
  • Design experimentation frameworks enabling rapid testing of pricing algorithms using A/B, Switchback, Synthetic Control, and other experimental methodologies
  • Establish ML engineering best practices, monitoring, and operational excellence across the organization
  • Create platform abstractions that enable other ML engineers to iterate faster on pricing algorithms
  • Partner with Product, Operations, and Earner Experience teams to translate complex business requirements into ML solutions
  • Collaborate with Marketplace Engineering and Science teams to productionize cutting-edge ML research
What we offer:
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award & other types of comp
  • Eligible for various benefits
Employment Type:
Full-time