ML Engineer - Inference Serving

Luma AI

Location:
United States; United Kingdom, Palo Alto

Contract Type:
Not provided

Salary:
187500.00 - 395000.00 USD / Year

Job Description:

Luma’s mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable and useful systems, the next step-function change will come from vision. We are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change. We know we are not going to reach our goal without reliable & scalable infrastructure, which is going to become the differentiating factor between success and failure.

Job Responsibility:

  • Ship new model architectures by integrating them into our inference engine
  • Collaborate closely across research, engineering and infrastructure to streamline and optimize model efficiency and deployments
  • Build internal tooling to measure, profile, and track the lifetime of inference jobs and workflows
  • Automate, test and maintain our inference services to ensure maximum uptime and reliability
  • Optimize deployment workflows to scale across thousands of machines
  • Manage and optimize our inference workloads across different clusters & hardware providers
  • Build sophisticated scheduling systems to optimally leverage our expensive GPU resources while meeting internal SLOs
  • Build and maintain CI/CD pipelines for processing/optimizing model checkpoints, platform components, and SDKs for internal teams to integrate into our products/internal tooling

Requirements:

  • Strong Python and system architecture skills
  • Experience with model deployment using PyTorch, Hugging Face, vLLM, SGLang, TensorRT-LLM, or similar
  • Experience with queues, scheduling, traffic-control, fleet management at scale
  • Experience with Linux, Docker, and Kubernetes
  • Python
  • Redis
  • S3-compatible Storage
  • Model serving (one of: PyTorch, vLLM, SGLang, Hugging Face)
  • Understanding of large-scale orchestration, deployment, scheduling (via Kubernetes or similar)

Nice to have:

  • Experience with modern networking stacks, including RDMA (RoCE, InfiniBand, NVLink)
  • Experience with high performance large scale ML systems (>100 GPUs)
  • Experience with FFmpeg and multimedia processing
  • CUDA
  • FFmpeg

Additional Information:

Job Posted:
January 22, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for ML Engineer - Inference Serving

ML Engineer (Production-focused)

We are looking for an ML Engineer with hands-on experience bringing models into ...
Location:
France, Paris
Salary:
Not provided
Corsearch
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience as an ML Engineer delivering models into production
  • Strong programming skills in Python (production-level)
  • Hands-on experience with PyTorch (preferred) or TensorFlow
  • Proven experience in model deployment / model serving
  • Experience optimizing inference (latency, resource usage, throughput)
  • Strong understanding of ML pipelines and automated workflows
  • Experience with Docker and containerized ML workloads
  • Ability to demonstrate measurable impact (e.g., uplift, precision improvements, latency reduction, stability gains)
  • Fluent spoken and written English
  • Located within a time zone aligned with CET (CET −2 to CET +4)
Job Responsibility:
  • Build and maintain ML models for large-scale detection, classification, and automation tasks
  • Optimize inference performance (latency, throughput, memory)
  • Develop and maintain end-to-end ML pipelines: data processing, training, validation, deployment, monitoring
  • Integrate ML components into microservice-based architecture
  • Work closely with engineering teams to ensure reliability and performance in production
  • Improve tooling for model versioning, testing, and CI/CD

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date:
Until further notice
Requirements:
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility:
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer:
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Fulltime

ML Platform Engineer

We are seeking a Machine Learning Engineer to help build and scale our machine l...
Location:
United States
Salary:
Not provided
Duetto
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience in ML engineering or a similar role building and deploying machine learning models in production
  • Strong experience with AWS ML services (SageMaker, Lambda, EMR, ECR) for training, serving, and orchestrating model workflows
  • Hands-on experience with Kubernetes (e.g., EKS) for container orchestration and job execution at scale
  • Strong proficiency in Python, with exposure to ML/DL libraries such as TensorFlow, PyTorch, scikit-learn
  • Experience working with feature stores, data pipelines, and model versioning tools (e.g., SageMaker Feature Store, Feast, MLflow)
  • Familiarity with infrastructure-as-code and deployment tools such as Terraform, GitHub Actions, or similar CI/CD systems
  • Experience with logging and monitoring stacks such as Prometheus, Grafana, CloudWatch, or similar
  • Experience working in cross-functional teams with data scientists and DevOps engineers to bring models from research to production
  • Strong communication skills and ability to operate effectively in a fast-paced, ambiguous environment with shifting priorities
Job Responsibility:
  • Develop, maintain, and scale machine learning pipelines for training, validation, and batch or real-time inference across thousands of hotel-specific models
  • Build reusable components to support model training, evaluation, deployment, and monitoring within a Kubernetes- and AWS-based environment
  • Partner with data scientists to translate notebooks and prototypes into production-grade, versioned training workflows
  • Implement and maintain feature engineering workflows, integrating with custom feature pipelines and supporting services
  • Collaborate with platform and DevOps teams to manage infrastructure-as-code (Terraform), automate deployment (CI/CD), and ensure reliability and security
  • Integrate model monitoring for performance metrics, drift detection, and alerting (using tools like Prometheus, CloudWatch, or Grafana)
  • Improve retraining, rollback, and model versioning strategies across different deployment contexts
  • Support experimentation infrastructure and A/B testing integrations for ML-based products

Senior ML Platform Engineer

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location:
United States, Boston
Salary:
150000.00 - 210000.00 USD / Year
Whoop
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s Degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years of experience in software engineering with a focus on ML infrastructure, cloud platforms, or MLOps
  • Strong programming skills in Python, with experience in building distributed systems and REST/gRPC APIs
  • Deep knowledge of cloud-native services and infrastructure-as-code (e.g., AWS CDK, Terraform, CloudFormation)
  • Hands-on experience with model deployment platforms such as AWS SageMaker, Vertex AI, or Kubernetes-based serving stacks
  • Proficiency in ML lifecycle tools (MLflow, Weights & Biases, BentoML) and containerization strategies (Docker, Kubernetes)
  • Understanding of data engineering and ingestion pipelines, with ability to interface with data lakes, feature stores, and streaming systems
  • Proven ability to work cross-functionally with Data Science, Data Platform, and Software Engineering teams, influencing decisions and driving alignment
  • Passion for AI and automation to solve real-world problems and improve operational workflows
Job Responsibility:
  • Architect, build, own, and operate scalable ML infrastructure in cloud environments (e.g., AWS), optimizing for speed, observability, cost, and reproducibility
  • Create, support, and maintain core MLOps infrastructure (e.g., MLflow, feature store, experiment tracking, model registry), ensuring reliability, scalability, and long-term sustainability
  • Develop, evolve, and operate MLOps platforms and frameworks that standardize model deployment, versioning, drift detection, and lifecycle management at scale
  • Implement and continuously maintain end-to-end CI/CD pipelines for ML models using orchestration tools (e.g., Prefect, Airflow, Argo Workflows), ensuring robust testing, reproducibility, and traceability
  • Partner closely with Data Science, Sensor Intelligence, and Data Platform teams to operationalize and support model development, deployment, and monitoring workflows
  • Build, manage, and maintain both real-time and batch inference infrastructure, supporting diverse use cases from physiological analytics to personalized feedback loops for WHOOP members
  • Design, implement, and own automated observability tooling (e.g., for model latency, data drift, accuracy degradation), integrating metrics, logging, and alerting with existing platforms
  • Leverage AI-powered tools and automation to reduce operational overhead, enhance developer productivity, and accelerate model release cycles
  • Contribute to and maintain internal platform documentation, SDKs, and training materials, enabling self-service capabilities for model deployment and experimentation
  • Continuously evaluate and integrate emerging technologies and deployment strategies, influencing WHOOP’s roadmap for AI-driven platform efficiency, reliability, and scale
What we offer:
  • equity
  • benefits
  • Fulltime

Senior Machine Learning Engineer (Health)

WHOOP is an advanced health and fitness wearable, on a mission to unlock human p...
Location:
United States, Boston
Salary:
150000.00 - 210000.00 USD / Year
Whoop
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s Degree in Computer Science, Data Science, Applied Mathematics, or a related field. Master’s preferred
  • 5+ years of professional experience as a Machine Learning Engineer or Software Engineer with focus on ML systems
  • Proven expertise working with time series data (wearable, physiological, or high-frequency sensor data strongly preferred)
  • Experience designing and deploying ML inference systems at scale: both real-time streaming and large-scale batch pipelines
  • Strong coding skills in Python (scientific stack) and SQL, with a track record of writing clean, production-quality code
  • Strong communication skills to collaborate across engineering, research, and product teams
  • Proven experience deploying and maintaining ML systems on cloud platforms (AWS or GCP)
  • Working familiarity with MLOps best practices: model versioning, CI/CD for ML, observability, and monitoring for inference systems
  • Ability to reason about and design for performance trade-offs (latency vs. throughput vs. cost) when building ML inference systems
  • Strong understanding of backend service development (APIs and service reliability) as it applies to serving ML models at scale
Job Responsibility:
  • Create, improve, and maintain production services that provide analysis for health features in collaboration with Data Scientists and MLOps Engineers
  • Collaborate with Data Engineers to improve ML data pipelines, tooling, and validation systems that support robust model performance
  • Work alongside data scientists to translate research prototypes into production ML systems optimized for scale, latency, and cost efficiency
  • Collaborate with researchers and product teams to align model development with health insights and member impact
  • Participate in on-call rotations for data science services, ensuring uptime and performance in production environments
What we offer:
  • equity
  • benefits
  • Fulltime

Machine Learning Engineer

Influur is redefining how advertising works — through creators, data, and AI. Ou...
Location:
United States, San Francisco Bay Area
Salary:
200000.00 USD / Year
Influur
Expiration Date:
Until further notice
Requirements:
  • Strong experience designing, building, and maintaining end-to-end machine learning systems in production
  • Deep understanding of ML algorithms, embeddings, retrieval systems, and evaluation methodologies
  • Strong experience with large language models (LLMs), fine-tuning, inference optimization, and agent frameworks
  • Expertise in ML infrastructure, including feature stores, vector databases, model serving, and real-time inference pipelines
  • Strong Python skills and experience with PyTorch, TensorFlow, FastAPI, NumPy, scikit-learn, and data processing frameworks
  • Experience with scalable data pipelines (batch + streaming), including tools like Spark, Kafka, or similar
  • Experience implementing ML solutions such as recommendation engines, ranking models, and personalization systems
  • Solid understanding of statistical analysis (A/B testing, experimentation, causal inference)
  • Ability to work closely with engineering teams to productionize ML models with reliability, monitoring, and CI/CD best practices
  • Writes clean, reusable, and well-documented code for ML pipelines and distributed systems
What we offer:
  • Competitive equity in a venture-backed company shaping the future of music influencer marketing
  • A seat at the table as we redefine how the most iconic record labels, artists, and brands go viral (think Bad Bunny) — with our tech, support, and strategic guidance
  • Access to elite tools, AI copilots, and a team that builds daily at top speed
  • Hybrid flexibility + top-tier health benefits
  • Fulltime

Machine Learning Engineer

Influur is redefining how advertising works — through creators, data, and AI. Ou...
Location:
United States, Miami
Salary:
200000.00 USD / Year
Influur
Expiration Date:
Until further notice
Requirements:
  • Strong experience designing, building, and maintaining end-to-end machine learning systems in production
  • Deep understanding of ML algorithms, embeddings, retrieval systems, and evaluation methodologies
  • Strong experience with large language models (LLMs), fine-tuning, inference optimization, and agent frameworks
  • Expertise in ML infrastructure, including feature stores, vector databases, model serving, and real-time inference pipelines
  • Strong Python skills and experience with PyTorch, TensorFlow, FastAPI, NumPy, scikit-learn, and data processing frameworks
  • Experience with scalable data pipelines (batch + streaming), including tools like Spark, Kafka, or similar
  • Experience implementing ML solutions such as recommendation engines, ranking models, and personalization systems
  • Solid understanding of statistical analysis (A/B testing, experimentation, causal inference)
  • Ability to work closely with engineering teams to productionize ML models with reliability, monitoring, and CI/CD best practices
What we offer:
  • Competitive equity in a venture-backed company shaping the future of music influencer marketing
  • A seat at the table as we redefine how the most iconic record labels, artists, and brands go viral
  • Access to elite tools, AI copilots, and a team that builds daily at top speed
  • Hybrid flexibility + top-tier health benefits
  • Fulltime

Machine Learning Engineer

Influur is redefining how advertising works — through creators, data, and AI. Ou...
Salary:
200000.00 USD / Year
Influur
Expiration Date:
Until further notice
Requirements:
  • Strong experience designing, building, and maintaining end-to-end machine learning systems in production
  • Deep understanding of ML algorithms, embeddings, retrieval systems, and evaluation methodologies
  • Strong experience with large language models (LLMs), fine-tuning, inference optimization, and agent frameworks
  • Expertise in ML infrastructure, including feature stores, vector databases, model serving, and real-time inference pipelines
  • Strong Python skills and experience with PyTorch, TensorFlow, FastAPI, NumPy, scikit-learn, and data processing frameworks
  • Experience with scalable data pipelines (batch + streaming), including tools like Spark, Kafka, or similar
  • Experience implementing ML solutions such as recommendation engines, ranking models, and personalization systems
  • Solid understanding of statistical analysis (A/B testing, experimentation, causal inference)
  • Ability to work closely with engineering teams to productionize ML models with reliability, monitoring, and CI/CD best practices
What we offer:
  • Competitive equity in a venture-backed company shaping the future of music influencer marketing
  • A seat at the table as we redefine how the most iconic record labels, artists, and brands go viral (think Bad Bunny) — with our tech, support, and strategic guidance
  • Access to elite tools, AI copilots, and a team that builds daily at top speed
  • Hybrid flexibility + top-tier health benefits
  • Fulltime