
Senior / Lead Machine Learning Engineer, Serving

Serbia · Job Posted April 23, 2026

Requirements

  • Inference Optimization
  • Model Acceleration
  • High-Performance Systems
  • Distributed Systems & Scaling
  • Public work
  • Full-cycle ownership
  • Background
  • Professional fluency in English


Similar Jobs for Senior / Lead Machine Learning Engineer, Serving

8 matching positions

Senior / Lead Machine Learning Engineer, Serving

Inworld is a product-oriented research lab of top AI researchers and engineers, ...
Location: Germany
Salary: Not provided
Company: Inworld AI
Expiration Date: Until further notice
Requirements
  • Inference Optimization. Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM
  • Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding
  • High-Performance Systems. Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs
  • Distributed Systems & Scaling. Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections
  • Public work. Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups
  • Full-cycle ownership. You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production
  • Background. PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems
  • Professional fluency in English (written and spoken) is required, as you will be collaborating daily with our US-based leadership and engineering teams

Senior Machine Learning Engineer, AI Platform

The AI Platform team is responsible for building the foundational infrastructure...
Location: United States; Canada
Salary: 139,000 - 218,000 USD / Year
Company: Mozilla
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or equivalent work experience
  • Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
  • Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
  • Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
  • Hands-on experience working with GPU-based workloads and accelerated computing in production settings
  • Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
  • Ability to independently scope and drive technical initiatives while balancing product and operational priorities
  • Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
  • Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams
Job Responsibility
  • Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
  • Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
  • Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
  • Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
  • Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
  • Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
  • Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
  • Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
  • Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews
What we offer
  • Generous performance-based bonus plans
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting
  • Quarterly all-company wellness days
  • Country-specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
Employment type: Full-time

Senior Machine Learning Engineer, AI Platform

Location: United States; Canada
Salary: 128,000 - 171,000 CAD / Year
Company: Mozilla
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or equivalent work experience
  • Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
  • Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
  • Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
  • Hands-on experience working with GPU-based workloads and accelerated computing in production settings
  • Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
  • Ability to independently scope and drive technical initiatives while balancing product and operational priorities
  • Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
  • Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams
Job Responsibility
  • Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
  • Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
  • Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
  • Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
  • Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
  • Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
  • Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
  • Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
  • Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews
What we offer
  • Generous performance-based bonus plans to all eligible employees
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting (regardless of whether you contribute)
  • Quarterly all-company wellness days where everyone takes a pause together
  • Country-specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
Employment type: Full-time

Senior Machine Learning Engineer - Maps

The Places Data Team owns Uber's "Ground Truth" — the definitive dataset of POIs...
Location: Netherlands, Amsterdam
Salary: Not provided
Company: Uber
Expiration Date: Until further notice
Requirements
  • Ph.D., M.S. or Bachelor's degree in Computer Science, Machine Learning, or Operations Research, or equivalent technical background with exceptional demonstrated impact
  • 4+ years of experience in developing and deploying machine learning models and optimization algorithms in large-scale production environments, delivering measurable business impact over multiple quarters and making significant technical contributions
  • Proficiency in programming languages such as Python, Scala, Java, or Go
  • Experience with large-scale data systems (e.g. Spark, Ray), real-time processing (e.g. Flink), and microservices architectures
  • Experience in the development, training, productionization and monitoring of ML solutions at scale, ranging from offline pipelines to online serving and MLOps
Job Responsibility
  • Design, develop and productionize end-to-end ML solutions for places data conflation (POI, addresses, BFP, etc.) and attribute inference using a mix of classical ML, deep learning, and generative AI
  • Collaborate with product, science, and engineering teams to execute on the technical vision and roadmap
  • Conduct rigorous experimentation and A/B testing to validate model performance and iterate on improvements
  • Own projects from initial mathematical formulation through to prototyping, algorithm implementation, and large-scale experimentation in production
  • Raise the technical bar for the team. You will mentor L3/L4 engineers, lead complex code reviews, and foster a culture of engineering excellence and scientific rigor

Senior Machine Learning Engineer

AI has created an unprecedented opportunity to make work better for hundreds of ...
Location: United States, Redmond
Salary: 119,800 - 234,700 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python with experience in at least one deep learning framework such as PyTorch, JAX, or TensorFlow, OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Lead the design and architecture of ML solutions for projects/sub-systems
  • Select appropriate models, training regimes, and serving approaches
  • Produce maintainable, efficient, and explainable ML code
  • Drive monitoring for model drift, bias/fairness, and reliability
  • Mentor early-in-profession engineers, provide design/code reviews, and raise quality standards
  • Partner with other teams to ensure integrated ML systems are production-ready
Employment type: Full-time

Senior Machine Learning Engineer, Computer Vision - Robotics

Scale’s Robotics business unit is dedicated to solving the data bottleneck in Ph...
Location: United States, San Francisco
Salary: 218,400 - 273,000 USD / Year
Company: Scale
Expiration Date: Until further notice
Requirements
  • Ph.D. in Computer Science, Computer Engineering, or a related quantitative field (Mathematics, Electrical Engineering, etc.) OR a Master’s degree with 4+ years of equivalent professional experience in an applied research setting
  • 5+ years of hands-on experience in algorithm development for 2D/3D computer vision and deep learning
  • Expert proficiency in at least one major deep learning framework (PyTorch, TensorFlow, or JAX)
  • Mastery of Python for machine learning and strong proficiency in C++ for performance-critical algorithm implementation
  • In-depth knowledge of classical and modern computer vision fundamentals, including multi-view geometry, projective geometry, camera calibration, and 3D graphics/rendering principles
  • Experience building real-time and batch ML systems that analyze structured and unstructured signals
  • Hands-on experience rapidly prototyping and iterating on ML systems with changing requirements
Job Responsibility
  • Pioneer Core CV Algorithms: Lead the research, design, and implementation of novel computer vision and deep learning algorithms, with a specialized focus on 2D and 3D data (e.g., point clouds)
  • Focus Area Expertise: Drive innovation in key perception areas, including:
      • 3D Reconstruction and SLAM: Advanced techniques for real-time 3D mapping, pose estimation, and environmental modeling from multi-modal sensor inputs (e.g., RGB-D, LiDAR)
      • Hand/Body Tracking: Developing robust and precise models for hand pose estimation, gesture recognition, and full-body tracking under various lighting and occlusion conditions
      • Object Detection and Tracking (MOT/SOT): Designing high-performance deep learning models for accurate detection and persistent tracking of objects and people in video streams
      • Video Processing: Creating algorithms for temporal feature extraction, video-based action recognition, and motion analysis
  • Model Optimization: Optimize computationally intensive models for deployment on edge devices (low power, low latency) and/or large-scale cloud infrastructure
  • Technical Leadership: Serve as the subject matter expert in Computer Vision, providing technical direction and mentorship to junior engineers and cross-functional teams
  • Publication & IP: Maintain state-of-the-art knowledge, evaluate recent academic publications (e.g., CVPR, ICCV, ECCV), and drive the filing of patents and publication of novel research
  • Cross-Functional Partnering: Collaborate closely with Software Engineering, Product, and Hardware teams to define requirements, integrate vision systems, and ensure solutions meet performance targets
What we offer
  • Comprehensive health, dental, and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Equity-based compensation
  • You may be eligible for additional benefits such as a commuter stipend
Employment type: Full-time

Senior Machine Learning Engineer

As a Senior Machine Learning Engineer, you will take end-to-end ownership of the...
Location: Canada
Salary: 128,000 - 160,000 CAD / Year
Company: FreshBooks
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in data science, applied ML, or ML engineering roles
  • Strong background in supervised and unsupervised learning, statistical modeling, and experimentation techniques
  • Proven experience developing and shipping ML models in production environments (batch or real-time)
  • Strong Python and SQL skills; comfort working with structured and unstructured data
  • Hands-on experience building and deploying ML or LLM-based systems (e.g. retrieval-augmented generation, embeddings, prompt tuning)
  • Familiarity with cloud infrastructure and ML tools, ideally on Google Cloud Platform (e.g. Vertex AI, BigQuery, Cloud Composer, Kubernetes)
  • Experience working with CI/CD pipelines, containerization (Docker), and job orchestration tools (Airflow, dbt, etc.)
  • Deep understanding of end-to-end ML operations including model observability, model drift detection, and model performance optimization
  • Strong communication skills and ability to explain technical concepts to non-technical stakeholders
Job Responsibility
  • Design, prototype, and validate machine learning models to power product features or internal tools
  • Own and lead all phases of the ML lifecycle from experimentation through to production deployment and model monitoring
  • Collaborate with Data Engineers and Product Engineers to integrate models into production infrastructure (batch and online serving)
  • Develop and prototype features for the shared feature store, including documentation, versioning, and consistency validation
  • Author high-quality, production-ready code with appropriate tests, observability, and monitoring hooks
  • Design experiments (e.g. A/B tests, pre-post analyses) and interpret results to guide product and business decisions
  • Design and build end-to-end pipelines for classification, ranking, embeddings, or generation tasks
  • Drive reliability practices in deployed models, including retraining logic, alerting on drift, and root cause analysis
  • Work closely with product and engineering stakeholders to align ML work with business priorities
  • Contribute to standards and documentation, mentor junior team members, and help shape our evolving ML platform
Employment type: Full-time

Senior Machine Learning Engineer

We are seeking a Senior Machine Learning Engineer to bridge the gap between adva...
Location: Switzerland, Zürich
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Master’s or PhD in Computer Science, Artificial Intelligence, or High-Performance Computing
  • 4+ years of experience in Machine Learning, with a mandatory split focus between Model Architecture and Systems Optimization
  • Proven experience building and shipping Vision-Language Models (e.g., architectures similar to CLIP, Flamingo, Pix2Struct)
  • Must have experience creating custom evaluation sets for tasks like Document Understanding
  • Expert-level knowledge of SGLang and vLLM for optimized serving
  • Demonstrable experience optimizing models for both NVIDIA (H100) and AMD (MI300x) accelerators
  • Hands-on experience with Knowledge Distillation and Pruning to reduce model latency for target serving sizes
  • A track record of taking complex multi-modal models from research code to a deployed, user-facing production product
Job Responsibility
  • Continuously evaluate and implement the latest research trends in Vision-Language Models, specifically focusing on Referring Expression Comprehension (REC), Document Understanding (Pix2Struct), and Visual Question Answering (VQA)
  • Design and build massive-scale training and evaluation datasets, ensuring multilingual compatibility and broad visual understanding for European market requirements
  • Lead the model co-design process, creating architectures that are natively optimized for accelerator capabilities (compute-bound vs. memory-bound operations)
  • Architect high-throughput serving layers using SGLang and vLLM, optimizing for non-standard decoding strategies
  • Implement scientific experiments to find the Pareto-optimal frontier between serving latency and generation quality
  • Execute Knowledge Distillation (KD), unstructured pruning, and quantization techniques to fit large-scale VLM architectures onto single-node GPU setups (specifically H100 or MI300x) without compromising model quality
  • Write and optimize custom kernels (CUDA/HIP) to reduce serving latency, identifying bottlenecks at the operator level
  • Manage the full pre-training and post-training tech stack, ensuring seamless integration between model weights and inference engines
  • Take ownership of landing the serving-efficient model in a production environment, ensuring reliability and scalability
Employment type: Full-time