Senior / Lead Machine Learning Engineer, Serving Job at Inworld AI

Senior / Lead Machine Learning Engineer, Serving

Inworld is a product-oriented research lab of top AI researchers and engineers, ...

Location

Germany

Salary:

Not provided

Inworld AI

Expiration Date

Until further notice

Requirements

Inference Optimization. Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM
Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding
High-Performance Systems. Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs
Distributed Systems & Scaling. Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections
Public work. Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups
Full-cycle ownership. You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production
Background. PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems
Professional fluency in English (written and spoken) is required, as you will be collaborating daily with our US-based leadership and engineering teams

Senior Machine Learning Engineer, AI Platform

The AI Platform team is responsible for building the foundational infrastructure...

Location

United States; Canada

Salary:

139000.00 - 218000.00 USD / Year

Mozilla

Expiration Date

Until further notice

Requirements

Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or equivalent work experience
Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
Hands-on experience working with GPU-based workloads and accelerated computing in production settings
Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
Ability to independently scope and drive technical initiatives while balancing product and operational priorities
Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams

Job Responsibility

Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews

What we offer

Generous performance-based bonus plans
Rich medical, dental, and vision coverage
Generous retirement contributions with 100% immediate vesting
Quarterly all-company wellness days
Country-specific holidays plus a day off for your birthday
One-time home office stipend
Annual professional development budget
Quarterly well-being stipend
Considerable paid parental leave
Employee referral bonus program

Fulltime

Senior Machine Learning Engineer, AI Platform

Location

United States; Canada

Salary:

128000.00 - 171000.00 CAD / Year

Mozilla

Expiration Date

Until further notice

Requirements

Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or equivalent work experience
Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
Hands-on experience working with GPU-based workloads and accelerated computing in production settings
Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
Ability to independently scope and drive technical initiatives while balancing product and operational priorities
Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams

Job Responsibility

Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews

What we offer

Generous performance-based bonus plans to all eligible employees
Rich medical, dental, and vision coverage
Generous retirement contributions with 100% immediate vesting (regardless of whether you contribute)
Quarterly all-company wellness days where everyone takes a pause together
Country-specific holidays plus a day off for your birthday
One-time home office stipend
Annual professional development budget
Quarterly well-being stipend
Considerable paid parental leave
Employee referral bonus program

Fulltime

Senior Machine Learning Engineer - Maps

The Places Data Team owns Uber's "Ground Truth" — the definitive dataset of POIs...

Location

Netherlands, Amsterdam

Salary:

Not provided

Uber

Expiration Date

Until further notice

Requirements

Ph.D., M.S. or Bachelor's degree in Computer Science, Machine Learning, or Operations Research, or equivalent technical background with exceptional demonstrated impact
4+ years of experience in developing and deploying machine learning models and optimization algorithms in large-scale production environments, delivering measurable business impact over multiple quarters and making significant technical contributions
Proficiency in programming languages such as Python, Scala, Java, or Go
Experience with large-scale data systems (e.g. Spark, Ray), real-time processing (e.g. Flink), and microservices architectures
Experience in the development, training, productionization and monitoring of ML solutions at scale, ranging from offline pipelines to online serving and MLOps

Job Responsibility

Design, develop and productionize end-to-end ML solutions for places data conflation (POI, addresses, BFP, etc.) and attribute inference using a mix of classical ML, deep learning, and generative AI
Collaborate with product, science, and engineering teams to execute on the technical vision and roadmap
Conduct rigorous experimentation and A/B testing to validate model performance and iterate on improvements
Own projects from initial mathematical formulation through to prototyping, algorithm implementation, and large-scale experimentation in production
Raise the technical bar for the team. You will mentor L3/L4 engineers, lead complex code reviews, and foster a culture of engineering excellence and scientific rigor

Senior Machine Learning Engineer

AI has created an unprecedented opportunity to make work better for hundreds of ...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python with experience in at least one deep learning framework such as PyTorch, JAX, or TensorFlow, OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Lead the design and architecture of ML solutions for projects/sub-systems
Select appropriate models, training regimes, and serving approaches
Produce maintainable, efficient, and explainable ML code
Drive monitoring for model drift, bias/fairness, and reliability
Mentor early-in-profession engineers, provide design/code reviews, and raise quality standards
Partner with other teams to ensure integrated ML systems are production-ready

Fulltime

Senior Machine Learning Engineer, Computer Vision - Robotics

Scale’s Robotics business unit is dedicated to solving the data bottleneck in Ph...

Location

United States, San Francisco

Salary:

218400.00 - 273000.00 USD / Year

Scale

Expiration Date

Until further notice

Requirements

Ph.D. in Computer Science, Computer Engineering, or a related quantitative field (Mathematics, Electrical Engineering, etc.) OR a Master’s degree with 4+ years of equivalent professional experience in an applied research setting
5+ years of hands-on experience in algorithm development for 2D/3D computer vision and deep learning
Expert proficiency in at least one major deep learning framework (PyTorch, TensorFlow, or JAX)
Mastery of Python for machine learning and strong proficiency in C++ for performance-critical algorithm implementation
In-depth knowledge of classical and modern computer vision fundamentals, including multi-view geometry, projective geometry, camera calibration, and 3D graphics/rendering principles
Experience building real-time and batch ML systems that analyze structured and unstructured signals
Hands-on experience rapidly prototyping and iterating on ML systems with changing requirements

Job Responsibility

Pioneer Core CV Algorithms: Lead the research, design, and implementation of novel computer vision and deep learning algorithms, with a specialized focus on 2D and 3D data (e.g., point clouds)
Focus Area Expertise: Drive innovation in key perception areas, including:
3D Reconstruction and SLAM: Advanced techniques for real-time 3D mapping, pose estimation, and environmental modeling from multi-modal sensor inputs (e.g., RGB-D, LiDAR)
Hand/Body Tracking: Developing robust and precise models for hand pose estimation, gesture recognition, and full-body tracking under various lighting and occlusion conditions
Object Detection and Tracking (MOT/SOT): Designing high-performance deep learning models for accurate detection and persistent tracking of objects and people in video streams
Video Processing: Creating algorithms for temporal feature extraction, video-based action recognition, and motion analysis
Model Optimization: Optimize computationally intensive models for deployment on edge devices (low power, low latency) and/or large-scale cloud infrastructure
Technical Leadership: Serve as the subject matter expert in Computer Vision, providing technical direction and mentorship to junior engineers and cross-functional teams
Publication & IP: Maintain state-of-the-art knowledge, evaluate recent academic publications (e.g., CVPR, ICCV, ECCV), and drive the filing of patents and publication of novel research
Cross-Functional Partnering: Collaborate closely with Software Engineering, Product, and Hardware teams to define requirements, integrate vision systems, and ensure solutions meet performance targets

What we offer

Comprehensive health, dental, and vision coverage
Retirement benefits
A learning and development stipend
Generous PTO
Equity-based compensation
You may be eligible for additional benefits such as a commuter stipend

Fulltime

Senior Machine Learning Engineer

As a Senior Machine Learning Engineer, you will take end-to-end ownership of the...

Location

Canada

Salary:

128000.00 - 160000.00 CAD / Year

FreshBooks

Expiration Date

Until further notice

Requirements

5+ years of experience in data science, applied ML, or ML engineering roles
Strong background in supervised and unsupervised learning, statistical modeling, and experimentation techniques
Proven experience developing and shipping ML models in production environments (batch or real-time)
Strong Python and SQL skills, and comfort working with structured and unstructured data
Hands-on experience building and deploying ML or LLM-based systems (e.g. retrieval-augmented generation, embeddings, prompt tuning)
Familiarity with cloud infrastructure and ML tools, ideally on Google Cloud Platform (e.g. Vertex AI, BigQuery, Cloud Composer, Kubernetes)
Experience working with CI/CD pipelines, containerization (Docker), and job orchestration tools (Airflow, dbt, etc.)
Deep understanding of end-to-end ML operations including model observability, model drift detection, and model performance optimization
Strong communication skills and ability to explain technical concepts to non-technical stakeholders

Job Responsibility

Design, prototype, and validate machine learning models to power product features or internal tools
Own and lead all phases of the ML lifecycle from experimentation through to production deployment and model monitoring
Collaborate with Data Engineers and Product Engineers to integrate models into production infrastructure (batch and online serving)
Develop and prototype features for the shared feature store, including documentation, versioning, and consistency validation
Author high-quality, production-ready code with appropriate tests, observability, and monitoring hooks
Design experiments (e.g. A/B tests, pre-post analyses) and interpret results to guide product and business decisions
Design and build end-to-end pipelines for classification, ranking, embeddings, or generation tasks
Drive reliability practices in deployed models, including retraining logic, alerting on drift, and root cause analysis
Work closely with product and engineering stakeholders to align ML work with business priorities
Contribute to standards and documentation, mentor junior team members, and help shape our evolving ML platform

Fulltime

Senior Machine Learning Engineer

We are seeking a Senior Machine Learning Engineer to bridge the gap between adva...

Location

Switzerland, Zürich

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master’s or PhD in Computer Science, Artificial Intelligence, or High-Performance Computing
4+ years of experience in Machine Learning, with a mandatory split focus between Model Architecture and Systems Optimization
Proven experience building and shipping Vision-Language Models (e.g., architectures similar to CLIP, Flamingo, Pix2Struct)
Must have experience creating custom evaluation sets for tasks like Document Understanding
Expert-level knowledge of SGLang and vLLM for optimized serving
Demonstrable experience optimizing models for both NVIDIA (H100) and AMD (MI300X) accelerators
Hands-on experience with Knowledge Distillation and Pruning to reduce model latency for target serving sizes
A track record of taking complex multi-modal models from research code to a deployed, user-facing production product

Job Responsibility

Continuously evaluate and implement the latest research trends in Vision-Language Models, specifically focusing on Referring Expression Comprehension (REC), Document Understanding (Pix2Struct), and Visual Question Answering (VQA)
Design and build massive-scale training and evaluation datasets, ensuring multilingual compatibility and broad visual understanding for European market requirements
Lead the model co-design process, creating architectures that are natively optimized for accelerator capabilities (compute-bound vs. memory-bound operations)
Architect high-throughput serving layers using SGLang and vLLM, optimizing for non-standard decoding strategies
Implement scientific experiments to find the Pareto-optimal frontier between serving latency and generation quality
Execute Knowledge Distillation (KD), unstructured pruning, and quantization techniques to fit large-scale VLM architectures onto single-node GPU setups (specifically H100 or MI300X) without compromising model quality
Write and optimize custom kernels (CUDA/HIP) to reduce serving latency, identifying bottlenecks at the operator level
Manage the full pre-training and post-training tech stack, ensuring seamless integration between model weights and inference engines
Take ownership of landing the serving-efficient model in a production environment, ensuring reliability and scalability

Fulltime
