Senior Machine Learning Engineer (Team Lead)

Australia, South Bank · Job Posted March 25, 2026

Job Description

As our Artificial Intelligence (AI) and Machine Learning (ML) Team Leader, you will drive the delivery of high-impact, first-of-their-kind AI solutions within the travel industry. You will lead the design and productionisation of advanced AI products that operate reliably at scale, shaping the long-term AI strategy and platform foundations. This role spans classical ML, generative AI, AI agents and agentic workflows. You will define robust ML and LLM infrastructure, establish strong engineering standards across MLOps and LLMOps, and ensure production-grade evaluation, safeguards and observability across all AI systems.

Job Responsibility

  • Lead the development and productionisation of ML models, LLM-powered systems and agent-based applications
  • Define and build end-to-end MLOps including CI/CD, model registry, monitoring, drift detection and retraining for predictive ML systems
  • Establish LLMOps standards including context engineering, automated evaluation pipelines, red teaming, safeguards and policy guardrails
  • Architect and build AI agent workflows, endpoints, gateways and orchestration layers enabling secure tool access, structured reasoning and multi-agent collaboration
  • Design and govern MCP servers and modern agent communication protocols to ensure interoperability, security and scalability
  • Implement strong observability across ML and GenAI systems including reliability, latency, evaluation metrics, usage tracking and cost control
  • Drive scalable ML infrastructure, feature stores and data platforms on Databricks
  • Oversee Kubernetes-based deployments and cloud-native AI infrastructure
  • Partner with senior stakeholders to prioritise and deliver multiple high-impact AI initiatives
  • Coach and grow a high-performing AI engineering team

Requirements

  • 7+ years delivering production-grade ML or AI systems with proven commercial impact
  • 3+ years leading and mentoring engineers
  • Experience building AI agents, RAG systems or LLM-powered applications in production
  • Demonstrated experience leading technical teams and managing complex AI programmes
  • Strong hands-on experience across ML infrastructure, distributed systems and scalable AI architecture
  • Experience building and governing AI agent platforms including endpoints, gateways and tool orchestration
  • Familiarity with MCP servers and emerging agent communication standards and protocols
  • Experience defining evaluation frameworks, safety mechanisms and governance for LLM and agent-based systems
  • Deep knowledge of Python, modern AI/ML frameworks and scalable AI platforms including Databricks
  • Strong expertise in Kubernetes and cloud-native production environments

What we offer

  • Individualised, ongoing Learning & Development via communities of practice
  • Innovation Days
  • Dedicated Engineering Days
  • Access to 'LinkedIn Learning' for ongoing skills development
  • Women in PM&E group
  • Exclusive Staff Discounts
  • Travel Discounts
  • Career opportunities in a network of brands and businesses across the globe
  • Corporate Health Discounts
  • Mental Health Support and Employee Assistance Program for staff and family
  • Regular awards nights, social team-building and industry events
  • Corporate Social Responsibility program supporting nominated charities
  • Sustainability efforts

Similar Jobs for Senior Machine Learning Engineer (Team Lead)

8 matching positions

Senior / Lead Machine Learning Engineer, Serving

Inworld is a product-oriented research lab of top AI researchers and engineers, ...
Location: Germany
Salary: Not provided
Company: Inworld AI
Expiration Date: Until further notice
Requirements
  • Inference Optimization. Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM
  • Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding
  • High-Performance Systems. Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs
  • Distributed Systems & Scaling. Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections
  • Public work. Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups
  • Full-cycle ownership. You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production
  • Background. PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems
  • Professional fluency in English (written and spoken) is required, as you will be collaborating daily with our US-based leadership and engineering teams

Senior Lead Machine Learning Engineer

As a Capital One Machine Learning Engineer (MLE), you'll be part of an Agile tea...
Location: United States, New York; San Francisco; San Jose; Cambridge; McLean
Salary: 229900.00 - 286200.00 USD / Year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Bachelor’s Degree
  • At least 8 years of experience designing and building data-intensive solutions using distributed computing
  • At least 4 years of experience programming with Python, Scala, or Java
  • At least 3 years of experience building, scaling, and optimizing ML systems
  • At least 2 years of experience leading teams developing ML solutions
Job Responsibility
  • Design, build, and/or deliver ML models and components that solve real-world business problems
  • Inform ML infrastructure decisions using understanding of ML modeling techniques
  • Solve complex problems by writing and testing application code, developing and validating ML models, and automating tests and deployment
  • Collaborate as part of a cross-functional Agile team to create and enhance software
  • Retrain, maintain, and monitor models in production
  • Leverage or build cloud-based architectures, technologies, and/or platforms to deliver optimized ML models at scale
  • Construct optimized data pipelines to feed ML models
  • Leverage continuous integration and continuous deployment best practices
  • Ensure all code is well-managed to reduce vulnerabilities, models are well-governed from a risk perspective, and the ML follows best practices in Responsible and Explainable AI
  • Use programming languages like Python, Scala, or Java
What we offer
  • Performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
  • Comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being
  • Full-time

Senior Lead Machine Learning Engineer

At Capital One, we are creating responsible and reliable AI systems, changing ba...
Location: United States, New York; San Francisco; San Jose; Cambridge; McLean
Salary: 229900.00 - 286200.00 USD / Year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Bachelor’s Degree
  • At least 8 years of experience designing and building data-intensive solutions using distributed computing
  • At least 4 years of experience programming with Python, Scala, or Java
  • At least 3 years of experience building, scaling, and optimizing ML systems
  • At least 2 years of experience leading teams developing ML solutions
Job Responsibility
  • Design, build, and/or deliver ML models and components that solve real-world business problems, while working in collaboration with a cross-functional team of engineers, research scientists, technical program managers, and product managers
  • Leverage or build cloud-based architectures, technologies, and/or platforms to deliver optimized ML models at scale, such as AWS Ultraclusters, Hugging Face, VectorDBs, PyTorch, and more
  • Construct optimized data pipelines to feed ML models
  • Design, develop, test, deploy, and support AI software components including large language model inference, similarity search, model evaluation, experimentation, governance, and observability
  • Invent and introduce state-of-the-art LLM optimization techniques to improve the performance — scalability, cost, latency, throughput — of large scale production AI systems
  • Contribute to the technical vision and the long term roadmap of foundational AI systems at Capital One
  • Ensure all code is well-managed to reduce vulnerabilities, models are well-governed from a risk perspective, and the ML follows best practices in Responsible and Explainable AI
What we offer
  • Comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being
  • Performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
  • Full-time

Senior Machine Learning Engineer, AI Platform

The AI Platform team is responsible for building the foundational infrastructure...
Location: United States; Canada
Salary: 139000.00 - 218000.00 USD / Year
Company: Mozilla
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or equivalent work experience
  • Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
  • Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
  • Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
  • Hands-on experience working with GPU-based workloads and accelerated computing in production settings
  • Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
  • Ability to independently scope and drive technical initiatives while balancing product and operational priorities
  • Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
  • Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams
Job Responsibility
  • Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
  • Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
  • Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
  • Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
  • Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
  • Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
  • Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
  • Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
  • Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews
What we offer
  • Generous performance-based bonus plans
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting
  • Quarterly all-company wellness days
  • Country specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
  • Full-time

Senior Machine Learning Engineer – AV Labs

Uber is launching AV Labs to accelerate the autonomous technology ecosystem. We'...
Location: United States, Sunnyvale
Salary: 202000.00 - 224000.00 USD / Year
Company: Uber
Expiration Date: Until further notice
Requirements
  • 4+ years of working experience in the ML/Robotics industry
  • Bachelor's degree in Computer Science, Computer Engineering, or related fields
  • Proficient in Python and Linux environments
  • Familiar with modern AI/ML frameworks (e.g., PyTorch).
Job Responsibility
  • Algorithm Development: Lead the development of autonomy algorithms and foundation models that extract high-fidelity semantic meaning from complex urban edge cases to enrich our L4 data lake
  • Systems Architecture Design: Architect scalable ML systems, including management of upstream sensor dependencies
  • Technical Leadership: Partner with fellow engineers to architect, design, and build scalable solutions for ML technology that can stand the test of scale and availability
  • Dataset Optimization: Deliver high-quality datasets to accelerate ML technologies through advanced sensor data collection, processing, and auto-labeling
  • Cross-Functional Collaboration: Partner with platform, product, and security engineering teams to enable the successful deployment of the latest machine learning techniques into production.
What we offer
  • Bonus program
  • Equity award & other types of comp
  • 401(k) plan
  • Various benefits
  • Full-time

Senior Machine Learning Engineer, Search Assistant

Roku is changing how the world watches TV. Roku is the #1 TV streaming platform ...
Location: United States, San Jose
Salary: 361300.00 - 510000.00 USD / Year
Company: Roku
Expiration Date: Until further notice
Requirements
  • 8+ years of industry experience (or PhD with 5+ years) applying ML at scale in search, recommendation, ads, personalization, or related domains
  • Strong expertise in ranking systems, recommendation systems, retrieval, personalization, and multi-objective optimization
  • Experience building large-scale ML systems leveraging deep learning, sequence models, LLMs, reinforcement learning, or bandit frameworks
  • Strong product intuition and experience optimizing user engagement, retention, and monetization simultaneously
  • Proficiency in Python, Java, or Scala
  • Experience with distributed systems and ML infrastructure such as Spark, Airflow, streaming systems, feature stores, and cloud platforms
  • Strong technical leadership, system design, communication, and problem-solving skills
  • MS or PhD in Computer Science, Statistics, or a related field
Job Responsibility
  • Lead the technical vision and roadmap for ranking, personalization, and recommendation systems powering Roku’s entertainment assistant
  • Develop and deploy state-of-the-art ML models using deep learning, transformers, LLMs, bandits, reinforcement learning, and causal inference techniques
  • Build multi-objective optimization systems balancing engagement, retention, relevance, and monetization goals
  • Drive innovation in conversational discovery, contextual recommendations, and personalized content experiences across the platform
  • Design, run, and analyze online A/B experiments tied to key product and business KPIs
  • Architect scalable ML systems, feature platforms, and data pipelines supporting rapid experimentation and long-term growth
  • Mentor engineers and provide technical leadership across cross-functional initiatives involving engineering, product, UX, and analytics teams
What we offer
  • Health insurance
  • Equity awards
  • Life insurance
  • Disability benefits
  • Parental leave
  • Wellness benefits
  • Paid time off
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Full-time

Senior Machine Learning Engineer

Join Amgen’s Mission of Serving Patients. At Amgen, if you feel like you’re part...
Location: United States, Thousand Oaks
Salary: 158606.00 - 200052.00 USD / Year
Company: Amgen
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree (or foreign equivalent) in Computer Science, Statistics, Electrical Engineering, Mathematics, Bioengineering or a related field and 4 years of experience in the job offered or in a Data Scientist-related occupation
  • 4 years of experience in the following: 1) Programming skills in Python including experience with scientific computing and machine learning libraries including pandas, NumPy, scikit-learn, XGBoost, LightGBM, TensorFlow and PyTorch
  • 2) Applying supervised and unsupervised learning techniques to real-world problems, including experience with Random Forest, XGBoost, ensemble models, and deep learning architectures
  • 3) Version control (Git), containerization and orchestration tools including Docker and Kubernetes, and cloud environments including AWS or GCP
  • 4) Working with data science platforms including Databricks and SageMaker
  • 5) Using statistical methods including regression modeling, hypothesis testing, Bayesian methods, forecasting techniques and time-series analysis
  • 6) Building ETL (Extract, Transform, Load) pipelines for handling large-scale, high-dimensional datasets, and familiarity with healthcare data structures and data types
  • 7) Experience with software DevOps CI/CD tools and GitLab
  • 8) Experience developing and fine-tuning Natural Language Processing (NLP) models, including work with architectures such as BERT, ALBERT, GPT, and other transformer-based models.
Job Responsibility
  • Lead efforts with Amgen business leaders to identify, explore and develop transformative AI and ML solutions to enable access to computational tools and data within Amgen, and ultimately improve patient outcomes across multiple therapeutic areas
  • Utilize Cloud Services to collect, store, preprocess, and analyze large datasets from various sources across Amgen
  • Collaborate with other ML engineers, data scientists and research scientists to identify appropriate ML models and algorithms
  • Facilitate ML & Data engineering efforts by architecting and guiding the implementation of data and ML pipelines for development and deployment
  • Facilitate model deployment to production, including monitoring and maintenance of ML models, and put in place metrics to assess accuracy and drift
  • Define model evaluation and validation strategies, train and test models, and analyze and resolve errors and biases in models
  • Lead and develop standards, processes, and best practices for the team across the machine learning-based solution implementation lifecycle
  • Provide technical guidance and career development mentoring to junior machine learning engineers and data scientists in a formal or matrixed fashion.
What we offer
  • stock
  • retirement
  • medical
  • life and disability insurance
  • eligibility for an annual bonus or, for sales roles, other incentive compensation
  • Retirement and Savings Plan with generous company contributions
  • group medical, dental and vision coverage
  • flexible spending accounts
  • discretionary annual bonus program
  • Full-time

Senior Machine Learning Engineer, ML Training Platform

Location: United States
Salary: 216700.00 - 303400.00 USD / Year
Company: Reddit
Expiration Date: Until further notice
Requirements
  • 5+ years of software engineering experience, with a focus on Platform Engineering, ML Infrastructure, or Backend Systems
  • Deep Kubernetes Expertise: You know K8s beyond just 'deploying pods.' You understand CRDs, Controllers and the Operator pattern
  • Jupyter Ecosystem Knowledge: Experience customizing JupyterHub, JupyterLab extensions, or building similar interactive computing platforms
  • Strong Coding Skills: Proficiency in Python (for the ML ecosystem) and Go (for Kubernetes controllers/infrastructure tooling)
  • GPU Experience: Hands-on practice with CUDA environments, GPU virtualization/containerization, and doing it all within Kubernetes
  • Cloud Provider Experience: Familiarity with both managed ML offerings (Vertex AI, SageMaker, etc.) and building custom ML components in AWS and/or GCP
  • Experience working with distributed training frameworks, including Ray and Kubernetes
  • Comfortable with distributed systems, big data (Petabyte scale) and data-intensive systems
  • Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle
  • Strong organizational & communication skills
Job Responsibility
  • Lead the building, testing, and maintenance of ML training infrastructure at Reddit
  • Play a pivotal role in designing, building, and optimizing the infrastructure and tooling required to support large-scale machine learning workflows
  • Evolve the MLE experience, from provisioning interactive GPU environments through large-scale training, supporting on-demand and self-service workflows
  • Kubernetes Automation: Write custom Kubernetes Controllers and Operators to manage the lifecycle of interactive Jupyter workspaces and long-running ML training jobs, handle auto-idling, and ensure fault tolerance
  • GPU Orchestration: Work with the underlying compute team to ensure MLEs have efficient access to training hardware resources and handle resource contention gracefully
  • Developer Experience (DevX): Treat internal MLEs as your customers. Conduct user research, reduce friction in the 'Idea-to-Prototype' loop, and standardize software environments (Docker images, Python dependency management)
What we offer
  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k Match
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Reddit Global Days off
  • Generous paid Parental Leave
  • Paid Volunteer time off
  • Full-time