
Principal Machine Learning System Engineer

Job Posted December 27, 2025

Job Description

As a Principal Machine Learning System Engineer on the AI & ML Platform team, you will play a pivotal role in developing and refining the core infrastructure that empowers all Atlassian software engineers, ML engineers, and data scientists to create, train, evaluate, deploy, and manage Machine Learning models and pipelines. You will collaborate closely with product teams, such as Jira and Confluence, to solve their specific challenges in building ML solutions. This may involve curating high-quality ML datasets, fine-tuning open-source Large Language Models (LLMs), or accessing proprietary LLMs. Your expertise in both ML and software development will be instrumental in overcoming challenging problems and navigating complex infrastructure and architectural issues. This position offers you the chance to lead projects from the technical design phase all the way to launch. You will partner with various teams and internal stakeholders to achieve impactful results.

Job Responsibility

  • Collaborate with your teammates to solve complex problems, from technical design to launch
  • Deliver cutting-edge solutions that are used by other Atlassian teams and products to build AI features that reach millions of customers
  • Deliver code reviews, documentation & bug fixes within a strong engineering culture
  • Partner across engineering teams to take on company-wide initiatives spanning multiple projects
  • Mentor junior members of the team

Requirements

  • Extensive experience in building Machine Learning and AI infrastructure, platforms, and systems (generally 5+ years)
  • Comprehensive ML lifecycle expertise: proven experience developing, deploying, and maintaining end-to-end ML systems, from data engineering to model serving and monitoring
  • Large-scale system design: Extensive experience designing and building scalable, fault-tolerant, and high-performance distributed systems for machine learning
  • Proficiency with frameworks and languages: Expert-level proficiency in Python and ML frameworks like PyTorch, TensorFlow, or JAX. Familiarity with other languages like Go, Java, or Scala is also beneficial
  • MLOps and automation: Deep experience implementing MLOps, CI/CD pipelines, and automation for continuous training, deployment, and monitoring of ML models

Nice to have

  • Cloud infrastructure: Hands-on expertise with major cloud platforms such as AWS, GCP, or Azure, including their specific AI/ML services and compute resources like GPUs
  • Big data processing: Experience with distributed computing frameworks for large-scale data processing, such as Spark, Ray, or Dask
  • Performance optimization: A demonstrated ability to diagnose and solve complex performance and optimization problems for ML models and infrastructure
  • Generative AI systems: Experience with GenAI frameworks and tools, including developing and fine-tuning large language models (LLMs) and building retrieval-augmented generation (RAG) systems

What we offer

  • health and wellbeing resources
  • paid volunteer days


Similar Jobs for Principal Machine Learning System Engineer

8 matching positions

Principal Engineer (Machine Learning)

We are seeking a highly experienced Sr Principal ML Engineer with a good underst...
Location
United States, Santa Clara
Salary
185200.00 - 299475.00 USD / Year
Palo Alto Networks
Expiration Date
Until further notice
Requirements
  • 4+ years of experience using Python to build complex backend systems, ML experience is preferred. 10+ years of experience in software development
  • Strong background in machine learning and ML frameworks (e.g., TensorFlow, PyTorch)
  • Experience with cloud-native service development stack on GCP is a plus
  • Solid grasp of RESTful API design and microservices architecture
  • Skilled in diagnosing and solving complex problems while providing detailed technical analysis
  • Attention to detail and high behavioral standards
  • Team player with a can-do attitude who tackles difficult problems and inspires the team to do the same
  • High energy and the ability to work in a fast-paced environment
  • Excellent collaboration and communication with multiple teams
  • Fast learner and eager to absorb new emerging technologies
Job Responsibility
  • Architect and implement new ML models and pipelines to support efficient model training, validation, and real-time inference
  • Optimize the existing ML models and pipelines
  • Ensure smooth integration of ML solutions into production systems, focusing on performance, reliability, and scalability
  • Build automation tools for continuous integration, delivery, and deployment of backend and ML components
  • Work closely with cross-functional teams (product, QA, DevOps, and customer support) to align development efforts with business needs
  • Troubleshoot and resolve complex issues that arise within both the backend infrastructure and ML models
  • Ensure code quality, security, and data privacy by following industry best practices
  • Maintain clear and concise documentation for system architecture, API endpoints, and ML model integration processes
What we offer
  • restricted stock units
  • bonus
  • Fulltime

Principal Machine Learning Engineer - AV Labs

Uber is launching AV Labs to accelerate the autonomous technology ecosystem. We'...
Location
United States, San Francisco; Sunnyvale
Salary
302000.00 - 336000.00 USD / Year
Uber
Expiration Date
Until further notice
Requirements
  • 10+ years of working experience in the ML, Robotics, or Autonomous Systems industry
  • Proven experience leading large-scale technical projects from conception to production
  • Bachelor's degree in Computer Science, Computer Engineering, or related fields
  • Expert-level proficiency in Python and Linux environments
  • Deep expertise in modern AI/ML frameworks (e.g., PyTorch, TensorFlow)
Job Responsibility
  • Strategic Semantic Modeling: Lead the strategy for developing autonomy algorithms and foundation models that extract high-fidelity semantic meaning from complex urban edge cases to enrich our L4 data lake
  • Scene and Behavior Causality Understanding: Provide the overarching technical vision for multi-modal scene understanding and modeling the causality behind ego vehicle behaviors from logged data
  • Technical Mentorship & Influence: Mentor senior and lead engineers, fostering a culture of rigorous experimentation and engineering excellence
  • Cross-Organizational Leadership: Act as a bridge between AV Labs and other Uber engineering units to ensure our semantic models and data evaluation platforms are successfully integrated and deployed at scale
What we offer
  • bonus program
  • equity award
  • other types of comp
  • 401(k) plan
  • various benefits
  • Fulltime

Principal Machine Learning Engineer, Agentic AI

Location
United States
Salary
Not provided
Zillow
Expiration Date
Until further notice
Requirements
  • A master's degree or above, or equivalent experience, in Computer Science, Electrical Engineering, or a related field with emphasis on foundational LLM, agentic AI, reinforcement learning, AI planning, or natural language processing
  • 7+ years of hands-on work building large-scale, high-impact solutions—ideally in the most recent two years building agent-based systems, multi-agent collaboration, or similar paradigms
  • Experience developing dialogue systems capable of long conversations, multi-step reasoning, and context-rich decision-making
  • Experience deploying and scaling AI services capable of handling hundreds of millions of daily interactions with high availability, low latency, and robust fault tolerance
  • A track record of publishing high-impact research in top AI/ML venues is a big plus.
Job Responsibility
  • Leverage frameworks like AgentSDK and LangChain/LangGraph to design, prototype, and develop multi-agent systems capable of highly autonomous, context-aware interactions
  • Leverage advanced GenAI models, including reasoning models, real-time voice APIs, etc., to build agentic prototypes, then convert them into product-level agentic skills and deploy them to users
  • Adopt a “build, learn, and pivot” mentality and hold a high bar for shipping to production
  • Mentor and guide engineers in using the right technologies for agentic AI solutions and foster a culture of innovation and responsible AI usage
  • Distill complex research findings and system designs into actionable insights for diverse audiences—including executives
  • Remain on the cutting edge of agentic AI emerging paradigms, driving product innovation
  • Serve as the focal point for applied science projects, driving alignment on timelines and prioritization
  • Continuously refine processes for experimentation, A/B testing, and production rollouts to balance rapid innovation with reliability and responsible deployment
  • Fulltime

Sr. Principal Machine Learning Engineer

Universal Ads is looking for a Sr. Principal Machine Learning Engineer to lead t...
Location
United States
Salary
192396.16 - 450928.49 USD / Year
Comcast Advertising
Expiration Date
Until further notice
Requirements
  • Machine Learning (ML)
  • Collaborating
  • Advertising Technologies
  • 15+ years of work experience
Job Responsibility
  • Set vision and technical direction for the long-term ML strategy and roadmap for ad ranking at Universal Ads, including model architecture, infrastructure, and deployment frameworks
  • Architect and scale distributed ML systems capable of real-time decisioning across high-throughput ad environments
  • Partner with product and marketplace teams to align model performance with user and advertiser outcomes
  • Drive technical direction for cross-functional initiatives involving bidding algorithms, pacing, and system optimization
  • Shape technical culture, and ensure high standards for research rigor, reproducibility, and code quality
  • Establish strong experimentation and evaluation frameworks (A/B testing, counterfactual analysis, etc.) for model validation and pacing control
  • Represent Universal Ads externally in technical forums, publications, or conferences to shape the broader conversation around ML in advertising
  • Design and own ML data pipelines end-to-end, ensuring data cleanliness and freshness, and establishing the standards and infrastructure for reliable model inputs across ranking and pacing systems
What we offer
  • Paid Time off
  • Physical Wellbeing
  • Financial Wellbeing
  • Emotional Wellbeing
  • Life Events + Family Support
  • Base pay
  • Bonus
  • Fulltime

Principal Machine Learning Engineer – Autonomy

As a Principal ML Engineer, you will be at the forefront of Physical AI, develop...
Location
United States, San Francisco; Sunnyvale
Salary
302000.00 - 336000.00 USD / Year
Uber
Expiration Date
Until further notice
Requirements
  • 10+ years of working experience in the ML, Robotics, or Autonomous Systems industry (building upon the base 8+ years expected for advanced roles)
  • Proven experience leading large-scale technical projects from conception to production
  • Bachelor's degree in Computer Science, Computer Engineering, or related fields
  • Expert-level proficiency in Python and Linux environments
  • Deep expertise in modern AI/ML frameworks (e.g., PyTorch)
Job Responsibility
  • Lead the strategy for Autonomous Driving Algorithm Development, ensuring our stack is robust, safe, and capable of handling the most complex urban edge cases
  • Provide the overarching technical vision for our multi-modal autonomy systems
  • Design and oversee the implementation of complex, large-scale ML systems
  • Mentor senior and lead engineers
  • Act as a bridge between AV Labs and other Uber engineering units
What we offer
  • Bonus program
  • Equity award
  • Other types of compensation
  • 401(k) plan
  • Fulltime

Principal Machine Learning Engineer

VideoAmp is on a mission to create the best employee and workplace experience wh...
Location
United States, Los Angeles; New York; Boulder; Chicago; Dallas; St. Petersburg
Salary
184000.00 - 200000.00 USD / Year
VideoAmp
Expiration Date
Until further notice
Requirements
  • 7+ years of experience in Machine Learning Engineering, Data Engineering, or a similar technical role
  • Expert-level proficiency in Python, ML frameworks, Temporal, and distributed data processing (Spark, Hive)
  • Experience making models reproducible as needed
  • Deep understanding of data quality methodologies, fault-tolerant data systems, and validation frameworks
  • Strong experience designing and scaling ML infrastructure, automated pipelines, and production-grade deployment workflows
  • Hands-on experience with cloud-native architectures, ideally on AWS
  • Expertise with CI/CD, version control, and modern DevOps practices
  • Strong communicator with the ability to translate complex technical concepts into clear, actionable insights
  • Demonstrated ability to lead cross-functional technical initiatives with minimal guidance
Job Responsibility
  • Architect and lead development of advanced machine learning models, quality frameworks, and large-scale data validation systems that power VideoAmp’s measurement and optimization products
  • Design and optimize ML infrastructure, including scalable distributed pipelines, model lifecycle tooling, and automated validation frameworks
  • Lead experimentation strategies, including model benchmarking, reproducibility, evaluation methodologies, and statistical validation
  • Drive data quality standards across the organization by partnering with Data Engineering, Core Engineering, and Measurement Science teams
  • Own cross-functional initiatives end-to-end, from requirements definition through production deployment, monitoring, and iteration
  • Influence technical direction, contributing to architectural decisions, design reviews, and long-term platform strategy
  • Mentor and uplevel engineers, providing guidance on ML best practices, system design, data quality, and code excellence
  • Communicate complex findings, insights, and recommendations to technical and non-technical stakeholders
What we offer
  • Discretionary and flexible paid time off
  • In addition to standard US holidays off, VideoAmp employees also partake in Spring, Summer and Winter breaks
  • Comprehensive medical, dental, and vision benefits for you and your dependents—including multiple options fully covered by VideoAmp
  • Unlimited financial wellness sessions with Origin financial advisors
  • 401k Plan with matching
  • HSA & FSA
  • Commuter Benefits
  • Cell Phone Reimbursement
  • Paid Maternity and Parental Leave for All Family Additions
  • Equity
  • Fulltime

Staff / Principal Machine Learning Engineer, Serving

Staff / Principal Machine Learning Engineer, Serving - UK. About Inworld: Inworl...
Location
United Kingdom
Salary
140000.00 - 200000.00 GBP / Year
Inworld AI
Expiration Date
Until further notice
Requirements
  • Inference Optimization
  • Model Acceleration
  • High-Performance Systems
  • Distributed Systems & Scaling
  • Public work
  • Full-cycle ownership
  • Background
Job Responsibility
  • Take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production
What we offer
  • equity
  • benefits
  • Fulltime

Staff / Principal Machine Learning Engineer, Serving

A year ago, reliably working agentic systems and sub-second multimodal inference...
Location
United States, Mountain View
Salary
270000.00 - 500000.00 USD / Year
Inworld AI
Expiration Date
Until further notice
Requirements
  • Inference Optimization. Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM
  • Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding
  • High-Performance Systems. Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs
  • Distributed Systems & Scaling. Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections
  • Public work. Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups
  • Full-cycle ownership. You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production
  • Background. PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems
Job Responsibility
  • We hand you unclear problems and expect you to make them clear
  • We value engineers who say 'I don't know yet' and then design the benchmark or prototype that finds out
  • We treat performance, latency, and reliability as first-class product features, not a box to check before launch
  • Impact comes before everything else, though we support sharing work and open-source contributions that move the field forward
  • Your work should be visible
What we offer
  • bonus
  • equity
  • benefits
  • relocation assistance
  • Fulltime