Principal Machine Learning System Engineer Job at Atlassian

Job Description

As a Principal Machine Learning System Engineer on the AI & ML Platform team, you will play a pivotal role in developing and refining the core infrastructure that empowers all Atlassian software engineers, ML engineers, and data scientists to create, train, evaluate, deploy, and manage Machine Learning models and pipelines. You will collaborate closely with product teams, such as Jira and Confluence, to solve their specific challenges in building ML solutions. This may involve curating high-quality ML datasets, fine-tuning open-source Large Language Models (LLMs), or accessing proprietary LLMs. Your expertise in both ML and software development will be instrumental in overcoming challenging problems and navigating complex infrastructure and architectural issues. This position offers you the chance to lead projects from the technical design phase all the way to launch. You will partner with various teams and internal stakeholders to achieve impactful results.

Job Responsibilities

Collaborate with your teammates to solve complex problems, from technical design to launch
Deliver cutting-edge solutions that are used by other Atlassian teams and products to build AI features that reach millions of customers
Deliver code reviews, documentation & bug fixes within a strong engineering culture
Partner across engineering teams to take on company-wide initiatives spanning multiple projects
Mentor junior members of the team

Requirements

Extensive experience building Machine Learning and AI infrastructure, platforms, or systems (generally 5+ years)
Comprehensive ML lifecycle expertise: proven experience developing, deploying, and maintaining end-to-end ML systems, from data engineering to model serving and monitoring
Large-scale system design: Extensive experience designing and building scalable, fault-tolerant, and high-performance distributed systems for machine learning
Proficiency with frameworks and languages: Expert-level proficiency in Python and ML frameworks like PyTorch, TensorFlow, or JAX. Familiarity with other languages like Go, Java, or Scala is also beneficial
MLOps and automation: Deep experience implementing MLOps, CI/CD pipelines, and automation for continuous training, deployment, and monitoring of ML models

Nice to have

Cloud infrastructure: Hands-on expertise with major cloud platforms such as AWS, GCP, or Azure, including their specific AI/ML services and compute resources like GPUs
Big data processing: Experience with distributed computing frameworks for large-scale data processing, such as Spark, Ray, or Dask
Performance optimization: A demonstrated ability to diagnose and solve complex performance and optimization problems for ML models and infrastructure
Generative AI systems: Experience with GenAI frameworks and tools, including developing and fine-tuning large language models (LLMs) and building retrieval-augmented generation (RAG) systems

What we offer

Health and wellbeing resources
Paid volunteer days
