As a Principal Machine Learning System Engineer on the AI & ML Platform team, you will play a pivotal role in developing and refining the core infrastructure that empowers all Atlassian software engineers, ML engineers, and data scientists to create, train, evaluate, deploy, and manage Machine Learning models and pipelines. You will collaborate closely with product teams, such as Jira and Confluence, to solve their specific challenges in building ML solutions. This may involve curating high-quality ML datasets, fine-tuning open-source Large Language Models (LLMs), or accessing proprietary LLMs. Your expertise in both ML and software development will be instrumental in overcoming challenging problems and navigating complex infrastructure and architectural issues. This position offers you the chance to lead projects from the technical design phase all the way to launch. You will partner with various teams and internal stakeholders to achieve impactful results.
Job Responsibilities:
Collaborate with your teammates to solve complex problems, from technical design to launch
Deliver cutting-edge solutions that are used by other Atlassian teams and products to build AI features that reach millions of customers
Deliver code reviews, documentation & bug fixes within a strong engineering culture
Partner across engineering teams to take on company-wide initiatives spanning multiple projects
Mentor junior members of the team
Requirements:
Extensive experience building Machine Learning and AI infrastructure, platforms, and systems (generally 5+ years)
Comprehensive ML lifecycle expertise: Proven experience developing, deploying, and maintaining end-to-end ML systems, from data engineering to model serving and monitoring
Large-scale system design: Extensive experience designing and building scalable, fault-tolerant, and high-performance distributed systems for machine learning
Proficiency with frameworks and languages: Expert-level proficiency in Python and ML frameworks like PyTorch, TensorFlow, or JAX. Familiarity with other languages like Go, Java, or Scala is also beneficial
MLOps and automation: Deep experience implementing MLOps, CI/CD pipelines, and automation for continuous training, deployment, and monitoring of ML models
Nice to have:
Cloud infrastructure: Hands-on expertise with major cloud platforms such as AWS, GCP, or Azure, including their specific AI/ML services and compute resources like GPUs
Big data processing: Experience with distributed computing frameworks for large-scale data processing, such as Spark, Ray, or Dask
Performance optimization: A demonstrated ability to diagnose and solve complex performance and optimization problems for ML models and infrastructure
Generative AI systems: Experience with GenAI frameworks and tools, including developing and fine-tuning large language models (LLMs) and building retrieval-augmented generation (RAG) systems