Working closely with our Engineering Manager, you’ll be a Senior LLMOps Engineer on the Model Platform team. You are a technical leader responsible for building and scaling the infrastructure that powers our entire model lifecycle. Your mission is to build a robust, scalable, and reliable platform for deploying and managing our LLMs. You will lead the design and implementation of our LLMOps strategy, ensuring our AI engineers can move models from development to production seamlessly and efficiently. You will combine your deep infrastructure knowledge with MLOps principles to solve the critical challenges of serving models at scale.
Job Responsibilities:
Lead the architecture, design, and implementation of our end-to-end LLMOps platform, from data ingestion and model training pipelines to production deployment and monitoring
Build and maintain robust CI/CD/CT (Continuous Integration/Continuous Delivery/Continuous Training) pipelines to automate the testing, validation, and deployment of large language models
Engineer highly available and scalable model serving solutions using modern infrastructure like Kubernetes, ensuring low latency and high throughput for our production services
Collaborate closely with AI research and engineering teams to understand their needs, streamline workflows, and create the tooling that accelerates their development cycles
Champion and implement best practices for model versioning, experiment tracking, monitoring, and governance across the organization
Mentor mid-level and junior engineers, sharing your deep expertise in infrastructure, automation, and operational excellence to foster a culture of reliability and scalability
Requirements:
Proven track record of designing, building, and maintaining MLOps or LLMOps infrastructure in a production environment
Previous hands-on experience building scalable, cloud-native infrastructure and platforms
Experience deploying and managing large-scale machine learning models in a production environment
Expert in Python, cloud platforms (AWS, GCP, or Azure), containerization and orchestration (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, CloudFormation)
Deep and practical understanding of the entire machine learning lifecycle and the specific operational challenges of large language models
Ability to translate complex engineering and research requirements into concrete, robust, and automated platform solutions
Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
Nice to have:
Experience with advanced model serving and optimization techniques (e.g., quantization, distillation, multi-model serving)
Experience with specialized MLOps frameworks like MLflow, Kubeflow, or Weights & Biases
Contributions to open-source MLOps or infrastructure-related projects
What we offer:
Flexible hybrid working environment, with 3 days in the office
Additional paid leave: a day off for your birthday, plus wellness days
Special corporate rates at Anytime Fitness in Melbourne (Sydney TBC)
A generous personal development budget of $500 per annum
Learn from some of the best engineers and creatives, joining a diverse team
Become an owner, with shares (equity) in the company