Senior MLOps Engineer

Singapore, Singapore · Job Posted January 12, 2026

Job Description

As an MLOps Engineer in the DAMO service line, you will be responsible for ensuring the reliability, safety, performance and continuous improvement of large-scale machine learning and AI systems in production, including both generative AI and traditional ML systems such as computer vision and recommendation models. You will work across the full software delivery lifecycle, contributing to design, implementation, deployment and ongoing operational excellence.

Job Responsibility

  • Design, implement and maintain monitoring and alerting for ML and AI operational signals
  • Build and operate robust evaluation and testing pipelines for all ML and AI systems
  • Investigate and resolve production issues related to model behaviour
  • Collaborate with infrastructure and platform teams to ensure stable, performant and cost-efficient AI inference
  • Manage the lifecycle of ML models, prompts, embeddings, vector indices and associated components
  • Design and operate effective feedback loops that incorporate real user interactions
  • Uphold governance, safety and compliance standards
  • Maintain clear, comprehensive documentation
  • Communicate system health, risks, upcoming changes and operational insights clearly
  • Support the growth and development of junior team members

Requirements

  • High proficiency in Python (Pandas, NumPy, Scikit-learn) for scripting, analysis, and maintaining production models
  • Strong SQL skills for querying, data manipulation, and operational data checks
  • Experience building or maintaining GenAI / agentic solutions (e.g., RAG, LlamaIndex, CrewAI, or similar orchestration/RAG tooling)
  • Solid understanding of classical ML algorithms, model evaluation, and challenges like drift and bias
  • Hands-on experience with model monitoring (data quality, prediction quality, latency) using Prometheus, Grafana, or cloud-native tools
  • Experience with Azure (Databricks, Azure Machine Learning, etc.) for deployment and resource management
  • Familiarity with Agile methodologies (Scrum/Kanban)
  • Must be a Singapore citizen or already hold Singapore Permanent Residency (PR) at the time of application
  • Willingness to be part of a 24x7 on-call rotation, as needed

Nice to have

  • Experience with big data frameworks (Spark, Dask) for large-scale processing
  • Understanding of containerization/orchestration such as Docker and basic Kubernetes
  • Exposure to workflow/pipeline or IaC tooling (Airflow, Kubeflow, MLflow, Terraform)
  • Familiarity with GCP/AWS is a plus

What we offer

  • Learning & Development
  • Interactive tools
  • Numerous development programs
  • Teammates who want to help you grow

Similar Jobs for Senior MLOps Engineer

8 matching positions

Senior MLOps Engineer

We're looking for an MLOps Engineer to help ensure our machine learning models r...
Location: Denmark, København
Salary: Not provided
Life Science Talent
Expiration Date: Until further notice
Requirements
  • Experience deploying machine learning models into production and managing their lifecycle
  • Experience implementing model governance, including versioning, monitoring, drift detection, and reporting
  • Familiarity with MLOps tools such as MLflow, Kubeflow, or DVC
  • Solid understanding of CI/CD systems (e.g., GitHub Actions, ArgoCD) and infrastructure-as-code tools (e.g., Terraform, Helm)
  • Familiarity with data engineering concepts such as ETL pipelines, data lakes, and large-scale batch/stream processing
  • Experience mentoring or supporting colleagues to help them grow their technical skills
  • Proven experience in a senior-level DevOps, MLOps, or related infrastructure-focused engineering role
  • Strong proficiency in Python
  • Deep experience with cloud platforms (AWS, GCP, or Azure) and container orchestration tools (Docker, Kubernetes)
  • Ability to design scalable, secure, and observable systems in fast-moving environments
Job Responsibility
  • Own and manage the full lifecycle of both ML models and core infrastructure – from development and deployment to monitoring and continuous improvement
  • Build and maintain robust CI/CD pipelines for both software and ML workflows
  • Ensure reliability, scalability, observability, and security of production systems and ML infrastructure
  • Automate deployment, orchestration, and environment management using modern DevOps tooling
  • Collaborate closely with software engineers, ML engineers, and product teams to bring ML-powered features to production
  • Proactively detect, troubleshoot, and resolve infrastructure and model performance issues
  • Stay up to date with industry best practices in DevOps, MLOps, and infrastructure engineering
  • Document infrastructure, workflows, and operational procedures clearly and thoroughly
What we offer
  • Equipment provided by Corti
  • Full-time

Senior MLOps Engineer

Prolific is not just another player in the AI space – we are the architects of t...
Location: United Kingdom
Salary: Not provided
Prolific
Expiration Date: Until further notice
Requirements
  • 5+ years experience with cloud infrastructure and infrastructure as code
  • Previous experience with the ML and LLM lifecycle - training, hosting, optimisation, observability
  • Used to working closely with researchers and data scientists - taking experiments from worksheets into production
  • Strong grasp of ML fundamentals and modern GenAI stack
Job Responsibility
  • Infrastructure & Platform Engineering: Design and maintain scalable cloud environments (GCP/AWS) using Terraform
  • Manage GPU/TPU resource allocation for training, fine-tuning, and interactive notebooks
  • Build internal services and CLI tools to streamline the developer experience for the AI team
  • ML & LLM Orchestration: Design CI/CD/CT (Continuous Training) pipelines using tools such as GitHub Actions, MLflow, and Vertex AI Pipelines
  • Develop reusable patterns for model serving
  • Manage service deployments to Kubernetes
  • Manage and optimize vector databases and embedding pipelines for RAG-based systems
  • Performance & Optimization: Implement techniques to reduce latency and increase throughput
  • Solve scaling bottlenecks for serverless or containerized model deployments
  • Optimize GPU utilization and cloud spend without compromising performance
What we offer
  • competitive salary
  • benefits
  • remote working
  • impactful, mission-driven culture

Senior MLOps Engineer

This is a rare opportunity to build the foundational infrastructure that powers ...
Location: United States, Palo Alto; United Kingdom, London
Salary: 187500.00 - 395000.00 USD / Year
Luma AI
Expiration Date: Until further notice
Requirements
  • 5+ years of professional engineering experience with deep, hands-on proficiency in Python and complex distributed systems architecture
  • Extensive, practical experience building and managing systems at scale, specifically with queues, scheduling, traffic-control, and fleet management
  • Deep expertise in our core infrastructure stack: Linux, Docker, and Kubernetes
  • Strong experience with Redis, S3-compatible storage, and public cloud platforms (AWS)
Job Responsibility
  • Architect end-to-end model serving pipelines and integrate new model architectures from our research team into our core, high-throughput inference engine
  • Build robust and sophisticated scheduling systems to manage jobs based on cluster availability and user priority, ensuring we optimally leverage thousands of expensive GPU resources
  • Design and implement dynamic, traffic-based systems for hotswapping models on our GPU workers to maximize fleet efficiency and meet product SLOs
  • Own the end-to-end CI/CD pipelines, including creating a resilient artifact store to manage all model checkpoints across multiple versions and providers
  • Develop and maintain user-friendly APIs and interaction patterns that empower our product and research teams to ship groundbreaking features at high velocity
  • Manage and optimize our complex inference workloads at scale, operating across multiple clusters and hardware providers
  • Full-time

Senior MLOps Engineer

We are looking for an experienced MLOps Engineer to join our cloud and AI engine...
Location: India
Salary: Not provided
NorthBay
Expiration Date: Until further notice
Requirements
  • 6–8 years of experience in MLOps, ML Engineering, or DevOps for ML
  • Strong hands-on experience with AWS SageMaker (training jobs, endpoints, pipelines, model registry)
  • Solid experience with Apache Airflow for workflow orchestration
  • Proficiency in Python for ML and pipeline development
  • Experience building and maintaining production-grade ML pipelines
  • Hands-on experience with AWS services such as S3, IAM, EC2, ECR, CloudWatch
  • Familiarity with CI/CD tools (GitHub Actions, Jenkins, GitLab CI, etc.)
  • Strong understanding of Linux environments and cloud networking basics
  • Experience with monitoring, logging, and alerting for ML systems
Job Responsibility
  • Design, build, and maintain end-to-end MLOps pipelines using AWS SageMaker
  • Develop and manage Airflow DAGs for ML workflow orchestration (training, validation, deployment, retraining)
  • Automate model training, evaluation, versioning, and deployment
  • Implement CI/CD pipelines for ML workflows and model releases
  • Manage model lifecycle, including experimentation, deployment, monitoring, and retraining
  • Integrate data ingestion and feature engineering workflows with ML pipelines
  • Monitor model performance, data drift, and pipeline reliability
  • Collaborate closely with Data Scientists, Data Engineers, and DevOps teams
  • Ensure security, scalability, and cost optimization across ML infrastructure
What we offer
  • Work on large-scale, real-world ML systems
  • Fully remote role from India
  • Collaborate with global teams on cutting-edge AI initiatives
  • Opportunity to influence and mature MLOps practices at scale
  • Full-time

Senior MLOps Engineer - Data Ingestion - Paris

We are looking for a Senior MLOps Engineer to join the Panda Team (Data & ML Ope...
Location: France, Paris
Salary: Not provided
Doctolib
Expiration Date: Until further notice
Requirements
  • You have at least 7 years of experience as an MLOps Engineer or ML Platform Engineer, with proven production model lifecycle management experience
  • You have expert-level experience with ML orchestration tools (MLflow, Braintrust, or similar) for batch processing and inference pipelines
  • You have a strong Site Reliability Engineering (SRE) foundation with focus on operations excellence, reliability, and observability
  • You have expertise in Python for automation and ML pipeline scripting
  • You have strong proficiency with infrastructure-as-code tools such as Terraform and container orchestration (Kubernetes)
  • You have experience with model evaluation frameworks and golden dataset management
  • You have a solid understanding of cloud infrastructure (preferably GCP, AWS, or Azure)
  • You have excellent problem-solving skills with focus on identifying and resolving infrastructure bottlenecks
  • You are fluent in English
Job Responsibility
  • Design and implement end-to-end ML model pipelines in production (LLM and custom models) with robust deployment, evaluation, and monitoring frameworks
  • Own data pseudo-anonymization architecture within ingestion services, converting Tier 0 (personal identifiers) to Tier 1 (anonymized data) while ensuring data quality and model performance
  • Build and maintain secure data export services with ML-based threat detection to prevent attack vectors (SQL injection, etc.) using adaptive models rather than manual rules
  • Manage golden datasets and implement production model evaluation frameworks to ensure anonymization quality and system reliability
  • Build and maintain data pipelines that efficiently extract, transform, and load data from various sources, handling multiple data formats (text, images, audio, video)
  • Implement automation and orchestration tools using ML orchestration platforms (MLflow, Braintrust, or similar) to streamline infrastructure provisioning and reduce manual effort
  • Monitor data and ML platforms for performance, reliability, and security; identify and troubleshoot issues proactively
  • Mentor team members on MLOps expertise and best practices to reduce knowledge silos and build organizational capability
What we offer
  • Free comprehensive health insurance for you and your children
  • 25 days of paid vacation per year, plus up to 14 days of RTT
  • Free mental health and coaching services through our partner Moka.care
  • Work from abroad for up to 10 days per year thanks to our flexibility days policy
  • Lunch vouchers (Swile card) worth €8.50 per working day, with €4.50 covered by Doctolib
  • A subsidy from the works council to refund part of the membership fee for a sports club or a creative class
  • 50% reimbursement of your public transport subscription
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Relocation support in case of international mobility
  • Full-time

Senior MLOps / LLMOps Engineer

Location: Germany, Berlin
Salary: Not provided
ImmoScout24 GmbH
Expiration Date: Until further notice
Requirements
  • Strong experience in MLOps (CI/CD, Docker, Kubernetes) and operating production-grade systems
  • Proficiency in Python and solid software engineering and scalable system design skills
  • Hands-on experience with LLMs and generative AI technologies (e.g. GPT, Gemini or Anthropic-like models)
  • Expertise in prompt engineering, agent orchestration, context management, and output validation
  • Experience with LLM evaluation frameworks and deploying self-hosted LLMs
  • Familiarity with cloud platforms (e.g. AWS, GCP) as well as DevOps, testing, and observability practices
  • Strong communication skills and ability to collaborate with cross-functional teams and stakeholders
Job Responsibility
  • Design and maintain scalable ML/LLM infrastructure and pipelines
  • Productionize traditional ML and generative AI solutions with cross-functional product teams
  • Own the ML/LLMOps lifecycle: prompting, deployment, monitoring, evaluation and optimization
  • Build and evolve an LLM Gateway service to standardize access, routing, and governance
  • Develop evaluation frameworks to measure quality, performance, and reliability of LLM outputs
  • Design and implement MCP-compatible services to enable standardized context exchange between LLMs, tools, and data sources
  • Integrate MCP into internal platforms to support tool use, retrieval, and agent-based workflows across teams
  • Work with AWS and integrate self-hosted open-source AI models for scalable, secure applications
  • Ensure observability, cost efficiency, and system performance
  • Contribute to project management, stakeholder communication and cross-team collaboration
What we offer
  • A competitive salary package and a bonus on top
  • Hybrid work model with three days of on-site work per week in the office
  • 30 days of vacation per year
  • Possibility to work from abroad for 10 days per year
  • Relocation agency support with visa process and attractive relocation package
  • Plus membership for tenants on ImmoScout24
  • Dedicated learning time per month, online courses on ScoutAcademy, regular book challenges, structured feedback, Lunch & Learn events and individual career paths
  • Professional family service for childcare
  • Bring your dog to work (upon approval)
  • Subsidized public transport or Job Bikes

Senior ML Operations (MLOps) Engineer

Join our team as a Sr MLOps Engineer to help us bring current and next generatio...
Salary: Not provided
Eight Sleep
Expiration Date: Until further notice
Requirements
  • 5+ years of software engineering experience with a focus on ML infrastructure, distributed systems, or large-scale data processing in Python (e.g., PyTorch, TensorFlow, or similar)
  • Hands-on experience with ML workflow orchestration and CI/CD pipelines for model deployment
  • Demonstrated success shipping ML models to production at scale, handling telemetry, monitoring, and feedback loops across large device fleets or user populations
  • Strong experience with AWS (Lambda, ECS, DynamoDB, CloudWatch) or equivalent cloud platforms for serving and monitoring ML systems
  • A fast-paced, collaborative, and iterative approach to tackling complex problems
Job Responsibility
  • Pioneer Cutting-Edge Technology: Introduce and implement cutting-edge ML technologies, integrating them into our products and processes to enable the future of health monitoring
  • End-to-End Ownership: Own design and operation of robust ML infrastructure – building scalable data, model, and deployment pipelines that ensure reliable delivery of models to production
  • Cross-functional Collaboration: Partner with R&D, firmware, data, and backend teams to ensure ML inference operates reliably and scales to Pods everywhere
  • Optimize for Performance: Drive cost-effective, scalable, and high-performance ML systems by optimizing compute, storage, and deployment resources across training and inference
  • Enhance Tooling and Platforms: Develop tooling, microservices, and frameworks to streamline data processing, experimentation, and deployment
  • Effective Remote Communication: Thrive in a remote work environment, ensuring clear and direct communication
What we offer
  • Equity participation
  • Periodic equity refreshments based on performance
  • Your own Pod
  • Full access to health, vision, and dental insurance for you and your dependents
  • Supplemental life insurance
  • Flexible PTO
  • Commuter benefits to ease your daily commute
  • Paid parental leave
  • Full-time

Senior Machine Learning Engineer (LLMs, MLOps, Computer Vision & Cloud AI)

We are seeking a highly skilled Senior Machine Learning Engineer to design, deve...
Location: United States, Austin
Salary: Not provided
Dutech Systems
Expiration Date: Until further notice
Requirements
  • 8+ years of experience in cloud platforms including AWS, Azure, GCP, or OCI
  • 8+ years of experience with DevOps technologies including Docker, Kubernetes, Ansible, and CI/CD automation
  • Strong experience with SQL databases (PostgreSQL, MySQL) and NoSQL/vector databases
  • Proficiency in Bash and PowerShell scripting for automation and infrastructure management
  • Experience with Azure DevOps, GitHub Actions, Jenkins, or similar CI/CD platforms
  • 3+ years of hands-on Python development experience in production environments
  • 3+ years of experience with NLP, LLMs, transformers, prompt engineering, RAG, and AI application development
  • Experience building and deploying machine learning models serving real-world users
  • Experience with time-series forecasting, anomaly detection, and predictive analytics
  • Experience developing recommendation systems and personalization engines
Job Responsibility
  • Design, develop, deploy, and maintain production-grade machine learning and AI solutions
  • Build and optimize Large Language Model (LLM) applications using GPT, BERT, T5, Hugging Face, Ollama, and similar technologies
  • Develop Retrieval-Augmented Generation (RAG) systems, prompt engineering strategies, and fine-tuning workflows
  • Implement and maintain MLOps pipelines using MLflow, Kubeflow, Airflow, Weights & Biases, or similar tools
  • Deploy and manage AI workloads across AWS, Azure, GCP, and OCI cloud environments
  • Design and support scalable infrastructure using Docker, Kubernetes, Ansible, and CI/CD pipelines
  • Develop machine learning models for forecasting, anomaly detection, predictive analytics, and real-time monitoring
  • Build recommendation engines, personalization platforms, ranking systems, and collaborative filtering solutions
  • Develop and deploy computer vision solutions using PyTorch, TensorFlow, OpenCV, YOLO, object detection, and image segmentation techniques
  • Implement feature engineering strategies and feature stores such as Feast or Tecton
  • Full-time