AI Ops / ML Ops Engineer

Job Posted June 09, 2026

Job Description

Whitehall Resources are currently looking for an AI Ops / ML Ops Engineer. The AI Ops / ML Ops Engineer operationalizes, monitors and supports AI/ML solutions in production. The role ensures models, pipelines and AI services are deployed, monitored, governed and maintained with reliable operational practices.

Key Requirements:

  • Implement ML Ops and deployment practices. KPI (Qualitative): AI/ML deployments follow controlled and repeatable operational practices.
  • Monitor model and solution health. KPI (Qualitative): Model health checks completed and production issues detected early.
  • Support production AI/ML systems. KPI (Qualitative): Production AI/ML systems maintain expected uptime and support SLAs.
  • Ensure auditability and governance of AI operations. KPI (Qualitative): AI operational records are complete and audit-ready.
  • Improve automation and reliability. KPI (Qualitative): Reduced manual effort and improved reliability of AI/ML operations.

Job Responsibility

  • Operationalize, monitor and support AI/ML solutions in production
  • Ensure models, pipelines and AI services are deployed, monitored, governed and maintained with reliable operational practices
  • Implement ML Ops and deployment practices
  • Monitor model and solution health
  • Support production AI/ML systems
  • Ensure auditability and governance of AI operations
  • Improve automation and reliability

Requirements

  • At least 7 years of experience in ML Ops, DevOps, AI/ML deployment, monitoring, cloud platforms and production support
  • Build strong cross-functional ways of working across Data & AI, IT, Digital and business teams so delivery is aligned, practical and business-led
  • Keep the internal and external customer experience at the center of data, analytics and AI delivery, with focus on reliable outcomes and decision support
  • Continuously build capability in modern data, analytics and AI practices and actively share knowledge with peers and business users
  • Apply structured problem solving to simplify complex data, process and technology issues and remove barriers to execution
  • Identify practical opportunities to improve business performance using modern data platforms, analytics, automation, GenAI and embedded AI capabilities
  • Adjust priorities and delivery approach in a dynamic business environment while maintaining governance, quality and business continuity
  • Deploy, monitor and manage ML models and AI services across the lifecycle
  • Apply release, versioning, automation and controlled deployment practices
  • Monitor uptime, drift, performance, data quality and operational metrics
  • Maintain runbooks, troubleshoot issues and coordinate incident resolution for AI/ML systems
  • Arabic and English language skills preferred
  • Certifications in ML Ops, Databricks, Azure, DevOps, Kubernetes or cloud platforms are desirable

Nice to have

  • Arabic and English language skills
  • Certifications in ML Ops, Databricks, Azure, DevOps, Kubernetes or cloud platforms

Similar Jobs for AI Ops / ML Ops Engineer

8 matching positions

Senior Software Engineer, AI & ML Ops

Hyundai AutoEver America seeks a seasoned Senior AI/ML Engineer to architect, de...
Location: United States, Irvine
Salary: 103170.00 - 158873.00 USD / Year
Hyundai AutoEver America
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Engineering, AI, or related field; advanced degrees/certifications are a plus
  • 8+ years of software engineering experience, including 3+ years in AI/ML solution development
  • Proven experience designing and deploying LLM-based solutions, traditional ML models, RAG systems, and agent workflows
  • Strong expertise in Python, TensorFlow/PyTorch, Hugging Face, prompt engineering, vector databases, and AI orchestration
  • Hands-on experience with AWS SageMaker/Bedrock, Azure OpenAI, or Azure ML Studio, plus MLOps best practices (CI/CD, testing, model monitoring)
  • Proficiency in frontend frameworks (React), cloud-native deployment (Docker/Kubernetes), microservice APIs, and relational/NoSQL databases
Job Responsibility
  • Architect and develop scalable AI/ML and LLM-based systems, including RAG pipelines, agentic workflows, predictive models, and generative AI solutions
  • Build full‑stack AI applications, including React-based dashboards and front‑end interfaces integrated with backend services and cloud infrastructure
  • Develop data pipelines and ML Ops workflows using Python, SQL, AWS/Azure platforms, and monitoring tools to train, deploy, and optimize models
  • Lead cross-functional AI initiatives, deliver PoCs/MVPs, ensure compliance with AI governance, and integrate AI features into enterprise and user-facing systems
  • Provide technical leadership and mentorship, guiding standards, code reviews, model documentation, and best practices in AI/ML development
  • Continuously improve AI performance and reliability through prompt engineering, architecture enhancements, and data optimization
What we offer
  • comprehensive medical/dental coverage
  • generous PTO
  • education assistance
  • annual merit increase eligibility
Employment type: Full-time

ML Ops Engineer

We are hiring an ML Ops Engineer for our GCC client, Europe’s top retail brands....
Location: India, Bangalore
Salary: Not provided
SRKay Consulting Group
Expiration Date: Until further notice
Requirements
  • Workflow Management: Experience in managing Apache Airflow and Composer to support the Data Engineering components of grounded AI solutions
  • MLflow: Deep knowledge of MLflow Tracking, Projects, and Registry. Experience migrating MLflow backends between cloud providers
  • Workflow Tools: Familiarity with Vertex AI Pipelines and Azure DevOps for automation
  • GCP AI Services: Practical experience with Vertex AI (Workbench, Model Garden, Feature Store) and BigQuery ML
  • Containerization: Expert-level Docker and Kubernetes (GKE/AKS) skills. Must understand K8s operators and resource management for ML workloads
  • Infrastructure as Code (IaC): Proficiency in Terraform to manage reproducible cloud environments
  • Programming: Advanced Python skills with a focus on software engineering best practices (unit testing, modular design)
  • Data Engineering: Experience with Change Data Capture (CDC), Spark/PySpark, and optimizing data flow from BigQuery to training nodes
  • Access Control: Knowledge of IAM roles, VPC Service Controls, and securing ML endpoints
  • Experience with LLMOps (managing large-scale foundation models, prompt versioning, and vector database scaling)
Job Responsibility
  • Pipeline Orchestration: Design, develop, and maintain complex ML workflows using Apache Airflow (Cloud Composer) to automate data ingestion, preprocessing, and model training
  • Lifecycle Management: Administer and scale MLflow for experiment tracking, model packaging, and maintaining a centralized Model Registry across the organization
  • Cloud & Hybrid Ops: Create and optimize training environments for custom ML/LLM models
  • Model Serving & Scaling: Architect high-performance inference endpoints and serve models via FastAPI/Flask with API Gateway
  • Infrastructure Management: Manage auto-scaling CUDA clusters on Google Kubernetes Engine (GKE)
  • CI/CD: Manage end-to-end delivery with Continuous Integration & Continuous Delivery (CI/CD)
  • Observability & Monitoring: Build dashboards to track model health, latency, and data drift
Employment type: Full-time

ML Ops Engineer

The MLOps Engineer will work closely with the Data Science, Analytics, and Data ...
Location: United States
Salary: 127000.00 - 160550.00 USD / Year
Zelis
Expiration Date: Until further notice
Requirements
  • 2–5 years of experience in ML Ops, ML Engineering, or a related role with a focus on production-level model monitoring, automation, and deployment
  • Strong experience with ML observability tools or custom-built monitoring systems
  • Experience with monitoring LLMs and Generative AI models, including prompt evaluation, hallucination tracking, and agent behavior auditing
  • Experience in deploying and managing ML workloads using containerization and orchestration platforms such as Docker, Kubernetes, Kubeflow, or TensorFlow Extended
  • Familiarity with AutoML pipelines and workflow management tools (e.g., MLflow, SageMaker Autopilot)
  • Experience working in cloud environments, preferably AWS (e.g., SageMaker, S3, Lambda, ECS/EKS)
  • Understanding of ML lifecycle tools (e.g., MLflow, SageMaker Pipelines) and CI/CD practices
  • Strong security and compliance awareness, particularly related to model/data governance (e.g., HIPAA, GDPR)
  • Proficiency in Python and key data libraries (Pandas, NumPy, Matplotlib, etc.)
  • Advanced SQL skills and experience with Snowflake or similar data warehousing platforms
Job Responsibility
  • Build and maintain monitoring infrastructure for conventional machine learning models, with capabilities for performance tracking, drift detection, and alerting
  • Research, evaluate, and implement monitoring strategies and tools for Generative AI systems, including LLMs and Agentic AI architectures
  • Collaborate with ML Engineers, Data Scientists, and DevOps teams to deploy, manage, and monitor models in production
  • Develop and support scalable, secure, and automated data pipelines using Snowflake, SQL, and Python for training, serving, and monitoring ML and GenAI models
  • Leverage AutoML tools and frameworks (e.g., MLflow, Kubeflow, SageMaker Autopilot) to streamline experimentation and deployment
  • Design dashboards and reporting systems to visualize model health metrics and surface key operational insights
  • Ensure auditability, reproducibility, and compliance for model performance and data flow in production environments, with consideration for regulatory standards like GDPR and HIPAA
  • Maintain CI/CD workflows and version-controlled codebases (e.g., Git) for ML infrastructure and pipelines
  • Utilize containerization and orchestration technologies (e.g., Docker) to manage scalable ML infrastructure
  • Leverage tools such as Streamlit and Python visualization libraries to present insights from model and data monitoring
What we offer
  • 401k plan with employer match
  • flexible paid time off
  • holidays
  • parental leaves
  • life and disability insurance
  • health benefits including medical, dental, vision, and prescription drug coverage
Employment type: Full-time

AI/ML Engineer

Octopus was founded with a mission to use technology to accelerate us towards a ...
Location: United Kingdom, London
Salary: Not provided
Octopus Energy
Expiration Date: Until further notice
Requirements
  • Deep Understanding of GenAI - experience working with LLMs
  • Data Product Development - experience building Python-based applications and/or data products, with hands-on work in data-intensive and machine learning systems
  • AI model evaluation and observability - Experience of different ways of evaluating AI models and applications. Implementing logging, tracing, and monitoring in systems
  • Context Engineering and Knowledge Grounding - Experience of optimising and grounding GenAI models and applications through prompt design, RAG and knowledge base integration
  • Software Development Practices - Strong grounding in Git, testing, CI/CD frameworks
  • Ability to thrive in a fast moving environment - Dealing with ambiguity, setting clear priorities, and translating ideas into actionable plans
Job Responsibility
  • Design and Develop AI Platform Services - Build reusable, scalable services that expose GenAI models, knowledge retrieval pipelines, and agent workflows to application teams
  • Knowledge Base Development - Build and maintain knowledge retrieval systems including embedding generation, chunking, and strategies for database management
  • AI Ops, evals and observability - Setting up frameworks for monitoring and evaluating AI output quality (relevance, accuracy, safety, drift, cost) and platform observability (latency, cost, usage)
  • Context Engineering - Design systems for prompt assembly: Create prompt templates, system prompts and guidelines for platform users
Employment type: Full-time

ML Ops Engineer

As an MLOps Engineer, you will be responsible for building, maintaining, and opt...
Location: India, Hyderabad
Salary: Not provided
NStarX
Expiration Date: Until further notice
Requirements
  • 4 to 10 years of experience in MLOps, DevOps, or ML Engineering
  • Strong proficiency with cloud platforms such as AWS, Azure, or GCP
  • Experience with containerization and orchestration tools like Docker and Kubernetes
  • Hands-on experience with ML model deployment, monitoring, and scaling
  • Proficiency with CI/CD tools such as Jenkins or GitLab CI
  • Familiarity with data versioning and management tools such as DVC
  • Strong coding skills in Python with knowledge of ML libraries like TensorFlow or PyTorch
  • Strong problem-solving skills and ability to work in a collaborative environment
  • Effective communication skills for cross-functional teamwork
Job Responsibility
  • Develop and manage infrastructure for end-to-end ML workflows including model training, deployment, monitoring, and maintenance
  • Implement CI/CD pipelines for ML models and data workflows
  • Collaborate with cross-functional teams to build scalable and robust ML infrastructure on cloud and on-premises environments
  • Monitor and optimize model performance and infrastructure to ensure efficient resource usage
  • Manage data versioning and model versioning across multiple environments
  • Implement security, governance, and compliance protocols in ML deployment and data pipelines
  • Support troubleshooting, debugging, and incident management for ML infrastructure issues
What we offer
  • Competitive compensation
  • Opportunity to work with a dynamic team on cutting-edge AI and ML solutions
  • Professional growth and development opportunities
Employment type: Full-time

Senior ML Ops Engineer

Join Elsevier as a Senior ML Ops Engineer to lead the development of impactful A...
Location: United States, Philadelphia
Salary: 95300.00 - 158800.00 USD / Year
EdTech Jobs
Expiration Date: Until further notice
Requirements
  • Current experience in ML Engineering, MLOps platforms, shipping ML or search/GenAI systems to production
  • Strong Python, Java, and/or Scala experience
  • Hands-on experience with major cloud vendor solutions (AWS, Azure and/or Google)
  • Experience with Search/vector/graph technologies (e.g., Elasticsearch / OpenSearch / Solr / Neo4j)
  • Experience in evaluating LLM models
  • A strong understanding of the Data Science Life Cycle including feature engineering, model training, and evaluation metrics
  • Familiarity with ML frameworks, e.g., PyTorch, TensorFlow, PySpark
  • Experience with large-scale data processing systems, e.g., Spark
  • Experience with statistical analysis, machine learning theory and natural language processing
Job Responsibility
  • Automate and orchestrate machine learning workflows across major cloud and AI platforms (AWS, Azure, Databricks, and foundation model APIs such as OpenAI)
  • Maintain and version model registries and artifact stores to ensure reproducibility and governance
  • Develop and manage CI/CD for ML, including automated data validation, model testing, and deployment
  • Implement ML Engineering solutions using popular MLOps platforms such as AWS SageMaker, MLflow, Azure ML
  • Scale end-to-end custom SageMaker pipelines
  • Design and implement the engineering components of GAR+RAG systems (e.g., query interpretation and reflection, chunking, embeddings, hybrid retrieval, semantic search), manage prompt libraries, guardrails and structured output for LLMs hosted on Bedrock/SageMaker or self-hosted
  • Design and implement ML pipelines that utilize Elasticsearch/OpenSearch/Solr, vector DBs, and graph DBs
  • Build evaluation pipelines: offline IR metrics (NDCG, MAP, MRR), LLM quality metrics (faithfulness, grounding), and A/B testing
  • Optimize infrastructure costs through monitoring, scaling strategies, and efficient resource utilization
  • Stay current with the latest GAI, NLP and RAG research and apply the state of the art in our experiments and systems
What we offer
  • Annual incentive bonus
  • Country specific benefits
  • Fair and accessible hiring process with accommodation support
Employment type: Full-time

Vice President ML Ops Engineer

Embark on a transformative journey as Vice President- ML Operations Engineer at ...
Location: India, Noida
Salary: Not provided
Barclays
Expiration Date: Until further notice
Requirements
  • Experience in Programming & Automation: Python, Bash, SQL
  • MLOps Tools: MLflow, Kubeflow, AWS SageMaker Pipelines
  • Cloud Platforms: AWS (SageMaker, Bedrock, Lambda, Step Functions, CloudWatch)
  • DevOps Expertise: CI/CD (GitHub Actions, Jenkins), Docker, Kubernetes
  • Data Management: Enterprise data governance, ETL processes
  • Leadership Skills: Strategic planning, team management, stakeholder communication
Job Responsibility
  • Definition and oversight of data governance and procedures to address control and regulatory requirements, including Data Privacy
  • Definition and oversight of data analytics and insights that support the effective management of the business as well as driving commercial outcomes
  • Development of a team of data professionals with expertise in data analytics, data engineering, data science, and other relevant disciplines
  • Analysis of the bank's current data landscape, identification of key data assets and gaps, and development of a roadmap for future data initiatives
  • Monitoring team performance and setting clear performance expectations
  • Lead the design and governance of MLOps frameworks, AWS-based architectures, and automation strategies to enable efficient, secure, and scalable deployment of AI and Generative AI models
What we offer
  • Hybrid working
  • Modern workspaces, collaborative areas, and state-of-the-art meeting rooms
  • Facilities include wellness rooms, on-site cafeterias, fitness centers, and tech-equipped workstations
Employment type: Full-time

AI Ops Platform Engineer

Join us as an AI Ops Engineer, to build and run an enterprise AI Factory within ...
Location: United Kingdom, London
Salary: Not provided
Barclays
Expiration Date: Until further notice
Requirements
  • LLMOps / MLOps at production scale, operating the full Generative AI lifecycle including models, prompts and agents, CI/CD pipelines, structured evaluation, drift and hallucination monitoring, and controlled, auditable release processes suitable for banking environments
  • Cloud‑native AI platform engineering on AWS, with hands‑on delivery using services such as Amazon Bedrock for foundation models, agent orchestration patterns, Lambda and Step Functions, alongside demonstrated Python engineering capability and secure microservices and API design
  • AI governance, observability and cost optimisation, embedding governance by design through policy as code, alignment to model risk framework expectations, lifecycle traceability and audit‑ready evidence, supported by SRE‑grade monitoring and ongoing optimisation of token usage and compute cost across AI workloads
Job Responsibility
  • Build and run an enterprise AI Factory within our Card Merchant Services organisation, enabling AI‑driven change across the merchant payments lifecycle
  • Accountable for the end‑to‑end operationalisation of AI, spanning model, prompt, and agent lifecycles; deployment and monitoring; guardrails; and cost optimisation, ensuring AI solutions are production‑ready, auditable, compliant, and scalable across merchant payment use cases
  • Accountable for the end‑to‑end engineering of GenAI and ML platforms, embedding governance, observability and operational resilience by design, while enabling teams to deploy and run AI solutions with clarity, assurance and accountability at scale
  • Lead and manage engineering teams, providing technical guidance, mentorship, and support to ensure the delivery of high-quality software solutions
  • Oversee timelines, team allocation, risk management and task prioritization
  • Mentor and support team members' professional growth, conduct performance reviews, provide actionable feedback, and identify opportunities for improvement
  • Evaluation and enhancement of engineering processes, tools, and methodologies
What we offer
  • Competitive holiday allowance
  • Life assurance
  • Private medical care
  • Pension contribution
Employment type: Full-time