AI/ML DevOps Engineer

India, Noida · Job Posted January 24, 2026

Job Description

The AI/ML DevOps Engineer will be responsible for designing and maintaining Infrastructure-as-Code templates using Terraform for Azure AI/ML services. This role requires a blend of engineering and coordination skills to ensure safe deployments of AI workloads. Candidates should have experience with Azure services, automation tools, and security practices.

Job Responsibility

  • Hands-on engineer responsible for designing, building, and maintaining Infrastructure-as-Code (IaC) templates with Terraform to provision and operate Azure AI/ML services for multiple application teams
  • The role blends engineering (Terraform, pipelines, AKS, security) with coordination (Kanban flow, cross-team alignment, risk/issue tracking) to accelerate safe, repeatable deployments of AI workloads as application teams are onboarded to the bank's AI Platform (Azure)

Requirements

  • Engineer IaC modules and reusable templates (Terraform) to provision Azure resources for AI/ML (e.g., Azure OpenAI/AI Studio, Azure ML, Cognitive Search, Key Vault, Storage, networking)
  • Automate pipelines (Azure DevOps/GitHub Actions) for plan/apply, policy checks, and environment promotion; integrate secrets, approvals, and drift detection
  • Stand up access & identity using Microsoft Entra ID patterns (app registrations, groups/roles, RBAC) for app teams and automation
  • Support AKS-based deployments and platform integrations (ingress, images, namespaces, quotas) for AI services that land on Kubernetes
  • Harden & govern: embed guardrails (Azure Policy, role assignments, private endpoints), tagging/FinOps, logging/monitoring baselines
  • Scripting with Python or Bash for IaC tooling and deployment helpers
  • Experience codifying policies/controls (OPA/Conftest, Azure Policy as Code) and cost governance tags

Similar Jobs for AI/ML DevOps Engineer (8 matching positions)

Senior AI/ML Engineer - Vice President

Location: India, Chennai; Pune
Salary: Not provided
Company: Citi
Expiration Date: Until further notice
Requirements
  • 10+ years of strong expertise in microservices, RESTful APIs, and distributed architectures
  • Experience in the following AI/ML technologies:
      • Development: Python, FastAPI, Java, Spring AI, Async Programming
      • Foundational Models: Gemini, OpenAI, Claude, Llama, Local Models
      • Machine Learning: PyTorch, TensorFlow, Fine-tuning
      • AI Frameworks: Google's ADK, LangChain, LlamaIndex, Hugging Face
      • Orchestration: LangGraph, Multi-Agent Systems
      • Retrieval Augmented Generation: PostgreSQL, Vector DBs, Advanced Retrieval
      • Deployment: Docker, Production APIs, Monitoring
      • Applied Methodology: Prompt Engineering, Workflow Design, GenAI Optimization
Employment type: Full-time

AI DevOps Engineer

The AI DevOps Engineer will play a critical role in designing, deploying, and sc...
Location: United States, Los Angeles
Salary: 160,000 - 185,000 USD / Year
Company: Robert Half
Expiration Date: Until further notice
Requirements
  • Strong experience building and managing CI/CD pipelines (e.g., GitHub Actions or similar)
  • Hands-on experience with major cloud platforms (AWS, Azure, or GCP)
  • Experience with infrastructure automation and containerization concepts
  • Knowledge of AI/ML deployment workflows and operational best practices
  • Strong understanding of system reliability, monitoring, and observability tools
  • Experience implementing secure infrastructure and access control frameworks
  • Excellent communication skills with the ability to collaborate across technical and business teams
Job Responsibility
  • Design and implement CI/CD pipelines for AI applications and services
  • Build and manage scalable AI deployment frameworks and infrastructure
  • Establish best practices for AI operations, including monitoring, reliability, and cost optimization
  • Automate infrastructure provisioning and configuration using modern DevOps tools
  • Develop observability solutions to ensure high system performance and uptime
  • Implement secure deployment practices and maintain strong access controls
  • Operationalize AI platforms and integrate them into business workflows
  • Collaborate with cross-functional teams to identify opportunities for AI-driven scaling
  • Educate internal stakeholders on AI capabilities, tools, and operational best practices
  • Contribute to long-term infrastructure and AI platform strategy
What we offer
  • Discretionary bonus
  • Annual bonus eligibility
  • Comprehensive health benefits including medical, dental, and vision coverage
  • Additional perks and employee-focused programs
Employment type: Full-time

Cloud Platform DevOps Engineer

We are seeking an experienced (5+ years), motivated, and hands-on Cloud Platform...
Location: Canada, Mississauga
Salary: 94,300 - 141,500 USD / Year
Company: Citi
Expiration Date: Until further notice
Requirements
  • 5+ years of experience
  • Experience in infrastructure design, implementation, and maintenance
  • Agile development
  • Understanding of infrastructure and operational considerations for AI and Machine Learning initiatives
  • Hands-on experience with Kubernetes, Docker, Helm, Ansible, and DevOps tools or similar CI/CD platforms
  • Proficiency in scripting and automation (e.g., Python, Bash)
  • Track record of implementing scalable, resilient, and high-performance solutions
  • Strong communication and collaboration skills
  • Ability to mentor and guide junior team members
  • Proven hands-on experience with HashiCorp Vault
Job Responsibility
  • Lead the design, implementation, and ongoing management of secure, scalable, and resilient infrastructure components
  • Administer and maintain secret and certificate management solutions using HashiCorp Vault
  • Perform hands-on administration and optimization of database systems (PostgreSQL, Oracle, MongoDB)
  • Deploy, monitor, and troubleshoot data orchestration workflows using Apache Airflow
  • Implement and manage messaging queues such as Kafka and IBM MQ
  • Develop, maintain, and troubleshoot RESTful API and SOAP integrations
  • Implement and optimize build and deployment processes using Gradle
  • Design, implement, and manage container orchestration platforms with Kubernetes and Helm
  • Configure and manage persistent storage solutions including PVC, SONiC NAS, and S3
  • Set up and maintain load balancing solutions (e.g., Nginx, HAProxy, AWS ELB/ALB, Kubernetes Ingress controllers)
Employment type: Full-time

DevOps Engineer III

DevOps Engineer III. Saint-Laurent, Québec. Contract. Title: Senior DevOps Engin...
Location: Canada, Saint-Laurent
Salary: Not provided
Company: Randstad
Expiration Date: July 2, 2026
Requirements
  • 5+ years of experience with Kubernetes and containerization technologies
  • OpenShift experience is a plus
  • CKA certification is a plus
  • Experience with GitOps tools like ArgoCD
  • Proficiency in Kubernetes package management using Helm and Kustomize
  • Strong understanding of service mesh technologies including Istio, Kiali, and Jaeger
  • Familiarity with monitoring and logging tools such as Prometheus, Grafana, and Loki
  • 5+ years of hands-on experience with Terraform and Ansible
  • 8+ years of experience with cloud platforms including Azure and AWS
  • Strong experience in networking design and implementation (required)
Job Responsibility
  • Manage and maintain OpenShift clusters, ensuring high availability and scalability
  • Implement GitOps workflows for Kubernetes using ArgoCD, ensuring declarative infrastructure and application delivery
  • Package and customize Kubernetes applications using Helm charts and Kustomize overlays
  • Deploy and manage service mesh architectures using Istio; monitor traffic and observability with Kiali and Jaeger
  • Monitor infrastructure and application metrics using Prometheus; visualize data and create dashboards with Grafana
  • Implement centralized logging using Grafana Loki for Kubernetes workloads
  • Write and maintain Terraform scripts for infrastructure as code across multi-cloud environments
What we offer
  • Possibility of renewal: the initial contract is 1 year; however, there is a strong possibility of renewal or of becoming permanent
Employment type: Full-time

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location: United States, Chevy Chase; New York City; Palo Alto
Salary: 115,000 USD / Year
Company: GEICO
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficiency in Python; strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficiency in Kubernetes, including custom operators, Helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation, a 401(k) savings plan vested from day one that offers a 6% match, performance- and recognition-based incentives, and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment type: Full-time

Staff Software Engineer - AI/ML Platform

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location: United States, Chevy Chase; New York City; Palo Alto
Salary: 115,000 USD / Year
Company: GEICO
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficiency in Python; strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficiency in Kubernetes, including custom operators, Helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation, a 401(k) savings plan vested from day one that offers a 6% match, performance- and recognition-based incentives, and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment type: Full-time

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location: United States, Palo Alto
Salary: 90,000 USD / Year
Company: GEICO
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficiency in Python; strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficiency in Kubernetes, including custom operators, Helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation, a 401(k) savings plan vested from day one that offers a 6% match, performance- and recognition-based incentives, and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment type: Full-time

Middle AI/ML Engineer (GenAI, AWS)

Provectus is an AWS Premier Consulting Partner and AI consultancy featured in Fo...
Location: Colombia, Medellín; Bogotá; Bucaramanga; Cali; Barranquilla
Salary: Not provided
Company: Provectus
Expiration Date: Until further notice
Requirements
  • Machine Learning
  • Deep learning hands-on experience: CNNs, RNNs, Transformers
  • Depth in at least one domain: NLP, Computer Vision, Recommendation, or Time Series
  • Experience building LLM apps with OpenAI, Anthropic, or Hugging Face APIs
  • Hands-on RAG design
  • Familiarity with vector databases (OpenSearch, Pinecone, Chroma, FAISS)
  • Understanding of prompt engineering and LLM evaluation
  • Proficient with AI coding tools (Claude Code, Cursor, Copilot, etc.)
  • Experience building tool-using, stateful agents with an orchestration framework
  • Understanding of Model Context Protocol (MCP)
Job Responsibility
  • Build and deliver ML pipelines from experimentation to production
  • Build and optimize models — supervised, unsupervised, and generative AI
  • Write clean, tested, modular Python code
  • Deploy and monitor models; track performance and prevent drift
  • Contribute to LLM applications: RAG systems and agent workflows
  • Use AI coding tools on every task
  • Use Claude Code or similar AI tools to deliver client projects
  • Build with agent frameworks (Bedrock AgentCore, Strands, CrewAI, or similar)
  • Integrate or build MCP servers
What we offer
  • Competitive salary based on competencies and market rates
  • Premium AI tooling: Claude Code, Cursor, and Provectus AI toolkit
  • Mentorship from Senior ML Engineers and Tech Leads
  • Clear growth path: Mid-Level → Senior ML Engineer → Tech Lead
  • Learning budget for courses, certifications, and conferences
  • Remote-first culture; work on projects across LATAM, North America, and Europe
  • Health benefits
Employment type: Full-time