AI/ML DevOps Engineer Job at NTT DATA (Noida)


Senior AI/ML Engineer - Vice President

Location

India, Chennai; Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

10+ years of strong expertise in microservices, RESTful APIs, and distributed architectures
Experience in the following AI/ML technologies:
Development: Python, FastAPI, Java, Spring AI, Async Programming
Foundational Models: Gemini, OpenAI, Claude, Llama, Local Models
Machine Learning: PyTorch, TensorFlow, Fine-tuning
AI Frameworks: Google's ADK, LangChain, LlamaIndex, Hugging Face
Orchestration: LangGraph, Multi-Agent Systems
Retrieval Augmented Generation: PostgreSQL, Vector DBs, Advanced Retrieval
Deployment: Docker, Production APIs, Monitoring
Applied Methodology: Prompt Engineering, Workflow Design, GenAI Optimization

Fulltime

AI DevOps Engineer

The AI DevOps Engineer will play a critical role in designing, deploying, and sc...

Location

United States, Los Angeles

Salary:

160000.00 - 185000.00 USD / Year

Robert Half

Expiration Date

Until further notice

Requirements

Strong experience building and managing CI/CD pipelines (e.g., GitHub Actions or similar)
Hands-on experience with major cloud platforms (AWS, Azure, or GCP)
Experience with infrastructure automation and containerization concepts
Knowledge of AI/ML deployment workflows and operational best practices
Strong understanding of system reliability, monitoring, and observability tools
Experience implementing secure infrastructure and access control frameworks
Excellent communication skills with the ability to collaborate across technical and business teams

Job Responsibility

Design and implement CI/CD pipelines for AI applications and services
Build and manage scalable AI deployment frameworks and infrastructure
Establish best practices for AI operations, including monitoring, reliability, and cost optimization
Automate infrastructure provisioning and configuration using modern DevOps tools
Develop observability solutions to ensure high system performance and uptime
Implement secure deployment practices and maintain strong access controls
Operationalize AI platforms and integrate them into business workflows
Collaborate with cross-functional teams to identify opportunities for AI-driven scaling
Educate internal stakeholders on AI capabilities, tools, and operational best practices
Contribute to long-term infrastructure and AI platform strategy

What we offer

Discretionary bonus
Annual bonus eligibility
Comprehensive health benefits including medical, dental, and vision coverage
Additional perks and employee-focused programs

Fulltime

Cloud Platform DevOps Engineer

We are seeking an experienced (5+ years), motivated, and hands-on Cloud Platform...

Location

Canada, Mississauga

Salary:

94300.00 - 141500.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

5+ years of experience
Experience in infrastructure design, implementation, and maintenance
Experience with Agile development
Understanding of infrastructure and operational considerations for AI and Machine Learning initiatives
Hands-on experience with Kubernetes, Docker, Helm, Ansible, DevOps tools, or similar CI/CD platforms
Proficiency in scripting and automation (e.g., Python, Bash)
Track record of implementing scalable, resilient, and high-performance solutions
Strong communication and collaboration skills
Ability to mentor and guide junior team members
Proven hands-on experience with HashiCorp Vault

Job Responsibility

Lead the design, implementation, and ongoing management of secure, scalable, and resilient infrastructure components
Administer and maintain secret and certificate management solutions using HashiCorp Vault
Perform hands-on administration and optimization of database systems (PostgreSQL, Oracle, MongoDB)
Deploy, monitor, and troubleshoot data orchestration workflows using Apache Airflow
Implement and manage messaging queues such as Kafka and IBM MQ
Develop, maintain, and troubleshoot RESTful API and SOAP integrations
Implement and optimize build and deployment processes using Gradle
Design, implement, and manage container orchestration platforms with Kubernetes and Helm
Configure and manage persistent storage solutions including PVC, SONiC NAS, and S3
Set up and maintain load balancing solutions (e.g., Nginx, HAProxy, AWS ELB/ALB, Kubernetes Ingress controllers)

Fulltime

DevOps Engineer III

DevOps Engineer III. Saint-Laurent, Québec. Contract. Title: Senior DevOps Engin...

Location

Canada, Saint-Laurent

Salary:

Not provided

Randstad

Expiration Date

July 02, 2026

Requirements

5+ years of experience with Kubernetes and containerization technologies
OpenShift experience is a plus
CKA certification is a plus
Experience with GitOps tools like ArgoCD
Proficiency in Kubernetes package management using Helm and Kustomize
Strong understanding of service mesh technologies including Istio, Kiali, and Jaeger
Familiarity with monitoring and logging tools such as Prometheus, Grafana, and Loki
5+ years of hands-on experience with Terraform and Ansible
8+ years of experience with cloud platforms including Azure and AWS
Strong experience in networking design and implementation (required)

Job Responsibility

Manage and maintain OpenShift clusters, ensuring high availability and scalability
Implement GitOps workflows for Kubernetes using ArgoCD to ensure declarative infrastructure and application delivery
Package and customize Kubernetes applications using Helm charts and Kustomize overlays
Deploy and manage service mesh architectures using Istio; monitor traffic and service observability with Kiali and Jaeger
Monitor infrastructure and application metrics using Prometheus; visualize data and create dashboards with Grafana
Implement centralized logging using Grafana Loki for Kubernetes workloads
Write and maintain Terraform scripts for infrastructure as code across multi-cloud environments

What we offer

Possibility of renewal: the initial contract is 1 year; however, there is a strong possibility of renewal or of becoming permanent

Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Chevy Chase; New York City; Palo Alto

Salary:

115000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
8+ years of software engineering experience with a focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes, including custom operators, Helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions using AWS/GCP for backup or specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation, a 401(k) savings plan vested from day one that offers a 6% match, performance- and recognition-based incentives, and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Staff Software Engineer - AI/ML Platform

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Chevy Chase; New York City; Palo Alto

Salary:

115000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
8+ years of software engineering experience with a focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes, including custom operators, Helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions using AWS/GCP for backup or specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation, a 401(k) savings plan vested from day one that offers a 6% match, performance- and recognition-based incentives, and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Palo Alto

Salary:

90000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
8+ years of software engineering experience with a focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes, including custom operators, Helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions using AWS/GCP for backup or specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation, a 401(k) savings plan vested from day one that offers a 6% match, performance- and recognition-based incentives, and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Middle AI/ML Engineer (GenAI, AWS)

Provectus is an AWS Premier Consulting Partner and AI consultancy featured in Fo...

Location

Colombia, Medellín; Bogotá; Bucaramanga; Cali; Barranquilla

Salary:

Not provided

Provectus

Expiration Date

Until further notice

Requirements

Machine Learning
Deep learning hands-on experience: CNNs, RNNs, Transformers
Depth in at least one domain: NLP, Computer Vision, Recommendation, or Time Series
Experience building LLM apps with OpenAI, Anthropic, or Hugging Face APIs
Hands-on experience with RAG design
Familiarity with vector databases (OpenSearch, Pinecone, Chroma, FAISS)
Understanding of prompt engineering and LLM evaluation
Proficient with AI coding tools (Claude Code, Cursor, Copilot, etc.)
Experience building tool-using, stateful agents with an orchestration framework
Understanding of Model Context Protocol (MCP)

Job Responsibility

Build and deliver ML pipelines from experimentation to production
Build and optimize models — supervised, unsupervised, and generative AI
Write clean, tested, modular Python code
Deploy and monitor models; track performance and prevent drift
Contribute to LLM applications: RAG systems and agent workflows
Use AI coding tools on every task
Use Claude Code or similar AI tools to deliver client projects
Build with agent frameworks (Bedrock AgentCore, Strands, CrewAI, or similar)
Integrate or build MCP servers

What we offer

Competitive salary based on competencies and market rates
Premium AI tooling: Claude Code, Cursor, and Provectus AI toolkit
Mentorship from Senior ML Engineers and Tech Leads
Clear growth path: Mid-Level → Senior ML Engineer → Tech Lead
Learning budget for courses, certifications, and conferences
Remote-first culture; work on projects across LATAM, North America, and Europe
Health benefits

Fulltime


AI/ML DevOps Engineer

Job Description

Job Responsibility

Requirements
