AI DevOps Engineer

Vietnam, Ho Chi Minh City · Job Posted January 15, 2026

Job Description

ABOUT THE JOB: Design, implement, and maintain infrastructure to support AI model training or deployment using modern DevOps tools and technologies. Manage CI/CD processes for AI and software projects. Integrate AI models into Dockerized APIs, or wrap them as such. Install and set up on-prem LLMs on GPUs, along with the associated data pipelines that ground them on predefined document sets. Automate deployment and monitoring of AI solutions. Ensure security and compliance in all DevOps practices. Collaborate with cross-functional teams to deliver robust ML/AI solutions. Troubleshoot and optimize infrastructure for performance and reliability.

Job Responsibility

  • Design, implement, and maintain infrastructure to support AI model training or deployment using modern DevOps tools and technologies
  • Manage CI/CD processes for AI and software projects
  • Integrate AI models into Dockerized APIs, or wrap them as such
  • Install and set up on-prem LLMs on GPUs, along with the associated data pipelines that ground them on predefined document sets
  • Automate deployment and monitoring of AI solutions
  • Ensure security and compliance in all DevOps practices
  • Collaborate with cross-functional teams to deliver robust ML/AI solutions
  • Troubleshoot and optimize infrastructure for performance and reliability

Requirements

  • Bachelor’s degree in computer science, information systems, or a related field
  • 5+ years in DevOps Engineering
  • Solid knowledge of Docker, Bash, Git, Kubernetes, and OpenShift
  • Experience with generative AI tools
  • Experience with templated syntaxes (Ansible, Azure Pipelines, Helm charts)
  • Experience with CI/CD pipelines and automation (Ansible, Docker registries, Helm charts, API Manager)
  • Basic understanding of web development (backend/frontend segregation, HTTP communication, etc.)
  • Fluency in English, both spoken and written, is required
  • You demonstrate an analytical, problem-solving mindset, strong teamwork and collaboration skills, and a security-by-design, security-by-default approach
  • You are proactive when troubleshooting

Nice to have

  • Experience with GPU setup in containerized environments

What we offer

  • Competitive salary and 13th-month salary
  • 14+ days of annual leave per year
  • Premium healthcare insurance, starting from your probation period
  • Project reviews and yearly performance appraisals
  • Annual company trips
  • Team-building activities: team lunches and dinners, events and celebrations, and sports clubs (football, basketball, badminton, pickleball)
  • International team with flexible working time
  • Tailor-made career path
  • Technical workshops and training courses
  • Mobility: opportunities to work on-site abroad at our offices in 60+ countries

Similar Jobs for AI DevOps Engineer

8 matching positions

AI DevOps Engineer

The AI DevOps Engineer will play a critical role in designing, deploying, and sc...
Location: United States, Los Angeles
Salary: 160000.00 - 185000.00 USD / Year
Robert Half
Expiration Date: Until further notice
Requirements
  • Strong experience building and managing CI/CD pipelines (e.g., GitHub Actions or similar)
  • Hands-on experience with major cloud platforms (AWS, Azure, or GCP)
  • Experience with infrastructure automation and containerization concepts
  • Knowledge of AI/ML deployment workflows and operational best practices
  • Strong understanding of system reliability, monitoring, and observability tools
  • Experience implementing secure infrastructure and access control frameworks
  • Excellent communication skills with the ability to collaborate across technical and business teams
Job Responsibility
  • Design and implement CI/CD pipelines for AI applications and services
  • Build and manage scalable AI deployment frameworks and infrastructure
  • Establish best practices for AI operations, including monitoring, reliability, and cost optimization
  • Automate infrastructure provisioning and configuration using modern DevOps tools
  • Develop observability solutions to ensure high system performance and uptime
  • Implement secure deployment practices and maintain strong access controls
  • Operationalize AI platforms and integrate them into business workflows
  • Collaborate with cross-functional teams to identify opportunities for AI-driven scaling
  • Educate internal stakeholders on AI capabilities, tools, and operational best practices
  • Contribute to long-term infrastructure and AI platform strategy
What we offer
  • Discretionary bonus
  • Annual bonus eligibility
  • Comprehensive health benefits including medical, dental, and vision coverage
  • Additional perks and employee-focused programs
  • Full-time

AI DevOps Engineer

Seeking a Lead AI DevOps Engineer to oversee design and delivery of advanced AI/...
Salary: Not provided
Lingaro
Expiration Date: Until further notice
Requirements
  • 5+ years in DevOps/Cloud Engineering with AI/ML/GenAI project experience
  • Proven experience deploying LLMs/SLMs (model serving, inference optimization, RAG, GenAI apps)
  • Expert proficiency in Linux and macOS administration
  • Advanced Python and scripting (Bash/PowerShell) for automation and integration
  • Deep knowledge of IaC (Terraform, Ansible) and CI/CD (Azure DevOps, GitHub Actions, Jenkins)
  • Strong expertise in Azure cloud, Kubernetes, and enterprise AI/ML platforms
  • Track record in delivering secure, production-ready solutions for AI/ML/GenAI
  • Familiarity with monitoring, observability and FinOps practices
  • Excellent leadership, communication and mentoring skills
  • Fluency in written and spoken English
Job Responsibility
  • Leading architecture and deployment of AI/ML/GenAI solutions (LLM/SLM at scale)
  • Driving automation of infrastructure, model lifecycle and inference pipelines
  • Overseeing CI/CD processes for AI/ML/GenAI workloads
  • Designing secure, scalable cloud infrastructures (Azure-focused)
  • Acting as technical advisor for stakeholders and client-facing solution design
  • Mentoring engineers, promoting best practices, and fostering innovation in GenAI adoption
  • Coordinating cross-functional teams to align AI engineering with business outcomes
  • Ensuring cost optimization, monitoring and compliance across environments
What we offer
  • Stable employment
  • “Office as an option” model
  • Flexibility regarding working hours and your preferred form of contract
  • Comprehensive online onboarding program with a “Buddy” from day 1
  • Cooperation with top-tier engineers and experts
  • Unlimited access to the Udemy learning platform from day 1
  • Certificate training programs
  • Upskilling support
  • Internal Gallup Certified Strengths Coach to support your growth
  • Grow as we grow as a company
  • Full-time

AI DevOps Engineer (RPA)

As an AI DevOps Engineer, you will design, build, and evolve the AI-PDLC platfor...
Salary: Not provided
Coherent Solutions
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in DevOps, SRE, or platform engineering
  • Strong experience with CI/CD pipelines (Jenkins, Harness)
  • Experience with Docker and Kubernetes
  • Experience working with AWS
  • Programming or scripting experience (Python, Java, or Bash)
  • Experience with repository and artifact management tools (e.g., Bitbucket, Nexus)
  • English proficiency at B1+ level or higher
Job Responsibility
  • Design and maintain CI/CD pipelines across the full delivery lifecycle (spec → code → test → deploy)
  • Implement and manage infrastructure as code and GitOps-based deployment strategies
  • Develop and manage AI agents that convert EARS specifications into tests, automation, and code validation
  • Integrate Claude Code, AWS Kiro, and other AI tools into development workflows
  • Improve agent accuracy, performance, and lifecycle management
  • Improve performance, reliability, and scalability of pipelines and platform components
  • Embed quality and security checks into delivery pipelines and ensure end-to-end traceability
  • Automate testing processes, including edge cases and negative scenarios
  • Build and maintain containerized environments for development and testing
  • Implement monitoring, logging, and tracing to ensure system observability
What we offer
  • Technical and non-technical training for professional and personal growth
  • Internal conferences and meetups to learn from industry experts
  • Support and mentorship from an experienced employee to help you grow and develop professionally
  • Health insurance
  • English courses
  • Sports activities to promote a healthy lifestyle
  • Flexible work options, including remote and hybrid opportunities
  • Referral program for bringing in new talent
  • Work anniversary program and additional vacation days

DevOps Engineer (AI)

We are looking for a DevOps Engineer (AI) to join Sopra Steria Polska and one of...
Location: Poland, Katowice
Salary: 12000.00 - 16000.00 PLN / Month
Sopra Steria
Expiration Date: Until further notice
Requirements
  • Good knowledge of IT security principles
  • Advanced experience in software development (Git, Python, TypeScript)
  • Experience with AI tools for developers (Amazon Q, Kiro, Claude), coding assistants, and AI agents
  • Experience with AI development, including agentic AI, and with delivering AI solutions to customers at scale
  • Practical experience with IDEs (VS Code, IntelliJ)
  • Strong hands-on experience with AWS
  • Infrastructure as Code (Terraform, CDK, Ansible)
  • Service management experience (ITIL or similar)
  • EU citizenship
  • Fluent English: B2/C1
Job Responsibility
  • Collaborate with team members to write efficient, reusable, and maintainable code
  • Contribute to the continuous improvement of existing services by identifying opportunities for optimization
  • Write unit tests and integration tests where necessary to increase confidence during development and deployment of the code
  • Use GitLab, CI/CD, and scripting skills to automate the deployment and analysis of code
  • Suggest and implement automation to streamline deployment, configuration, and monitoring tasks
  • Work closely with cross-functional team members, including architects, system administrators, and other developers, to ensure seamless integration and interoperability
  • Participate in agile ceremonies, including sprint planning, daily stand-ups, and sprint reviews, to foster effective collaboration within the squad
  • Engage in knowledge-sharing activities and contribute to the collective expertise of the team
  • Document code, scripts, and configurations thoroughly to ensure maintainability and knowledge transfer within the squad
  • Contribute to the creation and maintenance of technical documentation for the squad
What we offer
  • Luxmed
  • Medicover Sport
  • Worksmile
  • educational platforms
  • language-learning platform
  • referral bonus
  • copyrights
  • life insurance
  • workation
  • certifications (paid by the company)
  • Full-time

Senior DevOps AI Engineer

We are seeking a highly experienced and technically proficient Senior DevOps Eng...
Location: United States, Columbia
Salary: 150000.00 - 250000.00 USD / Year
Synergy ECP
Expiration Date: Until further notice
Requirements
  • B.S. in a relevant technical field with 12 years of experience, or M.S. in a relevant technical field with 10 years of experience
  • Advanced proficiency in DevOps principles and practices
  • Demonstrated expertise in containerization using Docker and Kubernetes
  • Proven experience in architecting and managing CI/CD pipelines
  • Extensive experience with AI model lifecycle management and maintenance
  • Familiarity with cloud platforms (AWS, Microsoft Azure) for infrastructure deployment and management
  • Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
  • Excellent communication and interpersonal skills, with the ability to effectively collaborate with cross-functional teams
  • Ability to translate complex technical concepts into actionable engineering solutions
  • TS/SCI with CI Poly
Job Responsibility
  • Design, implement, and maintain robust infrastructure for enterprise AI applications in cloud environments (AWS, Microsoft Azure)
  • Develop and optimize engineering workflows and processes to support AI model development, deployment, and maintenance
  • Architect and manage CI/CD pipelines for continuous integration and continuous delivery of AI models and applications
  • Implement and manage containerization solutions using technologies like Docker and Kubernetes
  • Ensure efficient AI model lifecycle management, including versioning, monitoring, and scaling
  • Collaborate with AI/ML engineers and data scientists to streamline deployment processes and optimize resource utilization
  • Oversee system performance, security, and scalability of AI infrastructure
  • Continuously research and implement new DevOps tools and practices to enhance efficiency
What we offer
  • Highly competitive compensation
  • Comprehensive Health Benefits package
  • 401K Retirement plan
  • People Partners to help navigate personal and professional worlds
  • Wellness resources
  • Company-sponsored continuing education program
  • Generous Paid Time Off
  • 11 paid holidays a year
  • Flexible work options
  • Philanthropy program participation
  • Full-time

Senior DevOps Engineer, AI

LogicMonitor® is the AI-first hybrid observability platform powering the next ge...
Location: India, Pune
Salary: Not provided
LogicMonitor
Expiration Date: Until further notice
Requirements
  • 4+ years of experience in DevOps or similar roles
  • Proven experience with AWS (preferred) and GCP in production environments
  • Strong expertise in Infrastructure as Code practices
  • Solid knowledge of Kubernetes (EKS), container orchestration, and cluster security
  • Hands-on experience with Grafana, Prometheus, and alerting/monitoring systems
  • Understanding of network connectivity, including private link endpoints, VPCs, cross-account VPC connectivity, and how to make services accessible internally and externally
  • Experience deploying automated canary and integration testing pipelines, CI/CD pipelines, etc.
  • Experience exposing internal self-hosted services such as LangFuse through a web UI for internal users, using Traefik, an Ingress controller, or a similar tool
  • Experience deploying LLM-related solutions that require MCP, LangFuse, Airflow, GraphDB, VectorDB, Redis, etc.
  • Experience working with developers to provide on-demand, just-in-time (JIT) access to production clusters for troubleshooting and debugging, using tools such as Teleport
Job Responsibility
  • Multi-Cloud Enablement: Expand and manage application hosting across AWS and Google Cloud, ensuring performance, flexibility, and resilience
  • Infrastructure as Code (IaC): Develop and maintain Terraform or similar installers for Azure and GCP to fully automate infrastructure deployments
  • Cost Optimization: Design and implement AWS cost optimization strategies, including reserved instances, right-sizing, and resource efficiency initiatives
  • Cloud Security: Strengthen infrastructure security with robust access controls, encryption, monitoring, and alerting frameworks
  • Observability: Build and enhance monitoring platforms with Grafana dashboards and Prometheus alerts for real-time performance insights and proactive issue resolution
  • Kubernetes Management: Implement Role-Based Access Control (RBAC) and optimize Ingress controllers (Traefik or similar) for enhanced security and delivery resilience
  • Automation & Scripting: Create Python and Bash scripts to automate repetitive tasks, streamline workflows, and improve operational efficiency

LLM & AI DevOps Engineer

Join our team as a DevOps Engineer specializing in Artificial Intelligence (AI) ...
Location: United States, Remote
Salary: Not provided
Robert Half
Expiration Date: Until further notice
Requirements
  • Proven experience as a DevOps Engineer, preferably supporting AI or machine learning platforms
  • Hands-on expertise with Kubernetes (EKS, AKS, GKE, or on-prem), Docker, Terraform, and Ansible
  • Experience with monitoring/observability tools such as Grafana and Prometheus
  • Familiarity with NVIDIA GPU drivers, CUDA, and hardware provisioning for machine learning tasks
  • Proficiency in at least one scripting language (Python, Bash, etc.)
  • Cloud platform experience (AWS, GCP, Azure); hybrid/on-premises a plus
  • Previous work with MLOps tools and data pipeline automation is highly desirable
  • Bachelor’s degree in Computer Science or related field, or equivalent professional experience
Job Responsibility
  • Build, automate, and manage CI/CD pipelines for deploying and maintaining AI/LLM workloads
  • Collaborate with AI engineers and data scientists to streamline model deployment, versioning, and monitoring
  • Design and maintain cloud infrastructure using Infrastructure as Code (IaC) platforms such as Terraform and Ansible
  • Orchestrate and manage containerized AI environments using Kubernetes
  • Implement robust monitoring and logging solutions utilizing Grafana and Prometheus
  • Optimize AI model inference and training workloads—especially for NVIDIA GPU-powered environments
  • Apply strict security and compliance standards for all infrastructure components
  • Diagnose and resolve production issues, continuously improving reliability and scalability of AI services
What we offer
  • medical
  • vision
  • dental
  • life and disability insurance
  • 401(k) plan

Senior DevOps Engineer (AI & Cloud Infrastructure)

We are seeking a Senior DevOps Engineer to design, deploy, and operate the next ...
Location: United States, Palo Alto
Salary: 175000.00 - 250000.00 USD / Year
Inflection AI
Expiration Date: Until further notice
Requirements
  • 5+ years of hands-on experience in DevOps, Site Reliability Engineering, or ML Infrastructure supporting high-scale, production systems
  • Deep expertise in Azure and AWS, including storage, compute, networking, databases, and cloud-native monitoring services
  • Strong Kubernetes administration experience, including GPU scheduling, operator deployment, and management of core infrastructure components; experience with Slurm is highly desirable
  • Proven experience deploying, scaling, and operating Large Language Models (LLMs) and inference engines such as vLLM, TGI, or Triton
  • Strong experience with modern DevOps tooling: Terraform, Helm, Kustomize, ArgoCD, GitHub Actions or GitLab CI, Prometheus, Grafana, and Clickhouse
  • Advanced scripting and automation skills in Python and Bash, with the ability to debug complex distributed systems and optimize performance at scale
  • Demonstrated ability to troubleshoot LLM servers, Kubernetes workloads, GPU utilization, and cloud infrastructure bottlenecks
  • Bachelor’s degree or equivalent in a field related to the position
Job Responsibility
  • Architect, deploy, and operate large-scale LLM inference servers and AI applications with a focus on low latency, high availability, and production reliability
  • Design, provision, and maintain complex cloud architectures across Azure and AWS, including storage, compute, networking, databases, and native LLM services
  • Manage GPU-enabled Kubernetes clusters and Slurm-based HPC environments, optimizing resource allocation for AI training and inference workloads
  • Deploy and operate core Kubernetes infrastructure components and operators (GPU operators, ingress controllers, service meshes, CNIs, CSIs, and storage drivers)
  • Build scalable infrastructure-as-code and deployment workflows using Terraform, Helm, Kustomize, ArgoCD, and GitOps best practices
  • Design and maintain centralized observability systems using Prometheus, Grafana, Clickhouse, and cloud-native monitoring tools
  • Participate in on-call rotations, lead incident response, perform post-mortems, and continuously improve system reliability and SLAs.
What we offer
  • Diverse medical, dental and vision options
  • 401k matching program
  • Unlimited paid time off
  • Parental leave and flexibility for all parents and caregivers
  • Support of country-specific visa needs for international employees living in the Bay Area
  • Meaningful equity component.
  • Full-time