Senior DevOps AI Engineer

Location: United States, Columbia · Salary: 150000.00 - 250000.00 USD / Year · Job Posted February 18, 2026

Job Description

We are seeking a highly experienced and technically proficient Senior DevOps Engineer to play an integral role in our team, focusing on deploying infrastructure and engineering workflows and processes to support enterprise AI rollouts. This position requires deep expertise in DevOps principles, including containerization, CI/CD pipeline architecture, and AI model lifecycle management. The ideal candidate will be adept at ensuring robust, scalable, and efficient deployment and maintenance of AI applications at an enterprise scale.

Job Responsibility

  • Design, implement, and maintain robust infrastructure for enterprise AI applications in cloud environments (AWS, Microsoft Azure)
  • Develop and optimize engineering workflows and processes to support AI model development, deployment, and maintenance
  • Architect and manage CI/CD pipelines for continuous integration and continuous delivery of AI models and applications
  • Implement and manage containerization solutions using technologies like Docker and Kubernetes
  • Ensure efficient AI model lifecycle management, including versioning, monitoring, and scaling
  • Collaborate with AI/ML engineers and data scientists to streamline deployment processes and optimize resource utilization
  • Oversee system performance, security, and scalability of AI infrastructure
  • Continuously research and implement new DevOps tools and practices to enhance efficiency

Requirements

  • B.S. in a relevant technical field with 12 years of experience, or M.S. in a relevant technical field with 10 years of experience
  • Advanced proficiency in DevOps principles and practices
  • Demonstrated expertise in containerization using Docker and Kubernetes
  • Proven experience in architecting and managing CI/CD pipelines
  • Extensive experience with AI model lifecycle management and maintenance
  • Familiarity with cloud platforms (AWS, Microsoft Azure) for infrastructure deployment and management
  • Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
  • Excellent communication and interpersonal skills, with the ability to effectively collaborate with cross-functional teams
  • Ability to translate complex technical concepts into actionable engineering solutions
  • TS/SCI clearance with CI polygraph
  • U.S. Citizenship

Nice to have

  • Experience with infrastructure as code (IaC) tools (e.g., Terraform, Ansible)
  • Understanding of machine learning concepts and their implications for infrastructure
  • Continuous learning mindset to stay abreast of cutting-edge DevOps and AI advancements

What we offer

  • Highly competitive compensation
  • Comprehensive Health Benefits package
  • 401K Retirement plan
  • People Partners to help navigate personal and professional worlds
  • Wellness resources
  • Company-sponsored continuing education program
  • Generous Paid Time Off
  • 11 paid holidays a year
  • Flexible work options
  • Philanthropy program participation
  • Great corporate facilities (weekly happy hour, café, collaborative space)
  • SkillBridge Program for servicemembers

Similar Jobs for Senior DevOps AI Engineer

8 matching positions

Senior DevOps Engineer, AI

LogicMonitor® is the AI-first hybrid observability platform powering the next ge...
Location: India, Pune
Salary: Not provided
Company: LogicMonitor (logicmonitor.com)
Expiration Date: Until further notice
Requirements
  • 4+ years of experience in DevOps or similar roles
  • Proven experience with AWS (preferred), and GCP in production environments
  • Strong expertise in Infrastructure as Code practices
  • Solid knowledge of Kubernetes (EKS), container orchestration, and cluster security
  • Hands-on experience with Grafana, Prometheus, and alerting/monitoring systems
  • Understanding of network connectivity over private link endpoints, VPCs, and cross-account VPC connectivity, including how to make services accessible internally and externally
  • Experience deploying automated canary and integration testing pipelines, CI/CD pipelines, etc.
  • Experience exposing internal self-hosted services such as LangFuse through a web UI for internal users, using Traefik, an Ingress controller, or a similar tool
  • Experience deploying LLM-related solutions that require MCP, LangFuse, Airflow, GraphDB, VectorDB, Redis, etc.
  • Experience working with developers on on-demand just-in-time (JIT) access to production clusters to troubleshoot and debug issues, using tools such as Teleport or similar
Job Responsibility
  • Multi-Cloud Enablement: Expand and manage application hosting across AWS and Google Cloud, ensuring performance, flexibility, and resilience
  • Infrastructure as Code (IaC): Develop and maintain Terraform or similar installers for Azure and GCP to fully automate infrastructure deployments
  • Cost Optimization: Design and implement AWS cost optimization strategies, including reserved instances, right-sizing, and resource efficiency initiatives
  • Cloud Security: Strengthen infrastructure security with robust access controls, encryption, monitoring, and alerting frameworks
  • Observability: Build and enhance monitoring platforms with Grafana dashboards and Prometheus alerts for real-time performance insights and proactive issue resolution
  • Kubernetes Management: Implement Role-Based Access Control (RBAC) and optimize Ingress controllers (Traefik or similar) for enhanced security and delivery resilience
  • Automation & Scripting: Create Python and Bash scripts to automate repetitive tasks, streamline workflows, and improve operational efficiency

Senior DevOps Engineer (AI & Cloud Infrastructure)

We are seeking a Senior DevOps Engineer to design, deploy, and operate the next ...
Location: United States, Palo Alto
Salary: 175000.00 - 250000.00 USD / Year
Company: Inflection AI (inflection.ai)
Expiration Date: Until further notice
Requirements
  • 5+ years of hands-on experience in DevOps, Site Reliability Engineering, or ML Infrastructure supporting high-scale, production systems
  • Deep expertise in Azure and AWS, including storage, compute, networking, databases, and cloud-native monitoring services
  • Strong Kubernetes administration experience, including GPU scheduling, operator deployment, and management of core infrastructure components; experience with Slurm is highly desirable
  • Proven experience deploying, scaling, and operating Large Language Models (LLMs) and inference engines such as vLLM, TGI, or Triton
  • Strong experience with modern DevOps tooling: Terraform, Helm, Kustomize, ArgoCD, GitHub Actions or GitLab CI, Prometheus, Grafana, and Clickhouse
  • Advanced scripting and automation skills in Python and Bash, with the ability to debug complex distributed systems and optimize performance at scale
  • Demonstrated ability to troubleshoot LLM servers, Kubernetes workloads, GPU utilization, and cloud infrastructure bottlenecks
  • Bachelor’s degree or equivalent in a field related to the requirements of the position
Job Responsibility
  • Architect, deploy, and operate large-scale LLM inference servers and AI applications with a focus on low latency, high availability, and production reliability
  • Design, provision, and maintain complex cloud architectures across Azure and AWS, including storage, compute, networking, databases, and native LLM services
  • Manage GPU-enabled Kubernetes clusters and Slurm-based HPC environments, optimizing resource allocation for AI training and inference workloads
  • Deploy and operate core Kubernetes infrastructure components and operators (GPU operators, ingress controllers, service meshes, CNIs, CSIs, and storage drivers)
  • Build scalable infrastructure-as-code and deployment workflows using Terraform, Helm, Kustomize, ArgoCD, and GitOps best practices
  • Design and maintain centralized observability systems using Prometheus, Grafana, Clickhouse, and cloud-native monitoring tools
  • Participate in on-call rotations, lead incident response, perform post-mortems, and continuously improve system reliability and SLAs.
What we offer
  • Diverse medical, dental and vision options
  • 401k matching program
  • Unlimited paid time off
  • Parental leave and flexibility for all parents and caregivers
  • Support of country-specific visa needs for international employees living in the Bay Area
  • Meaningful equity component.
  • Full-time

Senior Full Stack & DevOps Engineer - AI Solutions

Are you ready to shape the development of applied enterprise AI within a highly ...
Location: Australia, Sydney
Salary: 185000.00 AUD / Year
Company: Randstad (https://www.randstad.com)
Expiration Date: July 24, 2026
Requirements
  • Strong full stack engineering capability across cloud networks using Azure, paired with competent SQL
  • Polyglot coding skills in at least two of the following: TypeScript, Python, Java, or C#
Job Responsibility
  • Full Stack Development: Architect full stack web and data applications on cloud networks using services, functions, and container tools
  • AI Integration: Productionise modern GenAI patterns, prompt engineering, context management, and custom connectors
  • DevSecOps Architecture: Build and manage robust CI/CD pipelines, Infrastructure as Code (IaC), secret management, and release governance
  • System Re-Engineering: Take business-built prototypes and structurally re-engineer them into secure, observable, and enterprise-grade services
What we offer
  • Work with a modern AI stack
  • Shape enterprise-scale architecture
  • Cooperative & innovative culture
  • Clear pathways for career growth
  • Super and Bonus

Senior Java/Kotlin Engineer (AI-Driven DevOps & Automation)

We are looking for a Senior Java/Kotlin Engineer who goes beyond traditional dev...
Location: Colombia
Salary: Not provided
Company: Parser Limited (parserdigital.com)
Expiration Date: Until further notice
Requirements
  • Strong experience in Java and/or Kotlin backend development
  • Solid understanding of software design, APIs, and distributed systems
  • Experience with CI/CD pipelines and DevOps practices
  • Hands-on experience with static code analysis tools, dependency management, and security remediation
  • Familiarity with AI-assisted coding tools (e.g., Claude, GitHub Copilot, etc.)
  • Experience working with Git-based workflows and multi-repo environments
Job Responsibility
  • Backend Development: Design, build, and maintain scalable backend services using Java/Kotlin
  • Deliver production-ready features with high quality and performance standards
  • Collaborate with product and engineering teams to translate requirements into technical solutions
  • AI-Driven DevOps & Automation: Use Claude (or similar agentic AI tools) to identify and fix vulnerabilities
  • Automate code improvements across repositories
  • Generate and maintain unit and integration tests using AI from code context and diffs
  • Continuously improve CI/CD workflows using AI-assisted processes
  • AI Readiness & Engineering Enablement: Improve the AI readiness of repositories through clean architecture, modular structure, clear interfaces and contracts, type safety, and documentation for LLM consumption
  • Build guardrails for AI usage, including prompt design and versioning, output validation and consistency checks, and safe code generation practices
What we offer
  • The chance to work on innovative projects with leading brands that use the latest technologies to fuel transformation
  • The opportunity to be part of an amazing, multicultural community of tech experts
  • A competitive compensation package and medical insurance
  • A flexible working environment
  • Full-time

Senior Java Engineer – Agentic AI Driven Development - Senior Vice President

The Applications Development Technology Senior Lead Analyst is a senior-level po...
Location: Canada, Mississauga
Salary: 145100.00 - 217700.00 USD / Year
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • Core Java - Strong understanding of Java (JDK 8+, preferably Java 11/17), including multithreading, collections, garbage collection, and JVM internals
  • Frameworks - Extensive experience with Spring Framework (Spring Boot, Spring MVC, Spring Data JPA, Spring Security)
  • Middleware - Proven experience in designing and developing RESTful APIs and microservices
  • Relational Databases - Strong proficiency in SQL and experience with Oracle databases, including schema design, query optimization, and stored procedures
  • NoSQL Databases - Experience with MongoDB, including data modeling, querying, and performance tuning
  • CI/CD & DevOps - Hands-on experience with CI/CD tools and practices (e.g., Jenkins, GitLab CI, GitHub Actions, Maven/Gradle, Docker, Kubernetes)
  • Version Control - Proficiency with Git and standard branching strategies (e.g., Gitflow)
  • Testing - Experience with unit testing frameworks (JUnit, Mockito) and integration testing
  • Web Technologies (Beneficial) - Familiarity with web services (SOAP/REST), XML, JSON
  • AI Tools & Methodologies - Demonstrable exposure and practical experience with AI development tools such as Devin, GitHub Copilot, Claude, Anti Gravity, and Codex
Job Responsibility
  • Lead the design, development, and implementation of complex middleware applications using Java and Spring Boot
  • Architect and optimize database interactions with Oracle, SQL, and MongoDB, ensuring high performance and data integrity
  • Drive the adoption and continuous improvement of CI/CD pipelines to facilitate rapid and reliable software delivery
  • Collaborate with cross-functional teams, including product management, QA, and operations, to define requirements, design solutions, and deliver high-quality software
  • Mentor and provide technical guidance to junior and mid-level software engineers, fostering a culture of technical excellence and continuous learning
  • Actively research and experiment with AI technologies to identify opportunities for enhancing developer productivity, automating tasks, and improving software quality
  • Participate in code reviews, ensuring adherence to coding standards, best practices, and architectural guidelines
  • Troubleshoot and resolve complex technical issues, ensuring the stability and performance of production systems
  • Contribute to the strategic planning and technical roadmap for our middleware platforms
  • Conduct tasks related to feasibility studies, time and cost estimates, IT planning, risk technology, applications development, and model development
What we offer
  • Discover the top benefits offered to our global workforce, designed to support your well-being, growth and work-life balance. Explore a few of the highlights that make working with us rewarding.
  • Full-time

Senior AI Engineer – Microsoft Fabric & Azure AI Foundry

We are looking for an experienced AI Engineer to lead the implementation of Azur...
Location: United States, New York City
Salary: 160000.00 - 220000.00 USD / Year
Company: Valtech (valtech.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in cloud engineering, AI engineering, or data platform architecture
  • Strong hands-on experience with: Microsoft Fabric, Azure AI Foundry, Azure OpenAI, Azure Machine Learning, Azure Data Services
  • Experience integrating AI workloads into enterprise analytics platforms
  • Proficiency in Python and/or C#
  • Experience with REST APIs, SDKs, and AI orchestration frameworks
  • Knowledge of: Vector databases, Retrieval-Augmented Generation (RAG), Prompt engineering, Model evaluation and monitoring
  • Familiarity with DevOps practices including GitHub Actions or Azure DevOps
  • Strong understanding of enterprise security and governance
Job Responsibility
  • Design and implement AI solutions using Microsoft Azure AI Foundry within an existing Microsoft Fabric architecture
  • Integrate AI services with Fabric components including: Data Factory, OneLake, Power BI, Lakehouse and Warehouse environments, Real-Time Analytics
  • Build and operationalize generative AI and machine learning workflows
  • Configure and manage: Azure AI Services, Azure OpenAI, Model deployment pipelines, Prompt orchestration and evaluation
  • Establish secure connectivity between Azure AI Foundry and enterprise data sources
  • Implement governance, RBAC, security, compliance, and cost management controls
  • Develop reusable AI pipelines, APIs, and automation frameworks
  • Collaborate with platform teams to ensure scalability, observability, and production readiness
  • Support CI/CD and Infrastructure-as-Code deployment patterns
  • Provide technical leadership and documentation for AI platform adoption
What we offer
  • Flexibility, with remote and hybrid work options (country-dependent)
  • Career advancement, with international mobility and professional development programs
  • Learning and development, with access to cutting-edge tools, training and industry experts
  • Medical, dental, and vision insurance for you and your family, plus employer contributions to Health Savings Accounts
  • Full-time

Senior DevOps Engineer

We are seeking a highly skilled and strategic DevOps Engineer for a high-impact ...
Location: Canada, Vancouver
Salary: 58.39 - 77.85 USD / Hour
Company: Randstad (https://www.randstad.com)
Expiration Date: August 03, 2026
Requirements
  • 10+ years of progressive experience in DevOps, systems engineering, or infrastructure-focused roles, ideally with a background in mid-tier to large enterprise environments
  • Senior-level expertise in cloud platforms, with a strong focus on AWS architecture and organization
  • Deep understanding of IaC principles and hands-on scripting experience utilizing PowerShell, Terraform, or CloudFormation
  • Solid technical understanding of containerized environments, preferably utilizing Kubernetes
  • Mastery of continuous integration and continuous deployment tools (such as GitLab CI, Jenkins, or GitHub) alongside GitOps CD tools like ArgoCD
  • Demonstrated experience working with system monitoring and observability suites, specifically managing platforms like Splunk and Datadog
  • Adaptability: exceptional flexibility and a resilient mindset; comfortable pivoting quickly alongside evolving project requirements and new technical directions
  • Outstanding articulation skills with the ability to explain complex technical challenges and system progress clearly to cross-functional partners
Job Responsibility
  • Drive the transition toward fully automated infrastructure by evaluating architecture bottlenecks and implementing scalable Infrastructure as Code (IaC) architectures
  • Design, implement, and manage secure deployment pipelines to enable fast, reliable, and repeatable application releases across multiple active environments
  • Architect and manage scalable, fault-tolerant, and cost-optimized cloud infrastructure primarily utilizing AWS services
  • Support the platform migration from Splunk to Datadog, establishing proactive monitoring frameworks, dashboards, alerting rules, and system reliability thresholds
  • Lead the troubleshooting of complex production issues, coordinate root-cause analysis, and implement long-term engineering fixes to eliminate operational toil
  • Incorporate security best practices, access controls, secrets management, and enterprise governance standards into automated delivery pipelines
  • Partner with software development, quality assurance, and product management teams to communicate technical progress, share best practices, and deliver unified infrastructure goals
What we offer
  • High Likelihood of Extension
  • Drive Significant Transformation
  • Modern Tooling & Cloud Stack
  • Innovative & Adaptive Culture
  • Full-time

Senior DevOps Engineer

We are seeking an experienced Senior DevOps Engineer to support a major cloud mo...
Location: United States, Austin
Salary: Not provided
Company: MMC Group LP (mmcgrp.com)
Expiration Date: Until further notice
Requirements
  • 8+ years of hands-on experience with DevOps tools and practices, including Azure DevOps, Git, and CI/CD pipelines
  • 8+ years of experience administering Microsoft Azure cloud environments
  • Extensive experience administering and optimizing Snowflake, including: Role-Based Access Control (RBAC), Security and governance, Warehouse management, Performance tuning
  • Advanced Python scripting skills for automation and cloud engineering tasks
  • Experience with containerization technologies and Kubernetes, preferably Azure Kubernetes Service (AKS)
  • Strong understanding of enterprise data pipelines, ETL processes, and cloud-based data integration
  • Experience integrating AI and machine learning capabilities into cloud platforms
  • Hands-on experience with Azure Monitor, Log Analytics, centralized logging, alerting, dashboards, and operational monitoring
  • Strong knowledge of monitoring, logging, and observability best practices
  • Excellent troubleshooting, analytical, and communication skills
Job Responsibility
  • Design, implement, and maintain enterprise DevOps pipelines using Azure DevOps, Git, and CI/CD best practices
  • Build and manage scalable cloud infrastructure within the Microsoft Azure ecosystem
  • Administer and optimize enterprise Snowflake environments, including security, governance, and performance tuning
  • Implement Role-Based Access Control (RBAC), least privilege access, and enterprise security controls
  • Develop Python automation scripts to streamline deployments, infrastructure management, monitoring, and operational support
  • Build, deploy, and support enterprise data pipelines and cloud-based ETL processes
  • Deploy and manage containerized workloads using Kubernetes, with Azure Kubernetes Service (AKS) preferred
  • Integrate AI capabilities into cloud data platforms, including Snowflake Cortex and AI-enabled data pipelines
  • Configure Azure Monitor, Log Analytics, dashboards, alerts, and centralized logging to improve observability and operational health
  • Monitor cloud performance, optimize infrastructure costs, and improve platform reliability
What we offer
  • Medical, dental, and vision coverage
  • Life and disability insurance
  • Additional voluntary benefits
  • Full-time