LLM & AI DevOps Engineer

Robert Half (https://www.roberthalf.com)

Location:
United States, Remote

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Join our team as a DevOps Engineer specializing in Artificial Intelligence (AI) and Large Language Model (LLM) infrastructure. You will play a critical role in architecting, deploying, and optimizing scalable AI platforms using modern DevOps practices and state-of-the-art tools.

Job Responsibility:

  • Build, automate, and manage CI/CD pipelines for deploying and maintaining AI/LLM workloads
  • Collaborate with AI engineers and data scientists to streamline model deployment, versioning, and monitoring
  • Design and maintain cloud infrastructure using Infrastructure as Code (IaC) platforms such as Terraform and Ansible
  • Orchestrate and manage containerized AI environments using Kubernetes
  • Implement robust monitoring and logging solutions utilizing Grafana and Prometheus
  • Optimize AI model inference and training workloads—especially for NVIDIA GPU-powered environments
  • Apply strict security and compliance standards for all infrastructure components
  • Diagnose and resolve production issues, continuously improving reliability and scalability of AI services

Requirements:

  • Proven experience as a DevOps Engineer, preferably supporting AI or machine learning platforms
  • Hands-on expertise with Kubernetes (EKS, AKS, GKE, or on-prem), Docker, Terraform, and Ansible
  • Experience with monitoring/observability tools such as Grafana and Prometheus
  • Familiarity with NVIDIA GPU drivers, CUDA, and hardware provisioning for machine learning tasks
  • Proficiency in at least one scripting language (Python, Bash, etc.)
  • Cloud platform experience (AWS, GCP, Azure); hybrid/on-premise a plus
  • Previous work with MLOps tools and data pipeline automation is highly desirable
  • Bachelor’s degree in Computer Science or related field, or equivalent professional experience

Nice to have:

  • Previous work with MLOps tools and data pipeline automation is highly desirable
  • Cloud platform experience (AWS, GCP, Azure); hybrid/on-premise a plus

What we offer:
  • medical
  • vision
  • dental
  • life and disability insurance
  • 401(k) plan

Additional Information:

Job Posted:
January 29, 2026

Work Type:
Remote work

Similar Jobs for LLM & AI DevOps Engineer

Senior Software Engineer - Build AI Tools

This role sits within the newly formed GenAI Security team, which is responsible...
Location:
United Kingdom, Belfast
Salary:
Not provided
Citi (https://www.citi.com/)
Expiration Date:
Until further notice
Requirements:
  • Highly motivated self-starter with excellent interpersonal and problem-solving skills
  • Bachelor’s degree or equivalent work experience
  • Good oral and written communication skills
  • Significant relevant industry work experience
  • Experience of the full lifecycle of design, implementation and running of enterprise software solutions involving cross functional team collaboration
  • Expertise in a major programming language such as Python and/or Go, and associated tooling (Git, Maven, IDEs, Jenkins, Bitbucket, etc.)
  • Expertise in designing and implementing secure APIs and libraries
  • Experience in Generative AI, LLM frameworks, LLM prompt engineering and/or adversarial testing is a bonus
  • Experience with Cyber engineering and Operations, which could include DevSecOps or MLSecOps
  • Experience contributing to the architecture and design (architecture, design patterns, reliability, scaling) of new and current systems
Job Responsibility:
  • Designing, developing, optimizing, and enhancing a GenAI prompt security platform to protect firm AI/LLM-based applications from adversarial attacks and prompt injections
  • Building and automating a security testing framework to validate protection mechanisms for various LLM use cases
  • Owning solutions that are expected to operate and perform at scale across the organisation
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation, across different time zones
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Contract Type:
Fulltime

Forward Deployed Engineer (AI Agent)

At Cresta, the AI Agent team is on a mission to create state-of-the-art AI Agent...
Location:
Canada
Salary:
Not provided
Cresta (cresta.com)
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • 3+ years of experience in software development, AI/ML engineering, or system integration
  • Proficiency in Python and Golang, with the ability to write clean, efficient code
  • Familiarity with AI/ML concepts
  • Hands-on experience with large language models (LLMs) and prompt engineering techniques is strongly preferred
  • Strong understanding of general AI agent frameworks, function calling, and retrieval-augmented generation (RAG)
  • Hands-on experience of building such a system is strongly preferred
  • Experience with cloud platforms (AWS, GCP, or Azure) and DevOps practices (CI/CD, containerization, monitoring)
  • Hands-on experience with integrating systems via APIs, webhooks, and data pipelines
  • Excellent communication and project management skills
Job Responsibility:
  • Develop, configure, deploy, and optimize AI agents using Cresta’s AI platform and tools
  • Build AI agent integrations with external systems (APIs, databases, CRMs, etc.) to ensure seamless workflow integration
  • Optimize AI agent performance (e.g. fine-tune prompts and configurations) and troubleshoot issues in complex enterprise environments
  • Collaborate with customers and internal stakeholders to gather technical requirements and translate business needs into AI Agent solutions
  • Conduct interactive demos and present compelling proof-of-concepts to prospective customers, proactively gather feedback, and iteratively refine solutions to meet objectives
  • Define project milestones, create implementation plans, and coordinate execution with internal teams to ensure on-time delivery
  • Provide a tight feedback loop to our product and engineering teams — identifying gaps, building custom tooling, and influencing the roadmap through real-world deployment learnings
  • Collaborate with PMs to define agent goals, iterate rapidly based on customer feedback, and shape product capabilities that maximize customer ROI
  • Serve as a trusted technical advisor for the customer, guiding best practices for AI agent adoption and usage
  • Provide technical guidance on AI agent best practices, including architecture design, security considerations, and scalability planning
What we offer:
  • We offer Cresta employees a variety of medical, dental, and vision plans, designed to fit you and your family’s needs
  • Paid parental leave to support you and your family
  • Monthly Health & Wellness allowance
  • Work from home office stipend to help you succeed in a remote environment
  • Lunch reimbursement for in-office employees
  • PTO: 3 weeks in Canada
Contract Type:
Fulltime

Forward Deployed Engineer (AI Agent)

At Cresta, the AI Agent team is on a mission to create state-of-the-art AI Agent...
Location:
United States
Salary:
150000.00 - 250000.00 USD / Year
Cresta (cresta.com)
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • 3+ years of experience in software development, AI/ML engineering, or system integration
  • Proficiency in Python and Golang
  • Familiarity with AI/ML concepts
  • Hands-on experience with large language models (LLMs) and prompt engineering techniques is strongly preferred
  • Strong understanding of general AI agent frameworks, function calling, and retrieval-augmented generation (RAG)
  • Hands-on experience of building such a system is strongly preferred
  • Experience with cloud platforms (AWS, GCP, or Azure) and DevOps practices (CI/CD, containerization, monitoring)
  • Hands-on experience with integrating systems via APIs, webhooks, and data pipelines
  • Excellent communication and project management skills
Job Responsibility:
  • Develop, configure, deploy, and optimize AI agents using Cresta’s AI platform and tools
  • Build AI agent integrations with external systems (APIs, databases, CRMs, etc.) to ensure seamless workflow integration
  • Optimize AI agent performance (e.g. fine-tune prompts and configurations) and troubleshoot issues in complex enterprise environments
  • Collaborate with customers and internal stakeholders to gather technical requirements and translate business needs into AI Agent solutions
  • Conduct interactive demos and present compelling proof-of-concepts to prospective customers, proactively gather feedback, and iteratively refine solutions to meet objectives
  • Define project milestones, create implementation plans, and coordinate execution with internal teams to ensure on-time delivery
  • Provide a tight feedback loop to our product and engineering teams — identifying gaps, building custom tooling, and influencing the roadmap through real-world deployment learnings
  • Collaborate with PMs to define agent goals, iterate rapidly based on customer feedback, and shape product capabilities that maximize customer ROI
  • Serve as a trusted technical advisor for the customer, guiding best practices for AI agent adoption and usage
  • Provide technical guidance on AI agent best practices, including architecture design, security considerations, and scalability planning
What we offer:
  • Comprehensive medical, dental, and vision coverage with plans to fit you and your family
  • Flexible PTO
  • Paid parental leave for all new parents welcoming a new child
  • Retirement savings plan
  • Remote work setup budget
  • Monthly wellness and communication stipend
  • In-office meal program and commuter benefits provided for onsite employees
  • Equity
Contract Type:
Fulltime

AI Engineer

As an AI Engineer, you bring traditional and Generative AI into real world use c...
Location:
Belgium, Brussels/Flanders
Salary:
Not provided
Sopra Steria (https://www.soprasteria.com)
Expiration Date:
Until further notice
Requirements:
  • At least 3 years of experience as a Machine Learning Engineer, Data Scientist, MLOps Engineer, or in a similar position
  • Experience with traditional NLP, LLMs, and conversational AI, both in experimentation phase and in production
  • Experience with vector databases for semantic search and RAG solutions in a production environment
  • Strong software engineering background and ability to write production ready code in Python
  • Experience with experiment tracking, model monitoring, LLM and NLP evaluation techniques, and deployment strategies
  • Comfortable with the machine learning lifecycle and MLOps and DevOps principles
  • Experience with at least one cloud provider and good knowledge of the data ecosystem
  • Able to coach others, give technical advice and direction, and work independently
  • Master’s or PhD in Machine Learning, Artificial Intelligence, Computer Engineering, or related field
  • Proficient in English; knowledge of Dutch and/or French is a plus
Job Responsibility:
  • Design and implement solutions ranging from traditional AI to LLMs, and from semantic search to conversational AI
  • Train, fine-tune, improve, and deploy ML models
  • Write production ready code to serve online, batch, and real time models
  • Build applications and software to serve AI driven use cases
  • Work in close collaboration with Data Scientists, MLOps Engineers and Data Engineers to integrate all parts of the solution
  • Help build solutions and/or operate with clients on a medium to long-term basis.
What we offer:
  • Mobility options (including a company car)
  • insurance coverage
  • meal vouchers
  • eco-cheques
  • continuous learning opportunities through the Sopra Steria Academy
  • opportunity to connect with fellow Sopra Steria colleagues at various team events.

Middle Python Engineer

Our client is providing cloud-based software solutions for the professional and ...
Location:
Poland; Croatia
Salary:
Not provided
ELEKS (eleks.com)
Expiration Date:
Until further notice
Requirements:
  • 3+ years of professional experience in Python backend development
  • Practical experience with LLMs / AI integrations (OpenAI, Azure OpenAI, LangChain, Hugging Face, or similar)
  • Hands-on experience with Azure Cloud is preferred
  • Ability to read and understand .NET (C#) codebase (no coding required)
  • Upper-Intermediate English level or higher
Job Responsibility:
  • Design, develop, and maintain Python-based components supporting LLM functionalities
  • Integrate LLMs and AI services into the existing .NET-based product
  • Work with Azure services to deploy and scale AI-driven solutions
  • Contribute to CI/CD and DevOps processes (migration from Jenkins to Azure DevOps)
  • Collaborate with cross-functional teams to ensure smooth integration and delivery
  • Participate in architecture and design discussions, proposing improvements for performance and scalability
  • Support and maintain existing chatbot functionality based on Python
What we offer:
  • Close cooperation with a customer
  • Challenging tasks
  • Competence development
  • Team of professionals
  • Dynamic environment with low level of bureaucracy

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine (blackline.com)
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for MLOps and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Contract Type:
Fulltime

GenAI Prompt Security Engineer

This role sits within the newly formed GenAI Security team, which is responsible...
Location:
United Kingdom, Belfast
Salary:
Not provided
Citi (https://www.citi.com/)
Expiration Date:
Until further notice
Requirements:
  • Highly motivated self-starter with excellent interpersonal and problem-solving skills
  • Bachelor’s degree or equivalent work experience
  • Good oral and written communication skills
  • Significant relevant industry work experience
  • Experience of the full lifecycle of design, implementation and running of enterprise software solutions involving cross functional team collaboration
  • Expertise in a major programming language such as Python and/or Go, and associated tooling (Git, Maven, IDEs, Jenkins, Bitbucket, etc.)
  • Expertise in designing and implementing secure APIs and libraries
  • Experience in Generative AI, LLM frameworks, LLM prompt engineering and/or adversarial testing is a bonus
  • Experience with Cyber engineering and Operations, which could include DevSecOps or MLSecOps
  • Experience contributing to the architecture and design (architecture, design patterns, reliability, scaling) of new and current systems
Job Responsibility:
  • Designing, developing, optimizing, and enhancing a GenAI prompt security platform to protect firm AI/LLM-based applications from adversarial attacks and prompt injections
  • Building and automating a security testing framework to validate protection mechanisms for various LLM use cases
  • Owning solutions that are expected to operate and perform at scale across the organisation
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation, across different time zones
Contract Type:
Fulltime

Engineering Manager for Observability/CI/CD and Cloud

Lead the AI-Driven Evolution of Groupon’s Global Engineering Platform. At Groupo...
Location:
Dublin; Madrid; Prague; Valencia; Warsaw
Salary:
Not provided
Groupon (groupon.com)
Expiration Date:
Until further notice
Requirements:
  • 5+ years’ experience leading infrastructure, DevOps, or SRE teams (5+ people), ideally in high-change, scale-up environments
  • Deep technical expertise in cloud-native platforms, observability, infrastructure as code, and CI/CD tooling
  • Proven success operationalizing AI tools within engineering workflows
  • Strategic, resilient, and pragmatic approach: ready to own results and thrive under shifting priorities
  • Exceptional communication: able to simplify complexity and effectively partner with C-level and global teams
  • Bachelor’s or Master’s in Computer Science (or similar)—or equivalent industry experience
Job Responsibility:
  • Lead & Inspire: Build and mentor a high-performing, globally distributed team of CI/CD and Observability engineers (5-10 direct reports), coaching them in cutting-edge AI-assisted workflows and best practices
  • Modernize Core Infrastructure: Spearhead the migration from legacy platforms (Jenkins, ELK) to cloud-native solutions (GitHub Actions, Google Cloud Logging, GCP Prometheus/Grafana). Eliminate “straggler” pipelines and drive cost-efficient, reliable operations
  • AI-First Engineering: Operationalize AI tools (Claude Code, Copilot, ChatGPT, etc.) for everything from log analysis and incident summaries to automated infrastructure as code, making AI-augmented engineering a daily norm
  • Architect & Optimize: Oversee a hybrid tech stack (Kubernetes, Envoy, Terraform, GCP, AWS), ensuring platforms are fast, scalable, and “self-healing” via LLM integrations
  • Collaborate Globally: Act as a thought leader and cross-functional partner, advocating for AI-driven developer experience and collaborating with leaders in SRE, Product, and Cloud
  • Drive Transformation: Deliver strategic projects with tight deadlines and direct business impact, such as the Jenkins-to-GHA and ELK-to-GCP migrations, while maintaining a high standard of technical excellence and cost efficiency
What we offer:
  • Drive real, high-visibility change at the heart of a company undergoing major transformation
  • Work on complex technical and operational challenges in a fast-paced, AI-first environment
  • Accelerate your impact—and your team’s—using industry-leading AI and automation tools
  • Influence engineering practices across a global platform impacting millions of users