LLM & AI DevOps Engineer

Robert Half

Location:
United States, Remote

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Join our team as a DevOps Engineer specializing in Artificial Intelligence (AI) and Large Language Model (LLM) infrastructure. You will play a critical role in architecting, deploying, and optimizing scalable AI platforms using modern DevOps practices and state-of-the-art tools.

Job Responsibility:

  • Build, automate, and manage CI/CD pipelines for deploying and maintaining AI/LLM workloads
  • Collaborate with AI engineers and data scientists to streamline model deployment, versioning, and monitoring
  • Design and maintain cloud infrastructure using Infrastructure as Code (IaC) platforms such as Terraform and Ansible
  • Orchestrate and manage containerized AI environments using Kubernetes
  • Implement robust monitoring and logging solutions utilizing Grafana and Prometheus
  • Optimize AI model inference and training workloads—especially for NVIDIA GPU-powered environments
  • Apply strict security and compliance standards for all infrastructure components
  • Diagnose and resolve production issues, continuously improving reliability and scalability of AI services

Requirements:

  • Proven experience as a DevOps Engineer, preferably supporting AI or machine learning platforms
  • Hands-on expertise with Kubernetes (EKS, AKS, GKE, or on-prem), Docker, Terraform, and Ansible
  • Experience with monitoring/observability tools such as Grafana and Prometheus
  • Familiarity with NVIDIA GPU drivers, CUDA, and hardware provisioning for machine learning tasks
  • Proficiency in at least one scripting language (Python, Bash, etc.)
  • Cloud platform experience (AWS, GCP, Azure); hybrid/on-premise a plus
  • Bachelor’s degree in Computer Science or related field, or equivalent professional experience

Nice to have:

  • Previous work with MLOps tools and data pipeline automation is highly desirable
  • Cloud platform experience (AWS, GCP, Azure); hybrid/on-premise a plus

What we offer:

  • medical
  • vision
  • dental
  • life and disability insurance
  • 401(k) plan

Additional Information:

Job Posted:
January 29, 2026

Work Type:
Remote work

Similar Jobs for LLM & AI DevOps Engineer

Principal AI Engineer

At JFrog, we’re reinventing DevOps to help the world’s greatest companies innova...
Location:
Israel, Netanya/Tel Aviv
Salary:
Not provided
JFrog
Expiration Date:
Until further notice
Requirements:
  • A bachelor's degree or higher in Computer Science, Data Science, or a related field
  • Proven experience in software development
  • Proficiency in LLM-related tools, processes, and frameworks, including OpenAI Models and APIs, Hugging Face Transformers, LangChain, vector databases, and prompt management tools like PromptPerfect/PromptBase and Guardrails
  • Experience with cloud platforms, such as AWS, Google Cloud, or Azure
  • Proficiency in Python programming
  • Experience deploying LLM-based applications in a production environment
  • Excellent problem-solving and analytical skills
  • Experience with CI / CD tools
  • Strong communication skills and the ability to collaborate effectively in a team
Job Responsibility:
  • Recommend and test agentic productivity tools
  • Collaborate with key organizational stakeholders to understand AI requirements and design end-to-end AI productivity solutions
  • Explore and experiment with novel ML and AI techniques and architectures to drive DevX and productivity innovation
  • Evaluate and recommend ML and AI tools and frameworks to enhance productivity and effectiveness
  • Provide technical guidance and mentorship to development teams on AI and ML technologies and practices
  • Define meaningful KPIs and closely monitor cost

Senior Software Engineer - Build AI Tools

This role sits within the newly formed GenAI Security team, which is responsible...
Location:
United Kingdom, Belfast
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Highly motivated self-starter with excellent interpersonal and problem-solving skills
  • Bachelor’s degree or equivalent work experience
  • Good oral and written communication skills
  • Significant relevant industry work experience
  • Experience of the full lifecycle of design, implementation and running of enterprise software solutions involving cross functional team collaboration
  • Expertise in a major programming language such as Python and/or Go, and associated tooling (Git, Maven, IDEs, Jenkins, Bitbucket etc)
  • Expertise in designing and implementing secure APIs and libraries
  • Experience in Generative AI, LLM frameworks, LLM prompt engineering and/or adversarial testing is a bonus
  • Experience with Cyber engineering and Operations, which could include DevSecOps or MLSecOps
  • Experience contributing to the architecture and design (architecture, design patterns, reliability, scaling) of new and current systems
Job Responsibility:
  • Designing, developing, optimizing, and enhancing a GenAI prompt security platform to protect firm AI/LLM-based applications from adversarial attacks and prompt injections
  • Building and automating a security testing framework to validate protection mechanisms for various LLM use cases
  • Owning solutions that are expected to operate and perform at scale across the organisation
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation, across different time zones
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Contract Type:
Full-time

Forward Deployed Engineer (AI Agent)

At Cresta, the AI Agent team is on a mission to create state-of-the-art AI Agent...
Location:
Canada
Salary:
Not provided
Cresta
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • 3+ years of experience in software development, AI/ML engineering, or system integration
  • Proficiency in Python and Golang, with the ability to write clean, efficient code
  • Familiarity with AI/ML concepts
  • Hands-on experience with large language models (LLMs) and prompt engineering techniques is strongly preferred
  • Strong understanding of general AI agent frameworks, function calling, and retrieval-augmented generation (RAG); hands-on experience building such a system is strongly preferred
  • Experience with cloud platforms (AWS, GCP, or Azure) and DevOps practices (CI/CD, containerization, monitoring)
  • Hands-on experience with integrating systems via APIs, webhooks, and data pipelines
  • Excellent communication and project management skills
Job Responsibility:
  • Develop, configure, deploy, and optimize AI agents using Cresta’s AI platform and tools
  • Build AI agent integrations with external systems (APIs, databases, CRMs, etc.) to ensure seamless workflow integration
  • Optimize AI agent performance (e.g. fine-tune prompts and configurations) and troubleshoot issues in complex enterprise environments
  • Collaborate with customers and internal stakeholders to gather technical requirements and translate business needs into AI Agent solutions
  • Conduct interactive demos and present compelling proof-of-concepts to prospective customers, proactively gather feedback, and iteratively refine solutions to meet objectives
  • Define project milestones, create implementation plans, and coordinate execution with internal teams to ensure on-time delivery
  • Provide a tight feedback loop to our product and engineering teams — identifying gaps, building custom tooling, and influencing the roadmap through real-world deployment learnings
  • Collaborate with PMs to define agent goals, iterate rapidly based on customer feedback, and shape product capabilities that maximize customer ROI
  • Serve as a trusted technical advisor for the customer, guiding best practices for AI agent adoption and usage
  • Provide technical guidance on AI agent best practices, including architecture design, security considerations, and scalability planning
What we offer:
  • We offer Cresta employees a variety of medical, dental, and vision plans, designed to fit you and your family’s needs
  • Paid parental leave to support you and your family
  • Monthly Health & Wellness allowance
  • Work from home office stipend to help you succeed in a remote environment
  • Lunch reimbursement for in-office employees
  • PTO: 3 weeks in Canada
Contract Type:
Full-time

Forward Deployed Engineer (AI Agent)

At Cresta, the AI Agent team is on a mission to create state-of-the-art AI Agent...
Location:
United States
Salary:
150000.00 - 250000.00 USD / Year
Cresta
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • 3+ years of experience in software development, AI/ML engineering, or system integration
  • Proficiency in Python and Golang
  • Familiarity with AI/ML concepts
  • Hands-on experience with large language models (LLMs) and prompt engineering techniques is strongly preferred
  • Strong understanding of general AI agent frameworks, function calling, and retrieval-augmented generation (RAG); hands-on experience building such a system is strongly preferred
  • Experience with cloud platforms (AWS, GCP, or Azure) and DevOps practices (CI/CD, containerization, monitoring)
  • Hands-on experience with integrating systems via APIs, webhooks, and data pipelines
  • Excellent communication and project management skills
Job Responsibility:
  • Develop, configure, deploy, and optimize AI agents using Cresta’s AI platform and tools
  • Build AI agent integrations with external systems (APIs, databases, CRMs, etc.) to ensure seamless workflow integration
  • Optimize AI agent performance (e.g. fine-tune prompts and configurations) and troubleshoot issues in complex enterprise environments
  • Collaborate with customers and internal stakeholders to gather technical requirements and translate business needs into AI Agent solutions
  • Conduct interactive demos and present compelling proof-of-concepts to prospective customers, proactively gather feedback, and iteratively refine solutions to meet objectives
  • Define project milestones, create implementation plans, and coordinate execution with internal teams to ensure on-time delivery
  • Provide a tight feedback loop to our product and engineering teams — identifying gaps, building custom tooling, and influencing the roadmap through real-world deployment learnings
  • Collaborate with PMs to define agent goals, iterate rapidly based on customer feedback, and shape product capabilities that maximize customer ROI
  • Serve as a trusted technical advisor for the customer, guiding best practices for AI agent adoption and usage
  • Provide technical guidance on AI agent best practices, including architecture design, security considerations, and scalability planning
What we offer:
  • Comprehensive medical, dental, and vision coverage with plans to fit you and your family
  • Flexible PTO
  • Paid parental leave for all new parents welcoming a new child
  • Retirement savings plan
  • Remote work setup budget
  • Monthly wellness and communication stipend
  • In-office meal program and commuter benefits provided for onsite employees
  • Equity
Contract Type:
Full-time

AI Engineer

As an AI Engineer, you bring traditional and Generative AI into real world use c...
Location:
Belgium, Brussels/Flanders
Salary:
Not provided
Sopra Steria
Expiration Date:
Until further notice
Requirements:
  • At least 3 years of experience as a Machine Learning Engineer, Data Scientist, MLOps Engineer, or in a similar position
  • Experience with traditional NLP, LLMs, and conversational AI, both in experimentation phase and in production
  • Experience with vector databases for semantic search and RAG solutions in a production environment
  • Strong software engineering background and ability to write production ready code in Python
  • Experience with experiment tracking, model monitoring, LLM and NLP evaluation techniques, and deployment strategies
  • Comfortable with the machine learning lifecycle and MLOps and DevOps principles
  • Experience with at least one cloud provider and good knowledge of the data ecosystem
  • Able to coach others, give technical advice and direction, and work independently
  • Master's degree or PhD in Machine Learning, Artificial Intelligence, Computer Engineering, or a related field
  • Proficient in English, knowledge of Dutch and/or French is a plus.
Job Responsibility:
  • Design and implement solutions that range from traditional AI to LLMs and from semantic search to conversational AI
  • Train, fine-tune, improve, and deploy ML models
  • Write production ready code to serve online, batch, and real time models
  • Build applications and software to serve AI driven use cases
  • Work in close collaboration with Data Scientists, MLOps Engineers and Data Engineers to integrate all parts of the solution
  • Help build solutions and/or operate with clients on a medium to long-term basis.
What we offer:
  • Mobility options (including a company car)
  • insurance coverage
  • meal vouchers
  • eco-cheques
  • continuous learning opportunities through the Sopra Steria Academy
  • opportunity to connect with fellow Sopra Steria colleagues at various team events.

AI Azure Enterprise Automation Engineer

Baptist Health Information Services is looking for an Enterprise Automation Engi...
Location:
United States, Jacksonville
Salary:
Not provided
Baptist Health (Florida)
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree or Equivalent Experience
  • Over 5 years of Information Technology Experience Required
  • Experience designing or implementing AI-driven automation agents that support IT operations, observability, or cloud management by autonomously identifying and resolving issues
  • Familiarity with Large Language Model (LLM) integration (e.g., OpenAI, Claude, Gemini) for code generation, decision support, or infrastructure recommendations
  • Exposure to multi-agent orchestration frameworks such as LangChain, AutoGen, or Microsoft Autonomous Agents for coordinating complex, layered workflows
  • Integration of AI agents into DevOps workflows or incident response tooling
  • Understanding of prompt engineering, retrieval-augmented generation (RAG), or vector database utilization (e.g., Azure Cognitive Search, Weaviate) in the context of enterprise systems
  • Contributions to open-source automation or AI platforms that demonstrate thought leadership or technical innovation
  • Familiarity with healthcare IT standards and constraints (e.g., HIPAA compliance, identity management in clinical workflows) as they apply to automation and AI integration
  • Experience with Azure VMs, Virtual Networks, Storage Accounts, and Azure AD
Job Responsibility:
  • Expert level engineering skills across a broad range of technology stacks and programming languages
  • As an SRE at Baptist Health you will be a member of a team dedicated to improving our resiliency, reliability, observability, and scalability through different methodologies and tools
  • You will have the drive to improve and define how we automate, observe, scale, and operate enterprise services
  • Design and build infrastructure & systems that provide high levels of scalability, reliability, performance, and security across Azure and on-prem environments
  • Automate manual processes by designing and implementing end-to-end automation pipelines that reduce operational friction, eliminate repetitive tasks, and enforce consistency through Infrastructure-as-Code and CI/CD practices
  • Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for all core services
  • Improve observability of all enterprise services with actionable monitoring, logging, and alerting using tools like Azure Monitor, Application Insights, and SolarWinds
  • Develop playbooks and runbooks to guide operations teams and support staff in managing infrastructure efficiently and safely
  • Partner with Digital Cloud Development Operations, Application Development, and Product teams to ensure new systems are designed for reliability and maintainability
  • Work closely with vendors and cloud providers (Azure, AWS, GCP) to optimize infrastructure and troubleshoot escalated issues
Contract Type:
Full-time

Middle Python Engineer

Our client is providing cloud-based software solutions for the professional and ...
Location:
Poland; Croatia
Salary:
Not provided
ELEKS
Expiration Date:
Until further notice
Requirements:
  • 3+ years of professional experience in Python backend development
  • Practical experience with LLMs / AI integrations (OpenAI, Azure OpenAI, LangChain, Hugging Face, or similar)
  • Hands-on experience with Azure Cloud is preferred
  • Ability to read and understand .NET (C#) codebase (no coding required)
  • Upper-Intermediate English level or higher
Job Responsibility:
  • Design, develop, and maintain Python-based components supporting LLM functionalities
  • Integrate LLMs and AI services into the existing .NET-based product
  • Work with Azure services to deploy and scale AI-driven solutions
  • Contribute to CI/CD and DevOps processes (migration from Jenkins to Azure DevOps)
  • Collaborate with cross-functional teams to ensure smooth integration and delivery
  • Participate in architecture and design discussions, proposing improvements for performance and scalability
  • Support and maintain existing chatbot functionality based on Python
What we offer:
  • Close cooperation with a customer
  • Challenging tasks
  • Competence development
  • Team of professionals
  • Dynamic environment with low level of bureaucracy

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
Contract Type:
Full-time