LLM & AI DevOps Engineer Job at Robert Half (Remote)

Sr. Cloud Infrastructure Engineer (AI & LLM Platforms)

We are seeking a specialized Infrastructure Engineer to bridge the gap between o...

Location

Salary:

Not provided

Q6 Cyber

Expiration Date

Until further notice

Requirements

5+ years of experience in DevOps, Platform Engineering, or SRE, with at least 1-2 years specifically focused on AI/ML infrastructure
Proven track record of building production-grade RAG pipelines or LLM-integrated applications
Thrives in 'day zero' environments where the tools and protocols (like MCP) are evolving weekly
Deep understanding of the security implications of LLMs (prompt injection, data leakage, and secure tool execution)
Experience working with substantial datasets (over 1 billion objects, dozens or hundreds of TBs) and the challenges of leveraging AI tools with these datasets
Bachelor's degree or equivalent in computer science or related field
Cloud & Orchestration: AWS/GCP/Azure, Kubernetes, Terraform, Helm
AI Frameworks: LangChain, LlamaIndex, LangGraph
Data & Vectors: Pinecone, Milvus, Qdrant, or pgvector
Apache Kafka/Pulsar

Job Responsibility

Guide the architecture that will allow us to leverage AI tools with our large existing data stores and incoming streams of real-time intelligence
Work closely with other infrastructure engineers and software development teams to integrate AI tools into existing systems
Design, deploy, and maintain Model Context Protocol (MCP) servers to allow LLMs to securely interact with our internal databases, APIs, and external tooling
Build and orchestrate sandboxed, scalable environments (e.g., using Docker or specialized runtimes) where users can safely build and execute AI agents
Develop and manage the infrastructure for our internal RAG (Retrieval-Augmented Generation) pipeline, including vector database management (e.g., Pinecone, Weaviate, or pgvector) and automated embedding pipelines
Utilize Kubernetes (K8s) and Infrastructure as Code (Terraform/Pulumi) to deploy LLM-related tools, ensuring high availability and low latency for model inference and data retrieval
Implement strict guardrails for data privacy within LLM workflows, ensuring internal datasets remain secure while being accessible to authorized AI tools

What we offer

We offer a competitive compensation package and comprehensive benefits

Fulltime

Senior DevOps Engineer, AI

LogicMonitor® is the AI-first hybrid observability platform powering the next ge...

Location

India, Pune

Salary:

Not provided

LogicMonitor

Expiration Date

Until further notice

Requirements

4+ years of experience in DevOps or similar roles
Proven experience with AWS (preferred) and GCP in production environments
Strong expertise in Infrastructure as Code practices
Solid knowledge of Kubernetes (EKS), container orchestration, and cluster security
Hands-on experience with Grafana, Prometheus, and alerting/monitoring systems
Understanding of network connectivity over private link endpoints, VPCs, and cross-account VPC connectivity, including how to make services accessible internally and externally
Experience deploying automated canary and integration testing pipelines and CI/CD pipelines
Experience exposing internal self-hosted services such as LangFuse through a web UI for internal users, using Traefik, an Ingress controller, or a similar tool
Experience deploying LLM-related solutions that require MCP, LangFuse, Airflow, GraphDB, VectorDB, Redis, etc.
Experience working with developers on on-demand, just-in-time (JIT) access to production clusters to troubleshoot and debug issues, using Teleport or a similar tool

Job Responsibility

Multi-Cloud Enablement: Expand and manage application hosting across AWS and Google Cloud, ensuring performance, flexibility, and resilience
Infrastructure as Code (IaC): Develop and maintain Terraform or similar installers for Azure and GCP to fully automate infrastructure deployments
Cost Optimization: Design and implement AWS cost optimization strategies, including reserved instances, right-sizing, and resource efficiency initiatives
Cloud Security: Strengthen infrastructure security with robust access controls, encryption, monitoring, and alerting frameworks
Observability: Build and enhance monitoring platforms with Grafana dashboards and Prometheus alerts for real-time performance insights and proactive issue resolution
Kubernetes Management: Implement Role-Based Access Control (RBAC) and optimize Ingress controllers (Traefik or similar) for enhanced security and delivery resilience
Automation & Scripting: Create Python and Bash scripts to automate repetitive tasks, streamline workflows, and improve operational efficiency

Senior DevOps Engineer (AI & Cloud Infrastructure)

We are seeking a Senior DevOps Engineer to design, deploy, and operate the next ...

Location

United States, Palo Alto

Salary:

175000.00 - 250000.00 USD / Year

Inflection AI

Expiration Date

Until further notice

Requirements

5+ years of hands-on experience in DevOps, Site Reliability Engineering, or ML Infrastructure supporting high-scale, production systems
Deep expertise in Azure and AWS, including storage, compute, networking, databases, and cloud-native monitoring services
Strong Kubernetes administration experience, including GPU scheduling, operator deployment, and management of core infrastructure components
Experience with Slurm is highly desirable
Proven experience deploying, scaling, and operating Large Language Models (LLMs) and inference engines such as vLLM, TGI, or Triton
Strong experience with modern DevOps tooling: Terraform, Helm, Kustomize, ArgoCD, GitHub Actions or GitLab CI, Prometheus, Grafana, and Clickhouse
Advanced scripting and automation skills in Python and Bash, with the ability to debug complex distributed systems and optimize performance at scale
Demonstrated ability to troubleshoot LLM servers, Kubernetes workloads, GPU utilization, and cloud infrastructure bottlenecks
Bachelor's degree or equivalent in a field related to the position

Job Responsibility

Architect, deploy, and operate large-scale LLM inference servers and AI applications with a focus on low latency, high availability, and production reliability
Design, provision, and maintain complex cloud architectures across Azure and AWS, including storage, compute, networking, databases, and native LLM services
Manage GPU-enabled Kubernetes clusters and Slurm-based HPC environments, optimizing resource allocation for AI training and inference workloads
Deploy and operate core Kubernetes infrastructure components and operators (GPU operators, ingress controllers, service meshes, CNIs, CSIs, and storage drivers)
Build scalable infrastructure-as-code and deployment workflows using Terraform, Helm, Kustomize, ArgoCD, and GitOps best practices
Design and maintain centralized observability systems using Prometheus, Grafana, Clickhouse, and cloud-native monitoring tools
Participate in on-call rotations, lead incident response, perform post-mortems, and continuously improve system reliability and SLAs

What we offer

Diverse medical, dental and vision options
401k matching program
Unlimited paid time off
Parental leave and flexibility for all parents and caregivers
Support of country-specific visa needs for international employees living in the Bay Area
Meaningful equity component.

Fulltime

Senior Java/Kotlin Engineer (AI-Driven DevOps & Automation)

We are looking for a Senior Java/Kotlin Engineer who goes beyond traditional dev...

Location

Colombia

Salary:

Not provided

Parser Limited

Expiration Date

Until further notice

Requirements

Strong experience in Java and/or Kotlin backend development
Solid understanding of software design, APIs, and distributed systems
Experience with CI/CD pipelines and DevOps practices
Hands-on experience with static code analysis tools, dependency management, and security remediation
Familiarity with AI-assisted coding tools (e.g., Claude, GitHub Copilot, etc.)
Experience working with Git-based workflows and multi-repo environments

Job Responsibility

Backend Development: Design, build, and maintain scalable backend services using Java/Kotlin
Deliver production-ready features with high quality and performance standards
Collaborate with product and engineering teams to translate requirements into technical solutions
AI-Driven DevOps & Automation: Use Claude (or similar agentic AI tools) to identify and fix vulnerabilities
Automate code improvements across repositories
Generate and maintain unit and integration tests using AI from code context and diffs
Continuously improve CI/CD workflows using AI-assisted processes
AI Readiness & Engineering Enablement: Improve AI readiness of repositories: clean architecture, modular structure, clear interfaces and contracts, type safety and documentation for LLM consumption
Build guardrails for AI usage: prompt design and versioning, output validation and consistency checks, safe code generation practices

What we offer

The chance to work on innovative projects with leading brands that use the latest technologies to fuel transformation
The opportunity to be part of an amazing, multicultural community of tech experts
A competitive compensation package and medical insurance
A flexible working environment

Fulltime

DevOps Engineer (Azure | Terraform | Ansible | Agentic AI for Infra/Monitoring/FinOps)

Job Summary: We are looking for a DevOps Engineer with strong hands-on experienc...

Location

India, Bangalore South

Salary:

Not provided

Wissen

Expiration Date

Until further notice

Requirements

3–5 years of experience in MLOps / ML Engineering / Cloud Engineering
Proficient in designing and deploying end-to-end ML pipelines
Terraform for Azure infrastructure automation
Python for ML, automation, and GenAI workflows
Azure Compute, Storage, Networking, and Identity
Running ML & GenAI workloads at scale on Azure
Supporting data pipelines for ML and LLM workloads
Experience with LangGraph for LLM workflow and agent orchestration
Hands-on exposure to Claude models, including skills/plugins integration
Understanding of prompt management, agent execution, and orchestration patterns

Job Responsibility

Build, deploy, and manage comprehensive MLOps and LLMOps pipelines on Azure
Design and oversee CI/CD pipelines for machine learning models and large language model workflows utilizing Harness or Azure DevOps
Streamline the promotion of models, prompts, and agent workflows between environments through automation
Establish approval gates, implement rollback mechanisms, and facilitate controlled release processes
Oversee the lifecycle of ML models and LLM-driven workflows, including their training, assessment, deployment, monitoring, and retraining
Administer Azure Machine Learning workspaces, computing resources, environments, model registries, and endpoints
Integrate LLM workflows and agent-centric architectures using LangGraph
Support the incorporation of Claude-based models, skills, and plugins into enterprise-level applications
Operationalize prompt versioning, orchestration strategies, and agent workflows in live production settings
Set up and govern Azure ML and Generative AI infrastructure via Terraform as Infrastructure as Code (IaC)

Fulltime

AI Engineer - Azure & C# .NET

We are seeking a capable and solutions‑focused AI Engineer to join our growing A...

Location

United Kingdom

Salary:

60000.00 GBP / Year

360 Resourcing Solutions

Expiration Date

Until further notice

Requirements

Hands-on experience building solutions with Azure AI Services and integrating them into applications and/or data solutions
Working knowledge of Azure OpenAI and common GenAI patterns (prompting, evaluation, basic RAG)
Some experience with ML delivery (training, packaging, deployment, monitoring) in a live environment
Proficiency in C#/.NET plus SQL fundamentals
Understanding of vector search concepts (embeddings, chunking, retrieval) and secure API integration
Experience using Git-based workflows and CI/CD pipelines (e.g., GitHub or Azure DevOps)
Strong communication and problem-solving skills, including clear technical documentation

Job Responsibility

Build and enhance GenAI solutions using Azure OpenAI, Azure AI Services, and Copilot extensibility, following agreed patterns
Implement RAG architectures (vector retrieval, embeddings, prompt strategies) with secure LLM integrations
Build conversational assistants and workflow automations using Copilot Studio and the Power Platform
Contribute to experimentation, prototyping, and PoC development to evaluate AI capabilities and feasibility
Support evaluation and integration of third‑party AI tools or APIs where required, working with the team to meet governance and security requirements
Develop and operationalise ML models with appropriate review and documentation
Translate prototypes into robust services, collaborating with engineering colleagues to meet non-functional requirements
Create ML pipelines and automate lifecycle workflows using team tooling and standards
Assist with monitoring, optimisation, escalating risks and issues where appropriate
Build AI-enabled services and APIs using .NET/C#, Azure Functions, and REST under guidance on patterns and quality

Fulltime

AI Engineer - Azure & C# .NET or Python

We are seeking a capable and solutions-focused AI Engineer to join our growing A...

Location

United Kingdom

Salary:

60000.00 GBP / Year

360 Resourcing Solutions

Expiration Date

Until further notice

Requirements

Hands-on experience building solutions with Azure AI Services and integrating them into applications and/or data solutions
Working knowledge of Azure OpenAI and common GenAI patterns (prompting, evaluation, basic RAG)
Some experience with ML delivery (training, packaging, deployment, monitoring) in a live environment
Proficiency in either C#/.NET or Python, plus SQL fundamentals
Understanding of vector search concepts (embeddings, chunking, retrieval) and secure API integration
Experience using Git-based workflows and CI/CD pipelines (e.g., GitHub or Azure DevOps)
Strong communication and problem-solving skills, including clear technical documentation
Based in the UK with valid right to work

Job Responsibility

Build and enhance GenAI solutions using Azure OpenAI, Azure AI Services, and Copilot extensibility
Implement RAG architectures (vector retrieval, embeddings, prompt strategies) with secure LLM integrations
Build conversational assistants and workflow automations using Copilot Studio and the Power Platform
Contribute to experimentation, prototyping, and PoC development to evaluate AI capabilities and feasibility
Support evaluation and integration of third-party AI tools or APIs
Develop and operationalise ML models with appropriate review and documentation
Translate prototypes into robust services, collaborating with engineering colleagues
Create ML pipelines and automate lifecycle workflows using team tooling and standards
Assist with monitoring, optimisation, escalating risks and issues
Build AI-enabled services and APIs using .NET/C#, Python, Azure Functions, and REST

Fulltime

AWS Agentic Framework Engineer / DevOps (LangGraph Focus)

We are looking for an experienced engineer to build and enhance observability ca...

Location

Salary:

Not provided

Intellias

Expiration Date

Until further notice

Requirements

5+ years of experience as a Software / DevOps / Platform Engineer
Strong experience with Python (FastAPI, APIs, async workflows)
Hands-on experience with LangGraph or similar agent orchestration frameworks
Experience working with LLM-based systems (OpenAI, Anthropic, etc.)
Strong knowledge of AWS (EKS, Lambda, API Gateway, etc.)
Experience with Kubernetes and Terraform
Understanding of stateful workflows and distributed systems
Experience building and integrating APIs and microservices
Familiarity with CI/CD processes and cloud-native development

Job Responsibility

Design and implement agent workflows using LangGraph
Build stateful, multi-step AI pipelines with complex decision logic
Orchestrate interactions between multiple agents and external systems
Integrate LLM-based components into production-grade applications
Ensure scalability and reliability of agent execution flows
Collaborate with platform teams to integrate agent workflows with AWS infrastructure
Optimize performance and cost efficiency of agent-based systems
Contribute to architecture and best practices for agentic systems
