Senior DevOps Engineer, Cloud Infrastructure

United States, Atlanta · Job Posted February 14, 2026

Job Description

The Senior DevOps Engineer, Cloud Infrastructure, leads a team dedicated to developing, deploying, and scaling cloud infrastructure that’s secure, reliable, and optimized for high performance. This hands-on role combines strategic oversight with technical leadership, supporting both project initiatives and operational excellence across cloud environments.

Job Responsibility

  • Infrastructure as Code: Design and implement infrastructure as code to build and deploy cloud solutions effectively
  • Full-Service Lifecycle Management: Improve service life cycles, from design through deployment, operation, and refinement, focusing on reliability and scalability
  • Monitor and Maintain Services: Ensure live services run smoothly by measuring and monitoring availability, latency, and overall system health, proactively identifying areas for improvement
  • Scale with Automation: Scale systems sustainably through automation and push for enhancements that improve reliability, performance, and operational efficiency
  • Optimize Infrastructure Costs: Drive initiatives to optimize infrastructure for cost-effectiveness without compromising performance or security
  • Incident Response and Postmortems: Lead sustainable incident response efforts and conduct blameless postmortems to ensure continuous improvement and resilience
  • Tool Selection and Evaluation: Bring opinions on, and experience with, orchestration tools such as GitLab and ArgoCD, and guide best practices for the team
  • AWS Expertise: Leverage and enhance Amazon Cloud environments to support current and future infrastructure needs, staying informed on new services and practices

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or an equivalent combination of education and experience
  • 5 years in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles
  • 3+ years of experience with AWS services (EC2, S3, ELB, VPC, IAM) or equivalent cloud environments, with a strong understanding of AWS best practices
  • 3+ years of experience running Linux-based production systems, with in-depth knowledge of Linux operating systems
  • Hands-on experience managing, deploying, and troubleshooting Kubernetes clusters
  • Proficiency in Bash, Python, or other scripting languages, used for automation and infrastructure management
  • Expertise with tools like Ansible, Terraform, or CloudFormation to deploy and manage infrastructure at scale
  • Experience with monitoring technologies such as Grafana, Prometheus, and Alertmanager to maintain visibility into system health
  • Proficiency in Git and experience with platforms like GitLab or GitHub for collaborative code management
  • Curiosity and Initiative: You’re curious, unafraid to ask “why,” and proactive in exploring solutions and innovative ideas
  • High Availability Mindset: You prioritize resilience and reliability in everything you design and deploy
  • Must have legal right to work in the U.S.

Similar Jobs for Senior DevOps Engineer, Cloud Infrastructure

8 matching positions

Senior DevOps Engineer (AI & Cloud Infrastructure)

We are seeking a Senior DevOps Engineer to design, deploy, and operate the next ...
Location: United States, Palo Alto
Salary: 175000.00 - 250000.00 USD / Year
Inflection AI
Expiration Date: Until further notice
Requirements
  • 5+ years of hands-on experience in DevOps, Site Reliability Engineering, or ML Infrastructure supporting high-scale, production systems
  • Deep expertise in Azure and AWS, including storage, compute, networking, databases, and cloud-native monitoring services
  • Strong Kubernetes administration experience, including GPU scheduling, operator deployment, and management of core infrastructure components
  • Experience with Slurm is highly desirable
  • Proven experience deploying, scaling, and operating Large Language Models (LLMs) and inference engines such as vLLM, TGI, or Triton
  • Strong experience with modern DevOps tooling: Terraform, Helm, Kustomize, ArgoCD, GitHub Actions or GitLab CI, Prometheus, Grafana, and Clickhouse
  • Advanced scripting and automation skills in Python and Bash, with the ability to debug complex distributed systems and optimize performance at scale
  • Demonstrated ability to troubleshoot LLM servers, Kubernetes workloads, GPU utilization, and cloud infrastructure bottlenecks
  • Bachelor’s degree or equivalent in a field related to the position
Job Responsibility
  • Architect, deploy, and operate large-scale LLM inference servers and AI applications with a focus on low latency, high availability, and production reliability
  • Design, provision, and maintain complex cloud architectures across Azure and AWS, including storage, compute, networking, databases, and native LLM services
  • Manage GPU-enabled Kubernetes clusters and Slurm-based HPC environments, optimizing resource allocation for AI training and inference workloads
  • Deploy and operate core Kubernetes infrastructure components and operators (GPU operators, ingress controllers, service meshes, CNIs, CSIs, and storage drivers)
  • Build scalable infrastructure-as-code and deployment workflows using Terraform, Helm, Kustomize, ArgoCD, and GitOps best practices
  • Design and maintain centralized observability systems using Prometheus, Grafana, Clickhouse, and cloud-native monitoring tools
  • Participate in on-call rotations, lead incident response, perform post-mortems, and continuously improve system reliability and SLAs.
What we offer
  • Diverse medical, dental and vision options
  • 401k matching program
  • Unlimited paid time off
  • Parental leave and flexibility for all parents and caregivers
  • Support of country-specific visa needs for international employees living in the Bay Area
  • Meaningful equity component.
  • Fulltime

Senior DevOps / Cloud Engineer

Robert Half is looking for a Senior DevOps / Cloud Engineer who can take ownersh...
Location: United States, Saint Louis
Salary: Not provided
Robert Half
Expiration Date: Until further notice
Requirements
  • U.S. Citizenship required
  • 5+ years in a DevOps, Cloud, or similar engineering role
  • Strong AWS experience
  • Some exposure to Azure is helpful but not required
  • Experience building CI/CD pipelines (Jenkins, GitHub Actions, GitLab, Azure DevOps, etc.)
  • Background with infrastructure as code tools (Terraform, CloudFormation, etc.)
  • Experience with containers (Docker, Kubernetes, EKS/AKS)
  • Comfortable scripting (Python, Bash, PowerShell, etc.)
  • Solid understanding of cloud architecture, networking, and security basics
  • Must be able to pass enhanced background checks depending on the project
Job Responsibility
  • Building and improving CI/CD pipelines to support faster, more reliable releases
  • Managing and scaling AWS environments (Azure experience is a plus)
  • Writing and maintaining infrastructure as code (Terraform, CloudFormation, etc.)
  • Automating wherever possible—deployments, monitoring, recovery, etc.
  • Working closely with dev, ops, and security teams to keep systems stable, secure, and performant
  • Helping teams adopt better DevOps practices and modern tooling
  • Troubleshooting issues in production and tightening up monitoring/alerting
  • Mentoring team members and sharing best practices across the organization
What we offer
  • Medical
  • Vision
  • Dental
  • Life and disability insurance
  • 401(k) plan
  • Fulltime

Senior Infrastructure & Cloud Engineer

We are looking for a Senior Infrastructure & Cloud Engineer to join our Infrastr...
Location: Spain, Madrid
Salary: Not provided
Fever
Expiration Date: Until further notice
Requirements
  • Professional working proficiency in English (C1 or higher) — mandatory
  • Professional working proficiency in French (C1 or higher) — mandatory
  • Ability to communicate effectively with international and French-speaking stakeholders, both written and verbal
  • Ability to create and maintain technical documentation in both languages
  • 5+ years of hands-on experience in Infrastructure Engineering, Systems Engineering, Cloud Engineering, or similar roles
  • 3+ years of production experience working with AWS environments
  • Proven experience designing, implementing, and operating hybrid cloud architectures
  • Previous experience in a senior-level position, leading technical initiatives and mentoring engineers
  • Experience working in international and multicultural environments
  • Strong knowledge of core AWS services: EC2, VPC, S3, IAM, RDS, Route 53, CloudFront, ELB / ALB
Job Responsibility
  • Design, deploy, and maintain secure, scalable, and resilient infrastructure across AWS and on-premise environments
  • Lead the implementation and optimization of hybrid cloud architectures
  • Manage and optimize AWS services including EC2, VPC, S3, IAM, RDS, Route 53, CloudFront, ECS, EKS, Lambda, API Gateway, and Step Functions
  • Design and operate secure connectivity solutions between cloud and datacenter environments using AWS Direct Connect, Transit Gateway, VPNs, and VPC Peering
  • Manage Linux and Windows-based infrastructure platforms and associated services
  • Administer virtualization platforms such as VMware vSphere, ESXi, Hyper-V, or KVM
  • Manage enterprise storage environments, including NetApp solutions, backup, replication, and disaster recovery strategies
  • Implement Infrastructure as Code using Terraform and/or CloudFormation
  • Develop automation and operational tooling using Python, Bash, and PowerShell
  • Build and maintain CI/CD pipelines to support infrastructure and platform deployments
What we offer
  • 40% discount on all Fever events and experiences
  • Home office friendly anywhere in Spain
  • Responsibility from day one and professional and personal growth
  • Great work environment with a young, diverse team of talented people to work with
  • Health insurance and other benefits such as flexible remuneration with a 100% tax exemption through Cobee's platform
  • English Lessons
  • Gympass Membership
  • Option to receive part of your salary in advance through Payflow
  • Attractive compensation package consisting of base salary and the potential to earn a significant bonus for top performance
  • Fulltime

Senior Cloud Infrastructure Engineer

We’re seeking a seasoned Cloud Infrastructure Engineer with deep expertise in au...
Salary: 180000.00 - 250000.00 USD / Year
LanceDB
Expiration Date: Until further notice
Requirements
  • 10+ years in DevOps, Cloud Infrastructure, or SRE roles, with hands-on experience in public cloud platforms (AWS, Azure, GCP, Heroku)
  • Strong experience operating and supporting production distributed systems and/or databases-as-a-service in a public cloud service provider, where it was the primary product for the company
  • Experience designing and managing complex production environments using Kubernetes and Helm
  • Expertise in IaC tools (Puppet, Terraform, Ansible, CloudFormation) and configuration management
  • Deep understanding of networking, security, and cloud architecture best practices
  • Experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk)
  • Strong knowledge of CI/CD tools (GitHub Actions) and containerization (Docker, Kubernetes)
  • You like working with a small, high-caliber team with a lot of autonomy and drive, and you can iterate fast
Job Responsibility
  • Design & Build Cloud Infrastructure: Architect and manage secure, scalable cloud environments (AWS, Azure, GCP) using IaC tools like Terraform and CloudFormation
  • Automate Everything: Develop and maintain automation scripts to streamline deployments, monitoring, and system operations
  • Systems Reliability: Implement monitoring/alerting solutions (Prometheus, Grafana, Datadog) to proactively address performance bottlenecks and ensure 99.9% uptime
  • Security & Compliance: Enforce security policies, manage secrets (Vault, AWS KMS), and ensure compliance with industry standards (GDPR, SOC2)
  • Troubleshoot & Optimize: Resolve complex infrastructure issues and lead cost-optimization initiatives for cloud resources
  • Collaborate & Mentor: Partner with software engineering teams to integrate DevOps practices into SDLC and mentor junior engineers on IaC and cloud best practices
What we offer
  • Medical, dental, vision, and life insurance
  • 401(k) retirement plan
  • Flexible Spending Accounts (FSA) and Health Savings Accounts (HSA)
  • Commuter benefits
  • Generous paid time off
  • Offers Equity
  • Fulltime

Senior DevOps Engineer, Infrastructure

The DevOps team has a vision of building a reliable and effective cloud infrastr...
Location: Singapore, Singapore
Salary: Not provided
Airwallex
Expiration Date: Until further notice
Requirements
  • Experience as a DevOps engineer, including hands-on experience with at least one programming language (Python/Ansible/shell)
  • Software engineering background (Infra/Platform Engineer), with experience in cost management, access management, and self-service platform development
  • Operations background (Istio service mesh and K8s), with experience in traffic management and service mesh architecture
  • Expertise in public cloud, large-scale K8s clusters and services, Istio service mesh, service registry, service discovery, infrastructure, and service reliability
  • For Staff level: around 10 years (at least 8) of experience, with a more in-depth understanding of project complexity and the design of code systems
  • For Senior level: around 6 years (at least 5) of experience, with the ability to independently solve coding and system design problems
Job Responsibility
  • Lead the design of our DevOps strategy
  • Work closely with cross-functional teams to develop and evolve the cloud infrastructure optimized for performance, scalability and security
  • Bring a level of expertise in operational platform management, developing observability and detection capabilities
  • Build consensus amongst DevOps Engineers across geographies in different feature teams about the tools and practices we use, and evolving that consensus into agreed standards
  • Fulltime

Senior Cloud Infrastructure Engineer

Our client is a technology-driven organization delivering advanced digital, clou...
Location: Ireland, Dublin
Salary: Not provided
Solas IT Recruitment
Expiration Date: Until further notice
Requirements
  • Proven experience leading DevOps or SRE teams in AWS environments
  • Strong Linux/Unix systems background
  • Hands-on expertise with AWS + Terraform infrastructure provisioning
  • Experience migrating traditional hosted systems to cloud-native architectures
  • Strong understanding of CI/CD pipelines and automation practices
  • Experience supporting microservices deployments in AWS
  • Hands-on container orchestration experience (EKS, ECS, Docker)
  • Knowledge of distributed systems, networking, cloud security, IaaS & PaaS architectures
  • Familiarity with IAM, authentication, certificates, and identity management
  • Experience working in Agile/Scrum environments
Job Responsibility
  • Lead infrastructure strategy, processes, metrics, and governance frameworks
  • Ensure cloud platforms are fully leveraged for performance, scalability, and cost efficiency
  • Define target architectures aligned with business goals
  • Oversee production environments, stability, uptime, and monitoring systems
  • Design and implement proactive observability and monitoring solutions
  • Drive cloud transformation and migrations from legacy infrastructure
  • Optimize infrastructure costs and resource utilization
  • Collaborate cross-functionally with Development, QA, Release, and Operations teams
  • Align tooling, workflows, and best practices across DevOps and engineering teams
What we offer
  • Opportunity to work with a fast-growing, innovation-focused organization
  • Career progression pathways
  • Continuous learning and development support
  • Flexible working model
  • Competitive salary based on experience
  • Fulltime

Senior Software Engineer (Cloud & DevOps)

At 3Shape, we use cloud platforms to deliver secure, reliable services to both i...
Location: Denmark, Copenhagen
Salary: Not provided
3Shape
Expiration Date: Until further notice
Requirements
  • 5 years of experience
  • Minimum 3 years of professional C#/.NET backend development experience, ideally in a cloud environment
  • Minimum 3 years of hands-on DevOps/SRE experience, ideally in a role combining software development and operations
  • Strong backend engineering fundamentals (design, performance, security, and maintainability)
  • Experience with API design, automated testing, code reviews, and building maintainable systems
  • Experience with containerized workloads and Kubernetes (e.g., Azure Kubernetes Service)
  • Curiosity for modern engineering practices and a strong understanding of core Azure concepts (networking, compute, storage, identity, and databases)
  • Experience with monitoring/observability in Azure (e.g., Azure Monitor, Application Insights, Log Analytics) and incident handling is a plus
  • A strong ownership mindset: automation-first, focus on reliability, and continuous improvement of quality and stability
Job Responsibility
  • Combine backend development with SRE/DevOps practices to help build and operate one of 3Shape's most central platform capabilities
  • Own authentication and authorization, as well as enterprise user management
  • Design, implement, and maintain backend services in the Account domain, delivering features end-to-end
  • Serve as the team's primary point of contact for DevOps topics and drive improvements across CI/CD, AKS/Kubernetes, Infrastructure as Code, observability, and platform stability
  • Collaborate with platform and product teams across 3Shape to align on Azure standards and best practices
What we offer
  • Central Copenhagen location
  • Attractive healthcare package
  • Breakfast every day
  • Delicious and healthy lunch cooked by private chefs
  • Globally recognized tech company
  • Diverse and international work environment
  • Social clubs, monthly social activities, and various in-team activities

Senior Software Engineer (Cloud & DevOps)

At 3Shape, we use cloud platforms to deliver secure, reliable services to both i...
Location: Denmark, Copenhagen
Salary: Not provided
3Shape
Expiration Date: Until further notice
Requirements
  • 5 years of experience, including a minimum of 3 years of professional C#/.NET backend development experience, ideally in a cloud environment.
  • Minimum 3 years of hands-on DevOps/SRE experience, ideally in a role combining software development and operations.
  • Strong backend engineering fundamentals (design, performance, security, and maintainability).
  • Experience with API design, automated testing, code reviews, and building maintainable systems.
  • Experience with containerized workloads and Kubernetes (e.g., Azure Kubernetes Service).
  • Curiosity for modern engineering practices and a strong understanding of core Azure concepts (networking, compute, storage, identity, and databases).
  • Experience with monitoring/observability in Azure (e.g., Azure Monitor, Application Insights, Log Analytics) and incident handling is a plus.
  • A strong ownership mindset: automation-first, focus on reliability, and continuous improvement of quality and stability.
Job Responsibility
  • Design, implement, and maintain backend services in the Account domain, delivering features end-to-end from implementation and testing to deployment readiness.
  • Be the team's primary point of contact for DevOps topics and drive improvements across CI/CD, AKS/Kubernetes, Infrastructure as Code, observability, and platform stability.
  • Collaborate with platform and product teams across 3Shape to align on Azure standards and best practices, especially around Infrastructure as Code, observability, and operational readiness.
  • Help define actionable alerts and dashboards, improve runbooks, and build safe automation so incidents can be detected, triaged, and mitigated quickly, even outside normal working hours.
What we offer
  • Central Copenhagen location
  • An attractive healthcare package to keep you fit and well.
  • Breakfast every day, and a delicious and healthy lunch cooked by our private chefs.
  • A joint purpose: to enable dentists to provide superior dental care to every patient, every time.
  • Fulltime