
Site Reliability & Infrastructure Automation Engineer

United States, Durham · Employment contract · 110000.00 - 140000.00 USD / Year · Job Posted June 17, 2026

Job Description

Piper Companies is hiring a Site Reliability & Infrastructure Automation Engineer for a growing, technology-driven insurance organization located in Durham, NC. The engineer will support and modernize infrastructure environments by combining traditional IT operations with modern cloud and SRE practices. This is a hybrid position requiring 3 days onsite per week in Durham, NC.

Job Responsibility

  • Design and maintain synthetic monitoring for critical applications, services, and APIs
  • Build dashboards, alerts, and telemetry to improve system observability
  • Automate operational tasks using Python, PowerShell, or similar scripting languages
  • Develop and manage Infrastructure-as-Code using Terraform and cloud-native tools
  • Troubleshoot cloud and SaaS environments while improving reliability and performance
  • Collaborate across development, infrastructure, and application teams to enhance operational best practices

Requirements

  • 2+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • Strong experience with AWS services (Lambda, S3, CloudFormation, etc.)
  • Hands-on experience with Terraform for Infrastructure-as-Code
  • Experience with Python scripting and automation
  • Knowledge of observability tools such as Dynatrace, AppDynamics, or similar platforms
  • Bachelor’s degree (preferably in Computer Science, Engineering, or related technical field)

What we offer

  • Health
  • Vision
  • Dental
  • PTO
  • Paid Holidays
  • 10% bonus
  • 7.5% long-term incentive


Similar Jobs for

Site Reliability & Infrastructure Automation Engineer

8 matching positions

Intermediate Site Reliability Engineer SRE – AI Reliability & Automation

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location
Canada, Mississauga
Salary
115000.00 - 128000.00 CAD / Year
PointClickCare
Expiration Date
Until further notice
Requirements
  • 5+ years' experience in software engineering
  • Experience with SRE principles
  • Experience with AI/ML in production environments
  • A passion for automation, intelligent systems, and operational excellence
  • Strong debugging, problem-solving, and system design skills
  • Languages: Python, Java, Bash, Terraform
  • Platforms: Azure, Kubernetes, Docker
  • Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
  • ML/AI: MCP framework, AI agents, Vector store, Agent orchestration (LangChain), RAG
  • CI/CD: Jenkins, ArgoCD, Spinnaker
Job Responsibility
  • Build ML-based anomaly detection and pattern recognition systems
  • Enhance telemetry with smart tagging and metadata for better AI insights
  • Develop event-driven workflows and self-healing systems using AI triggers
  • Automate incident response with generative AI and custom AI agent orchestration
  • Use time-series forecasting and predictive modelling to anticipate failures
  • Optimise infrastructure with AI-powered autoscaling and cost-aware resource allocation
  • Build scalable, fault-tolerant systems in a cloud-native environment
  • Participate in on-call rotations and lead incident response for critical systems
  • Integrate APIs for streamlined data exchange and system connectivity
  • Run internal AIOps workshops and help teams adopt AI maturity models
What we offer
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
Employment type: Full-time

Senior Site Reliability Engineer, Infrastructure Foundations

The Wikimedia Foundation is looking for a Senior Site Reliability Engineer to su...
Location
United States
Salary
113082.00 - 175725.00 USD / Year
Wikimedia Foundation
Expiration Date
Until further notice
Requirements
  • 6+ years of experience in an SRE/Operations/DevOps role as part of a team
  • Experience with shell and any scripting languages used in an SRE context (Python, Go, Bash, Ruby; we primarily use Python) and configuration management tools (Puppet, Ansible; we use Puppet)
  • Experience designing and managing infrastructure security for large fleets of diverse services
  • Experience with technical response during security incidents
  • Experience with package management on Linux systems (we use Debian)
  • Strong Linux system-level troubleshooting skills
  • History of automating tasks and processes, identifying process gaps, and finding automation opportunities
  • Strong English language skills (verbal and written) and ability to work independently, as an effective part of a globally distributed team working across multiple time zones
Job Responsibility
  • Performing day-to-day operational/DevOps tasks on Wikimedia’s public facing infrastructure (deployment, maintenance, configuration, troubleshooting)
  • Implementing and utilizing configuration management and deployment tools (Puppet, Kubernetes)
  • Leading continuous improvement, by automating the installation, configuration and maintenance of services on our platform
  • Working closely with product teams to help them bring scalable functionality to our users, by assisting in the architectural design of new services and making them operate at scale
  • Participating in a 24/7 on-call rotation shared across the broader SRE team. This includes taking part in incident response, diagnosis and follow-up on system outages or alerts across Wikimedia’s production infrastructure
  • Collaborating with a global, cross-functional team in an asynchronous communication environment
  • Mentoring peers in your areas of technical and operational strength
  • Ability and willingness to travel 1-2 times a year for in-person events and team meetings
  • Most importantly, share our values and work in accordance with them
Employment type: Full-time

Site Reliability Engineer - Infrastructure

We are actively looking for a talented Site Reliability Engineer to join the Inf...
Location
United States, San Mateo
Salary
130000.00 - 280000.00 USD / Year
Verkada
Expiration Date
Until further notice
Requirements
  • Must have a BS, MS, or PhD in Computer Science, or similar technical field of study
  • Minimum of 1-2+ years of experience in a similar position
  • Experience in at least one scripting language (preferably Python)
  • Experience with one of the major cloud platforms (preferably AWS)
  • Experience with Kubernetes
  • Experience with Terraform
  • Enthusiasm for learning about new technologies and tooling
Job Responsibility
  • Keep our infrastructure up!
  • Improve infrastructure automation
  • Define infrastructure roadmap
  • Provide technical support for engineers on other teams
What we offer
  • Generous company paid medical, dental & vision insurance coverage
  • Unlimited paid time off & 11 companywide paid holidays
  • Wellness allowance
  • Commuter benefits
  • Healthy lunches and dinners provided daily
  • Generous paid parental leave policy & fertility benefits
Employment type: Full-time

Senior Site Reliability Engineer - Automation Platform

Join a team of passionate and hardworking entrepreneurs to transform healthcare!...
Location
France, Paris
Salary
Not provided
Doctolib
Expiration Date
Until further notice
Requirements
  • At least 5 years of site reliability engineering experience
  • Experience with AWS, Terraform, Kubernetes, and GitHub Actions, supporting the deployment of applications developed on the JVM and/or TypeScript
  • Proactive, curious, collaborative and eager to learn
  • Proven experience with cloud services such as AWS, Azure or Google Cloud
  • Solid understanding of containerization and orchestration technologies (Docker and Kubernetes)
  • Proficiency in at least one programming language (Go, Java, Ruby, Python etc.) and a deep understanding of infrastructure as code principles
Job Responsibility
  • Collaborating with Feature teams to ensure services align with developer needs
  • Driving improvements by evaluating new technologies and processes
  • Defining best practices (golden paths) for software development and deployment
  • Developing and maintaining tools and services that facilitate implementation of best practices
  • Ensuring reliability, scalability, traceability, and monitoring of services and infrastructure
  • Collaborating on roadmap delivery
What we offer
  • Free Health Insurance for you
  • Up to 14 days of RTT
  • A flexible workplace policy offering both hybrid and office-based modes
  • Flexibility days allowing you to work from EU countries and the UK 10 days per year
  • Wellbeing program with free mental health and coaching through moka.care
  • Special support package for caregivers and workers with disabilities
  • Lunch voucher with Swile card
  • Work Council subsidy for sport club membership or creative activities
  • Bicycle subsidy
  • Public transportation reimbursement
Employment type: Full-time

Senior Site Reliability Engineer - Automation Platform

Join a team of passionate and hardworking entrepreneurs to transform healthcare....
Location
Germany, Berlin
Salary
Not provided
Doctolib
Expiration Date
Until further notice
Requirements
  • At least 5 years of site reliability engineering experience
  • Experience with AWS, Terraform, Kubernetes, and GitHub Actions, supporting the deployment of applications developed on the JVM and/or TypeScript
  • Proactive, curious, collaborative and eager to learn
  • Proven experience with cloud services such as AWS, Azure or Google Cloud
  • Solid understanding of containerization and orchestration technologies (Docker and Kubernetes)
  • Proficiency in at least one programming language (Go, Java, Ruby, Python etc.) and a deep understanding of infrastructure as code principles
Job Responsibility
  • Collaborating with Feature teams to ensure services align with developer needs
  • Driving improvements by evaluating new technologies and processes
  • Defining best practices ("golden paths") for software development and deployment
  • Developing and maintaining tools and services that facilitate best practices
  • Ensuring reliability, scalability, traceability, and monitoring of services and infrastructure
  • Collaborating on roadmap delivery
What we offer
  • Company health insurance through partner Allianz
  • Minimum 28 days of paid leave
  • Parent Care Program: one additional month of leave on top of legal parental leave
  • Free mental health and coaching services through partner Moka.care
  • For caregivers and workers with disabilities, a package including adaptation of remote policy, extra days off for medical reasons, and psychological support
  • Flexible workplace policy offering both hybrid and office-based modes
  • Work from EU countries and the UK for up to 10 days per year
  • Reimbursement of public transportation
Employment type: Full-time

Site Reliability Engineer – Infrastructure

The Site Reliability Engineer (SRE) will ensure the reliability, scalability, an...
Location
United States, Atlanta
Salary
Not provided
Tier4 Group
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent experience
  • Proven experience as a Site Reliability Engineer or Systems Engineer
  • Strong proficiency in Terraform and Ansible for infrastructure automation
  • Hands-on experience with Kubernetes, Docker, or other container orchestration tools
  • Proficiency in scripting languages such as Python or Bash
  • In-depth knowledge of Google Cloud Platform (GCP) services including compute, networking, storage, Kubernetes, and security
  • Solid understanding of VMware virtualization and enterprise storage systems (e.g., Pure Storage)
  • Experience with networking technologies including VLANs, VPNs, and routing protocols
  • Strong grasp of IT infrastructure and operations principles, including systems integration and automation best practices
  • Excellent communication and collaboration skills
Job Responsibility
  • Design, build, and maintain secure, compliant infrastructure using Infrastructure as Code tools such as Terraform and Ansible
  • Automate provisioning and management of servers, storage, networks, Kubernetes clusters, and related systems across cloud and on-premises environments
  • Develop tools and processes for automated deployment, configuration, monitoring, and alerting
  • Collaborate with cross-functional teams to implement scalable and reliable cloud and data center solutions
  • Participate in incident response, on-call rotations, and post-incident reviews to improve system resilience
  • Monitor system performance and availability using service-level agreements (SLAs), objectives (SLOs), and indicators (SLIs); proactively troubleshoot and resolve reliability, performance, or security issues
  • Create and maintain disaster recovery and business continuity plans for critical systems
  • Continuously analyze and improve infrastructure efficiency, scalability, and performance
  • Stay current with emerging technologies and recommend tools or practices to enhance platform capabilities
Employment type: Full-time

Staff Engineer, Site Reliability Engineer

OnStar is a cornerstone of General Motors' connected services—bringing safety, s...
Location
Ireland, Dublin
Salary
Not provided
General Motors
Expiration Date
Until further notice
Requirements
  • 8+ years in SRE, DevOps, or systems engineering, including experience managing or mentoring high-impact teams
  • Track record of building and maintaining high-scale, cloud-native systems (preferably AWS, GCP, or Azure)
  • Expertise in container orchestration and deployment strategies using Kubernetes and CI/CD pipelines
  • Proficiency in Python, Go, or Java, with strong code review and readability standards
  • Experience leading cross-functional infrastructure projects, configuration strategy, or organizational tooling initiatives
  • Ability to think and act under pressure
  • Strong communication skills
Job Responsibility
  • Lead the design and implementation of scalable, fault-tolerant, and observable infrastructure supporting OnStar mobile and web experiences, in-vehicle services, and the backend platforms and integrations that power them
  • Champion configuration management, infrastructure refactoring, and testing frameworks to strengthen system resilience
  • Partner across SRE, development, and product teams to improve service reliability, deployment safety, and incident response practices
  • Drive internal consultation and strategic planning on reliability standards for new OnStar capabilities, customer-facing releases, and platform initiatives
  • Define and evolve observability strategy using tools such as Prometheus, Grafana, and Datadog, with automated alerting and actionable SLO dashboards
  • Own and improve on-call practices, manage blameless postmortems, and guide root cause analysis to eliminate recurring failures
  • Mentor engineers and help shape a high-performance culture rooted in extreme ownership and operational excellence
  • Support compliance and privacy-driven engineering initiatives across connected services, with potential crossover into areas like data retention and safety certification tooling
Employment type: Full-time

Cloud Engineer / Site Reliability Engineer (SRE)

Location
United States, Orlando
Salary
75.00 USD / Hour
Beacon Hill
Expiration Date
Until further notice
Requirements
  • Strong hands-on AWS experience with solid understanding of core AWS services
  • Experience supporting and troubleshooting AWS and Azure cloud environments
  • Terraform experience for Infrastructure as Code
  • Docker/containerization experience
  • Strong troubleshooting and problem-solving skills
  • Ability to translate requirements into technical execution
  • Experience performing cloud architecture and diagramming
  • Experience supporting deployments, environments, and site standups
  • Strong communication and collaboration skills
Job Responsibility
  • Support cloud infrastructure and deployments across AWS and Azure
  • Troubleshoot infrastructure and application-related cloud issues
  • Build and maintain Terraform-based infrastructure
  • Support Docker/containerized environments
  • Create architecture diagrams and technical documentation
  • Work closely with engineering and project teams to execute cloud initiatives
  • Assist with automation and operational improvement efforts
Employment type: Full-time