Site Reliability Engineering Manager Jobs (Hybrid work), 4 job offers

About the Site Reliability Engineering Manager role

Looking for Site Reliability Engineering Manager jobs? This senior leadership role sits at the intersection of software engineering and IT operations. SRE Managers build and guide the teams that keep large-scale, user-facing systems reliable, scalable, and performant. An SRE Manager is more than a technical expert; they are a people leader, a process architect, and a strategic partner who instills a culture of engineering excellence and operational rigor.

Professionals in this role typically lead a team of Site Reliability Engineers, focusing on their mentorship, career growth, and the overall health of the team. Their core mission is to define and uphold a reliability standard for the organization's services. This involves establishing and tracking Service Level Objectives (SLOs) and Service Level Indicators (SLIs) that align with business goals, and implementing the processes and tooling needed to meet them. A primary day-to-day responsibility is overseeing the incident management lifecycle: ensuring swift response, maintaining effective communication during outages, and conducting thorough post-incident reviews (postmortems) to drive permanent improvements and prevent recurrence.

Common responsibilities for those in SRE Manager jobs include collaborating with product and development engineering managers to embed reliability principles early in the software development lifecycle. They advocate for and implement robust observability stacks (encompassing monitoring, logging, and tracing) to gain deep system insights. Driving automation is paramount; they guide their teams to eliminate manual toil through Infrastructure as Code (IaC), automated remediation, and self-healing systems. Furthermore, they are accountable for capacity planning, disaster recovery strategies, and adherence to operational security best practices.

Typical skills and requirements for this profession include extensive hands-on experience in SRE, DevOps, or cloud infrastructure roles, coupled with several years of technical leadership and people management. A deep, practical knowledge of cloud platforms (like AWS, GCP, or Azure), container orchestration (especially Kubernetes), and modern programming or scripting languages (such as Python or Go) is essential. Candidates must possess strong expertise in observability tools, incident command systems, and automation frameworks. Beyond technical prowess, successful SRE Managers demonstrate exceptional communication and stakeholder management skills, an unwavering commitment to blameless postmortems, and a strategic ability to balance urgent operational needs with long-term reliability investments. If you are seeking leadership jobs that blend deep technical architecture with team development and strategic operational oversight, a career as a Site Reliability Engineering Manager offers a challenging and impactful path.
