
Site Reliability Engineering Manager Jobs (Remote work)

3 Job Offers

New
Staff Site Reliability Engineer - Incident Management & Reliability
Lead incident management and reliability engineering for Confluent's multi-cloud data streaming platform. You'll need 10+ years of SRE experience, deep cloud expertise, and proficiency with tools like Rootly. This remote Canada role offers a chance to drive org-wide reliability practices within a...
Location: Canada
Salary: 225,100 - 264,500 CAD / Year
Company: Confluent
Expiration Date: Until further notice
Manager, Site Reliability Engineering and Incident Management
Lead our Site Reliability Engineering and Incident Management team in Atlanta. You will drive platform resilience, oversee critical incident response, and mentor a skilled team. This role requires deep cloud expertise and a passion for building reliable, scalable systems in a fast-paced SaaS envi...
Location: Atlanta, United States
Salary: 118,000 - 160,000 USD / Year
Company: Planet DDS
Expiration Date: Until further notice
Site Reliability Engineering Manager
Lead a globally distributed SRE team at the Wikimedia Foundation, supporting infrastructure used by hundreds of millions. Utilize your hands-on expertise in cloud, Linux, Kubernetes, and IaC to guide critical projects and ensure reliability. This remote US role offers the chance to mentor enginee...
Location: United States of America
Salary: 132,439 - 208,378 USD / Year
Company: Wikimedia Foundation
Expiration Date: Until further notice

Looking for Site Reliability Engineering Manager jobs? This senior leadership role sits at the intersection of software engineering and IT operations. It is responsible for building and guiding teams that ensure the reliability, scalability, and performance of large-scale, user-facing systems. An SRE Manager is more than a technical expert; they are a people leader, a process architect, and a strategic partner who instills a culture of engineering excellence and operational rigor.

Professionals in this role typically lead a team of Site Reliability Engineers, focusing on mentorship, career growth, and the overall health of the team. Their core mission is to define and uphold a reliability standard for the organization's services. This involves establishing and tracking Service Level Objectives (SLOs) and Service Level Indicators (SLIs) that align with business goals, and implementing the processes and tooling needed to meet them. A primary day-to-day responsibility is overseeing the incident management lifecycle: ensuring swift response, communicating effectively during outages, and conducting thorough post-incident reviews (postmortems) to drive permanent improvements and prevent recurrence.

Common responsibilities in SRE Manager jobs include collaborating with product and development engineering managers to embed reliability principles early in the software development lifecycle. SRE Managers advocate for and implement robust observability stacks (monitoring, logging, and tracing) to gain deep insight into system behavior. Driving automation is paramount; they guide their teams to eliminate manual toil through Infrastructure as Code (IaC), automated remediation, and self-healing systems. They are also accountable for capacity planning, disaster recovery strategies, and ensuring that operational security best practices are followed.

Typical skills and requirements for this profession include extensive hands-on experience in SRE, DevOps, or cloud infrastructure roles, coupled with several years of technical leadership and people management. Deep, practical knowledge of cloud platforms (such as AWS, GCP, or Azure), container orchestration (especially Kubernetes), and modern programming or scripting languages (such as Python or Go) is essential, as is strong expertise in observability tools, incident command systems, and automation frameworks. Beyond technical skill, successful SRE Managers demonstrate strong communication and stakeholder management skills, a commitment to blameless postmortems, and the ability to balance urgent operational needs with long-term reliability investments.

If you are seeking leadership jobs that blend deep technical architecture with team development and strategic operational oversight, a career as a Site Reliability Engineering Manager offers a challenging and impactful path.
