Manager, Site Reliability Engineering and Incident Management

Planet DDS

Location:
United States, Atlanta

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

118,000 - 160,000 USD / Year

Job Description:

Planet DDS is seeking a Manager, Site Reliability Engineering and Incident Management, to lead our Site Reliability Engineering function and our external incident response function for production operations. To succeed, the manager must be self-motivated, communicate clearly, and operate with a sense of urgency in a fast-paced environment.

Job Responsibilities:

  • Lead and mentor a team of SREs and Incident Managers
  • Foster a culture of reliability, accountability, and continuous improvement
  • Collaborate with engineering teams to design resilient platform architectures
  • Oversee the incident response process for outages and service disruptions
  • Ensure timely detection, escalation, and resolution of incidents
  • Drive post-incident reviews (PIRs) and root cause analysis
  • Implement improvements based on lessons learned to prevent recurrence
  • Mature and enforce best practices for incident response and runbooks
  • Automate operational tasks to reduce toil and improve efficiency
  • Maintain observability tools (monitoring, alerting, logging)
  • Define and maintain incident management policies and escalation procedures
  • Drive initiatives for chaos engineering, capacity planning, and disaster recovery testing

Requirements:

  • 7+ years in SRE, DevOps, or Infrastructure roles
  • 3+ years in Incident Management leadership
  • Deep understanding of reliability, scalability, and performance optimization
  • Multi-cloud expertise in AWS, Azure, or GCP
  • Understanding of DNS, load balancing, firewalls, and compliance frameworks
  • Knowledge of fundamental cloud security (e.g., identity and access management, firewalls)
  • Deep understanding of logging, monitoring, and security best practices
  • Strong collaboration and communication skills
  • Bachelor’s Degree in a relevant major or equivalent years of experience is a plus

Nice to have:

  • Dental industry knowledge
  • Experience working in B2B SaaS companies
  • Experience with cloud containers, specifically Kubernetes

Additional Information:

Job Posted:
December 11, 2025

Employment Type:
Full-time
Work Type:
Remote work