Planet DDS is seeking a Manager, Site Reliability Engineering and Incident Management, to lead our Site Reliability Engineering function and the external incident response function for our production operations. To succeed, the manager will need to be self-motivated, communicate clearly, and operate with a sense of urgency in a fast-paced environment.
Job Responsibilities:
Lead and mentor a team of SREs and Incident Managers
Foster a culture of reliability, accountability, and continuous improvement
Collaborate with engineering teams to design resilient platform architectures
Oversee the incident response process for outages and service disruptions
Ensure timely detection, escalation, and resolution of incidents
Drive post-incident reviews (PIRs) and root cause analysis
Implement improvements based on lessons learned to prevent recurrence
Mature and enforce best practices for incident response and runbooks
Automate operational tasks to reduce toil and improve efficiency