Site Reliability Engineering Manager

Checkr

Location:
United States, Denver

Category:
IT - Software Development

Contract Type:
Employment contract

Salary:

197,000 - 232,000 USD / Year

Job Description:

We’re looking for a Site Reliability Engineering Manager with extensive leadership and observability experience in cloud-based applications. In this role, you will lead, manage, and mentor a team of SREs; define and track metrics related to the company's SLOs, SLIs, and SLAs; and operationalize incident management, communication, and handling. The SRE Manager will be responsible for the availability and performance of all external and internal-facing application endpoints that help drive Checkr’s business. Extensive knowledge of AWS, Kubernetes, and event orchestration is desired. Experience with Datadog, PagerDuty, and Atlassian tools (Jira, Confluence) is highly preferred, as you will identify strategies to improve our full-stack telemetry and monitoring capabilities. You will also mentor SREs, both on their observability-related work and on their career development.

Job Responsibilities:

  • Expand and improve our observability and monitoring footprint in line with cost efficiency
  • Drive and delegate the day-to-day escalations and incidents with on-call engineering teams
  • Collaborate with other Engineering Managers to define metrics and dashboarding requirements
  • Ensure stakeholders and partners are informed of incidents and incident trends while working with other departments, such as account managers, legal, and marketing, for outbound communication
  • Review the work of the SRE team, help them get unblocked, and provide mentoring
  • Meet with the team and individuals weekly to collaborate and discuss topics related to processes, planning, and goals
  • Manage and assist the on-call incident commander and owners in resolving production reliability issues, ensuring timely communication, retrospectives, and postmortems are performed and delivered
  • Participate in design and production reviews for new features, products, or infrastructure
  • Assist in planning for the growth of Checkr’s infrastructure, reliability/resiliency, and resources

Requirements:

  • 8+ years working in a relevant role, including 4+ years of technical leadership experience mentoring engineers
  • 4+ years of experience architecting and administering observability stacks, either managed or self-hosted (e.g., Datadog, New Relic, Prometheus, Elastic Stack/ELK, OpenTelemetry)
  • Experience operating containerized microservices running on the public cloud, asynchronous event processing, and databases
  • Knowledge of Linux, Git, and CI/CD pipelines
  • On-call support of highly available production systems
  • Designing and building new tools to automate repetitive tasks, prevent incidents, or improve MTTR using a programming language such as Python
  • Experience with automation and Infrastructure as Code using tools like Terraform, Terragrunt, or CloudFormation
  • Understanding of how application components interact and experience contributing to architectural discussions
  • Unwavering commitment to operational security and best practices
  • Ownership: identify problems, propose solutions, and then coach and guide a team to implement them
  • Connection: motivated to help other teams improve their service reliability and to continuously improve tooling and services
What we offer:
  • A fast-paced and collaborative environment
  • Learning and development allowance
  • Competitive compensation and opportunity for advancement
  • 100% medical, dental, and vision coverage
  • Up to 25K reimbursement for fertility, adoption, and parental planning services
  • Flexible PTO policy
  • Monthly wellness stipend, home office stipend

Additional Information:

Job Posted:
June 18, 2025

Employment Type:
Full-time
Work Type:
Hybrid work