CrawlJobs Logo

Staff Site Reliability Engineer - Observability

https://www.cvshealth.com/ Logo

CVS Health

Location Icon

Location:
United States

Category Icon
Category:
IT - Software Development

Job Type Icon

Contract Type:
Employment contract

Salary Icon

Salary:

118450.00 - 284280.00 USD / Year

Job Description:

The PCW (Pharmacy & Consumer Wellness) Edge SRE team is seeking a Staff Site Reliability Engineer (SRE) with a primary focus on observability to join our team. This role will lead the design, implementation, and optimization of observability systems to ensure the reliability, performance, and scalability of our environment with emphasis on edge environments. You will collaborate with cross-functional teams to build robust monitoring, alerting, and telemetry solutions, enabling proactive issue detection and resolution across distributed systems. As a senior member of the SRE team, you will drive best practices, mentor others, and shape the strategic evolution of our observability ecosystem in a complex, edge-centric architecture.

Job Responsibility:

  • Design and implement comprehensive observability solutions tailored for edge computing environments
  • Define and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and business KPIs
  • Build and optimize dashboards, visualizations, and alerting systems
  • Implement distributed tracing and log aggregation systems
  • Collaborate with engineering teams to ensure applications and infrastructure at edge locations are designed with observability in mind
  • Drive proactive identification of issues in edge facilities
  • Lead incident postmortems
  • Develop and maintain tools, scripts, and automation to enhance observability pipelines
  • Evaluate and integrate industry-standard observability tools
  • Optimize observability data storage, retention, and querying
  • Mentor and guide junior SREs and engineers
  • Partner with solution, engineering, and business teams
  • Lead cross-functional initiatives to improve observability
  • Stay current with emerging observability trends, tools, and methodologies
  • Contribute to the development of observability standards, runbooks, and documentation
  • Drive cost optimization for observability infrastructure

Requirements:

  • 7+ years of experience in Site Reliability Engineering, Observability Engineering, or a related field
  • 5+ years of experience with observability tools and platforms such as Prometheus, Grafana, Splunk, ELK, OpenTelemetry, or similar
  • 3+ years of experience with microservices, containerized environments (e.g., Kubernetes, Docker), and distributed systems, particularly in edge deployments
  • Bachelor's degree, or equivalent experience (HS diploma + 4 years relevant experience)

Nice to have:

  • Experience with implementation of AIOps
  • Demonstrated ability to handle observability challenges in environments with intermittent connectivity, high latency, or geographically dispersed infrastructure
  • Strong proficiency in programming/scripting languages (e.g., Python, java) for automation and tooling in distributed environments
  • Expertise working in edge computing environments with a large number of remote facilities
  • Experience with OpenTelemetry or other open-source observability frameworks optimized for edge computing
  • Familiarity with chaos engineering principles to validate observability systems in edge environments
  • Certifications in cloud platforms (Google Cloud Professional certification) or Kubernetes
  • Strong problem-solving skills with a proactive, analytical mindset
  • Excellent communication and collaboration skills
  • Ability to mentor and lead technical initiatives
  • Comfortable working in a fast-paced, dynamic environment
  • Knowledge of incident management processes and tools (e.g., ServiceNow, xMatters, Opsgenie)
  • Deep understanding of monitoring, logging, and tracing concepts
  • Familiarity with cloud infrastructure, CI/CD pipelines, and edge-specific deployment patterns
What we offer:
  • Affordable medical plan options
  • 401(k) plan with matching company contributions
  • Employee stock purchase plan
  • No-cost wellness screenings
  • Tobacco cessation and weight management programs
  • Confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
  • Retiree medical access

Additional Information:

Job Posted:
September 30, 2025

Expiration:
October 23, 2025

Employment Type:
Fulltime
Work Type:
Remote work
Job Link Share:
Welcome to CrawlJobs.com
Your Global Job Discovery Platform
At CrawlJobs.com, we simplify finding your next career opportunity by bringing job listings directly to you from all corners of the web. Using cutting-edge AI and web-crawling technologies, we gather and curate job offers from various sources across the globe, ensuring you have access to the most up-to-date job listings in one place.