Staff Site Reliability Engineer

Staff Site Reliability Engineer - Observability

CVS Health

Location:
United States

Category:
IT - Software Development

Contract Type:
Employment contract

Salary:

118,450.00 - 284,280.00 USD / Year

Job offer has expired

Job Description:

The PCW (Pharmacy & Consumer Wellness) Edge SRE team is seeking a Staff Site Reliability Engineer (SRE) with a primary focus on observability. This role will lead the design, implementation, and optimization of observability systems to ensure the reliability, performance, and scalability of our environment, with an emphasis on edge environments. You will collaborate with cross-functional teams to build robust monitoring, alerting, and telemetry solutions, enabling proactive issue detection and resolution across distributed systems. As a senior member of the SRE team, you will drive best practices, mentor others, and shape the strategic evolution of our observability ecosystem in a complex, edge-centric architecture.

Job Responsibilities:

Design and implement comprehensive observability solutions tailored for edge computing environments
Define and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and business KPIs
Build and optimize dashboards, visualizations, and alerting systems
Implement distributed tracing and log aggregation systems
Collaborate with engineering teams to ensure applications and infrastructure at edge locations are designed with observability in mind
Drive proactive identification of issues in edge facilities
Lead incident postmortems
Develop and maintain tools, scripts, and automation to enhance observability pipelines
Evaluate and integrate industry-standard observability tools
Optimize observability data storage, retention, and querying
Mentor and guide junior SREs and engineers
Partner with solution, engineering, and business teams
Lead cross-functional initiatives to improve observability
Stay current with emerging observability trends, tools, and methodologies
Contribute to the development of observability standards, runbooks, and documentation
Drive cost optimization for observability infrastructure

Requirements:

7+ years of experience in Site Reliability Engineering, Observability Engineering, or a related field
5+ years of experience with observability tools and platforms such as Prometheus, Grafana, Splunk, ELK, OpenTelemetry, or similar
3+ years of experience with microservices, containerized environments (e.g., Kubernetes, Docker), and distributed systems, particularly in edge deployments
Bachelor's degree, or equivalent experience (HS diploma + 4 years relevant experience)

Nice to have:

Experience implementing AIOps
Demonstrated ability to handle observability challenges in environments with intermittent connectivity, high latency, or geographically dispersed infrastructure
Strong proficiency in programming/scripting languages (e.g., Python, Java) for automation and tooling in distributed environments
Expertise working in edge computing environments with a large number of remote facilities
Experience with OpenTelemetry or other open-source observability frameworks optimized for edge computing
Familiarity with chaos engineering principles to validate observability systems in edge environments
Certifications in cloud platforms (Google Cloud Professional certification) or Kubernetes
Strong problem-solving skills with a proactive, analytical mindset
Excellent communication and collaboration skills
Ability to mentor and lead technical initiatives
Comfortable working in a fast-paced, dynamic environment
Knowledge of incident management processes and tools (e.g., ServiceNow, xMatters, Opsgenie)
Deep understanding of monitoring, logging, and tracing concepts
Familiarity with cloud infrastructure, CI/CD pipelines, and edge-specific deployment patterns

What we offer:

Affordable medical plan options
401(k) plan with matching company contributions
Employee stock purchase plan
No-cost wellness screenings
Tobacco cessation and weight management programs
Confidential counseling and financial coaching
Paid time off
Flexible work schedules
Family leave
Dependent care resources
Colleague assistance programs
Tuition assistance
Retiree medical access

Additional Information:

Job Posted:
September 30, 2025

Expiration:
October 23, 2025

Employment Type:

Full-time

Work Type:

Remote work
