Staff Observability Operations Engineer Job at CVS Health (Hartford)

Job Description

We are seeking several experienced Staff Observability Operations Engineers with a strong background in Site Reliability Engineering (SRE), modern observability practices, and the implementation and management of observability and event management platforms. Responsibilities include deploying observability solutions, administering platforms, managing releases and system upgrades, building integrations, troubleshooting incidents, and planning continuously to improve platform performance. Successful candidates will play a key role in ensuring our observability infrastructure meets the current and future needs of CVS Health’s dynamic environment.

Job Responsibilities

Deploy and implement modern observability solutions
Manage and administer observability and event management platforms
Coordinate and manage release cycles for observability platforms
Troubleshoot and resolve incidents related to observability platforms
Continuously monitor and enhance platform performance
Collaborate with cross-functional stakeholders
Provide training and mentoring to junior engineers
Ensure compliance and security of observability platforms
Maintain documentation of observability platform configurations
Generate and analyze reports on platform performance and capacity

Requirements

7+ years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
Experience developing and administering ServiceNow ITOM event management solutions
Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
Proficiency in Python and in other automation and configuration tooling such as Ansible, PowerShell, and Bash
Hands-on experience deploying, managing, and administering observability platforms
Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions
Proven ability to troubleshoot and resolve complex technical issues
Experience monitoring platform performance and implementing enhancements to support scalability
Knowledge of compliance and security standards related to observability platforms
Excellent communication skills, both verbal and written
Experience with configuring and leveraging source code management tools and workflows
Working knowledge of data and configuration formats such as YAML, XML, and JSON
Preferred certifications: ITIL 4 Practitioner, DevOps Institute Observability Foundation, ServiceNow CIS-Event Management Implementer, xMatters Integrator

Nice to have

ITIL 4 Practitioner: Monitoring and Event Management
DevOps Institute Observability Foundation
DevOps Institute Site Reliability Engineering Foundation or Practitioner
ServiceNow CIS-Event Management Implementer
ServiceNow Certified Application Developer
xMatters Integrator

What we offer

Affordable medical plan options
401(k) plan (including matching company contributions)
Employee stock purchase plan
No-cost programs for all colleagues, including wellness screenings, tobacco cessation, and weight management programs
Confidential counseling and financial coaching
Paid time off
Flexible work schedules
Family leave
Dependent care resources
Colleague assistance programs
Tuition assistance
Retiree medical access
