As an Infrastructure Site Reliability Engineer, you will be responsible for designing, implementing, and managing the infrastructure systems and tools that enable the reliability and performance of our technology platforms, which support various business initiatives within CVS Health. This role requires a strong background in infrastructure engineering and a commitment to proactively monitoring, troubleshooting, and optimizing systems for maximum uptime and performance.
Job Responsibilities:
Manage and maintain various systems and infrastructure, such as servers, storage, mainframe, iSeries, backup, archive, and recovery
Participate in on-call rotation to ensure availability and uptime of critical systems
Develop and maintain best practices documentation, including system architecture diagrams, standard operating procedures, and runbooks
Perform system and application performance analysis
Streamline and optimize operational processes, procedures, and documentation
Develop, modify, and implement incident and problem management processes
Establish a comprehensive SRE process
Collaborate with development teams to participate in code reviews, performance optimization, and application deployment processes
Drive reliability engineering practices, including monitoring, alerting, incident management, capacity planning, and disaster recovery
Automate infrastructure deployments, upgrades, and maintenance tasks
Analyze historical usage patterns and growth projections to forecast future capacity requirements
Establish and maintain monitoring systems to track performance and utilization
Requirements:
7+ years of experience in Infrastructure Engineering, System Administration, or related roles
3+ years of experience with cloud platforms (e.g., Amazon Web Services, Microsoft Azure) and infrastructure-as-code tools (e.g., Terraform, CloudFormation)
2+ years of experience with at least one configuration management tool such as Ansible, Puppet, or Chef
2+ years of experience with containerization technologies such as Docker and container orchestration platforms like Kubernetes
2+ years of experience with networking principles and protocols, including TCP/IP, DNS, load balancing, and firewalls
1+ years of experience with incident management, performance monitoring, and capacity planning tools
Bachelor's degree or equivalent experience (High School Diploma and 4 years relevant experience)
Nice to have:
Excellent troubleshooting and problem-solving skills, with the ability to identify, communicate, and resolve technical issues swiftly