This is a critical enabler for achieving high resiliency during operations and for continuously improving through design during the software development lifecycle. The Lead SRE design & support engineer is an integral part of the global team, whose main purpose is to provide a delightful customer experience for users of the global consumer, commercial, supply chain and enablement functions in the PepsiCo digital products application portfolio of 260+ applications, enabling a full SRE practice incident prevention / proactive resolution model. The scope of this role is focused on cloud-architected, full-stack application development, B2B pepsiconnect, Direct to Customer and other S&T roadmap applications. The role ensures that PepsiCo DPA applications deliver the service performance, reliability and availability expected by our customers and internal groups. It requires a blend of technical expertise in SRE tools, modern cloud application architecture (i.e. full stack), IT operations experience, and analytics and influence skills.
Job Responsibilities:
Drive new shift-left activities that apply Site Reliability Engineering (SRE) and quality assurance principles within the application design and project roadmap to enable resilient outcomes
Apply a pre-emptive approach in production to minimize business impact, using SRE-driven orchestration to connect all components of the ecosystem, diagnose anomalies before users are affected, and remediate through automation
Ensure ecosystem availability and performance in production environments, proactively preventing P1, P2, and potential P3 incidents
Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing event, logging, monitoring, and observability standards across applications
Be accountable for ensuring that non-functional requirements (NFRs), including SLAs/SLOs/SLIs and error budgets, are embedded early in the product’s offerings as part of the engineering solution
Lead the team in diagnosing anomalies before users are affected and drive the necessary remediations across the teams involved in the end-to-end availability, performance and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions
Collaborate with engineering and support teams, including participation in escalations and blameless postmortems
Work closely with customer-facing support teams to empower them with SRE insights and tooling
Observe, diagnose and improve the end-to-end ecosystem performance of the modern-architected application portfolio, i.e. build a technical understanding of the interactions within a full-stack application, alongside peer SRE team members
Continuously optimize the L2/support operations work via SRE workflow automation
Shape the SRE orchestration platform design with inputs from Production Operations, Business usage & Product and engineering teams
Actively engage and drive AI Ops adoption across teams
Requirements:
8+ years of work experience, evolving into an SRE role
3-5 years of experience in continuously improving and transforming IT operations ways of working
Bachelor’s degree in Computer Science, Information Technology or a related field
Proven experience as an SRE designing event diagnostics, performance measures and alerting solutions to meet SLAs/SLOs/SLIs
Highly quantitative, with great judgment, able to connect the dots across ecosystems and work efficiently across teams
Strong expertise in SRE (Site Reliability Engineering) and IT Service Management (ITSM) processes
Hands-on experience with Python, SQL/NoSQL (MySQL, MongoDB, Cassandra, PostgreSQL), AppDynamics, ELK Stack, Grafana, Splunk, Dynatrace, Kafka and any SRE Ops toolsets
A firm understanding of cloud architecture for distributed environments
Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js
Back-end technologies: server-side languages and frameworks (Java, Spring Boot, and related technologies) used to build server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, Couchbase
Infrastructure: Azure/AWS cloud platforms and/or client/server environments
Nice to have:
Prior experience in shaping transformation by developing SRE solutions would be a plus
What we offer:
Opportunities to learn and develop every day through a wide range of programs
Internal digital platforms that promote self-learning
Development programs focused on leadership skills
Specialized training according to the role
Learning experiences with internal and external providers
Recognition programs for seniority, behavior, leadership, moments of life, among others
Financial wellness programs that will help you reach your goals in all stages of life
A flexibility program that will allow you to balance your personal and work life, adapting your working day to your lifestyle
Wellness Line, thousands of agreements and discounts, scholarship programs for your children, and aid plans for different moments of life