SRE Lead Design & Support Engineer

PepsiCo

Location:
Mexico, Miguel Hidalgo

Contract Type:
Not provided

Salary:

Not provided

Job Description:

This role is a critical enabler of high resiliency during operations and of continuous improvement through design during the software development lifecycle. The Lead SRE Design & Support Engineer is an integral part of the global team. Its main purpose is to provide a delightful customer experience for users of the global consumer, commercial, supply chain and enablement functions in the PepsiCo digital products application portfolio of 260+ applications, enabling a full SRE practice with an incident prevention / proactive resolution model. The scope of this role is focused on full stack development of cloud architecture applications, B2B pepsiconnect, Direct to Customer and other S&T roadmap applications. The role ensures that PepsiCo DPA applications deliver the service performance, reliability and availability expected by our customers and internal groups. It requires a blend of technical expertise in SRE tools and modern cloud application architecture (i.e. full stack), IT operations experience, and analytics and influence skills.

Job Responsibility:

  • Drive new shift-left activities critical to applying Site Reliability Engineering (SRE) and quality assurance principles within the application design / project roadmap to enable resilient outcomes
  • Apply a pre-emptive approach in production to minimize business impact, using SRE-driven orchestration to connect all components of the ecosystem, diagnose anomalies before users are affected, and remediate through automation
  • Ensure ecosystem availability and performance in production environments, proactively preventing P1, P2 and potential P3 incidents
  • Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing event, logging, monitoring, and observability standards across applications
  • Be accountable for ensuring that non-functional requirements (NFRs), including SLA/SLO/SLI and error budgets, are embedded early into the product’s offerings as part of the engineering solution
  • Lead the team in diagnosing anomalies before any user is affected and drive the necessary remediations across the teams involved in the end-to-end availability, performance and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions
  • Collaborate with engineering and support teams, including participation in escalations and blameless postmortems
  • Work closely with customer-facing support teams to empower them with SRE insights and tooling
  • Observe, diagnose and improve the end-to-end ecosystem performance of the modern architected application portfolio, i.e. build a technical understanding of the interactions within a full stack application, alongside peer SRE team members
  • Continuously optimize the L2/support operations work via SRE workflow automation
  • Shape the SRE orchestration platform design with inputs from Production Operations, Business usage & Product and engineering teams
  • Actively engage and drive AI Ops adoption across teams

Requirements:

  • 8+ years of work experience, evolving into an SRE engineer role
  • 3-5 years of experience in continuously improving and transforming IT operations ways of working
  • Bachelor’s degree in Computer Science, Information Technology or a related field
  • Proven experience as an SRE in designing event diagnostics, performance measures and alerting solutions to meet SLAs/SLOs/SLIs
  • Highly quantitative, with great judgment, able to connect dots across ecosystems and work efficiently and cross-functionally across teams
  • Strong expertise in SRE (Site Reliability Engineering) and IT Service Management (ITSM) processes
  • Hands-on experience in Python, SQL/NoSQL (MySQL, MongoDB, Cassandra, PostgreSQL), AppDynamics, ELK Stack, Grafana, Splunk, Dynatrace, Kafka and any SRE Ops toolsets
  • A firm understanding of cloud architecture for distributed environments
  • Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js
  • Back-end technologies: server-side languages and frameworks (Java, Spring Boot, and related technologies) that build the server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, Couchbase
  • Infrastructure: Azure/AWS cloud platforms and/or client/server environments

Nice to have:

Prior experience shaping transformation and developing SRE solutions would be a plus

What we offer:
  • Opportunities to learn and develop every day through a wide range of programs
  • Internal digital platforms that promote self-learning
  • Development programs focused on leadership skills
  • Specialized training according to the role
  • Learning experiences with internal and external providers
  • Recognition programs for seniority, behavior, leadership, moments of life, among others
  • Financial wellness programs that will help you reach your goals in all stages of life
  • A flexibility program that will allow you to balance your personal and work life, adapting your working day to your lifestyle
  • Wellness Line, thousands of Agreements and Discounts, Scholarship programs for your children, Aid Plans for different moments of life

Additional Information:

Job Posted:
January 29, 2026
