SRE Design & Support Engineer

PepsiCo

Location:
India, Hyderabad

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are looking for a self-driven SRE engineer with a software engineering mindset to:
  • Drive new shift-left activities that apply Site Reliability Engineering (SRE) and quality assurance principles within the application design and project roadmap, enabling resilient outcomes
  • Apply a pre-emptive approach to production that minimizes business impact, using SRE-driven orchestration that connects all components of the ecosystem, diagnoses anomalies before users are affected, and remediates them through automation

The SRE Design & Support Engineer is an integral part of the global team. The role's main purpose is to provide a delightful customer experience for users of the global consumer, commercial, supply chain and enablement functions in the PepsiCo digital products application portfolio of 260+ applications, enabling a full SRE practice built on incident prevention and proactive resolution. The scope of this role is focused on full-stack development of cloud-architected applications, including B2B pepsiconnect, Direct to Customer and other S&T roadmap applications. The engineer ensures that PepsiCo DPA applications deliver the service performance, reliability and availability expected by customers and internal groups. The role requires a blend of technical expertise in SRE tools, modern cloud application architecture (i.e., full stack), IT operations experience, and analytical and influencing skills.

Job Responsibility:

  • Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing event, logging, monitoring, and observability standards across applications
  • Ensure that non-functional requirements (NFRs), including SLAs/SLOs/SLIs and error budgets, are embedded early in the product's offerings as part of the engineering solution
  • Act as a proactive SRE support engineer, preventing P1, P2 and potential P3 incidents by diagnosing anomalies before users are affected, and driving the necessary remediations across the teams responsible for end-to-end availability, performance and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions
  • Collaborate with engineering and support teams, including participating in escalations and blameless postmortems
  • Work closely with customer-facing support teams to empower them with SRE insights and tooling
  • Observe, diagnose and improve the end-to-end ecosystem performance of the modern-architected application portfolio (i.e., a technical understanding of the interactions within a full-stack application), alongside peer SRE team members
  • Continuously optimize L2 support operations through SRE workflow automation
  • Shape the SRE orchestration platform design with input from production operations, business users, and product and engineering teams
  • Actively engage teams and drive AIOps adoption across them

Requirements:

  • 8-11 years of work experience, progressing into an SRE engineering role
  • 3-5 years of experience continuously improving and transforming IT operations ways of working
  • Bachelor’s degree in Computer Science, Information Technology or a related field
  • Proven experience as an SRE designing event diagnostics, performance measures and alerting solutions to meet SLAs/SLOs/SLIs
  • The ideal engineer will be highly quantitative, have great judgment, be able to connect the dots across ecosystems, and work efficiently across teams to ensure SRE orchestration solutions meet customer and end-user expectations
  • The candidate will take a pragmatic approach to resolving incidents, including the ability to systematically triangulate root causes and work effectively with external and internal teams to meet objectives
  • Strong expertise in SRE (Site Reliability Engineering) and IT Service Management (ITSM) processes, with a track record of improving service offerings: proactively resolving incidents, providing a seamless customer/end-user experience, and proactively identifying and mitigating areas of risk
  • Hands-on experience with Python, SQL/NoSQL databases (MySQL, MongoDB, Cassandra, PostgreSQL), AppDynamics, the ELK Stack, Grafana, Splunk, Dynatrace, Kafka and other SRE Ops toolsets
  • A firm understanding of cloud architecture for distributed environments
  • Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js
  • Back-end technologies: server-side languages and frameworks (Java, Spring Boot, and related technologies that implement server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, Couchbase)
  • Infrastructure: Azure/AWS cloud platforms and/or client/server environments

Nice to have:

Prior experience shaping transformations by developing SRE solutions would be a plus

Additional Information:

Job Posted:
January 15, 2026

Similar Jobs for SRE Design & Support Engineer

DevOps and SRE Engineer

The DevOps and SRE Engineer will be responsible for building and maintaining hig...
Salary:
Not provided
ACI Infotech
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science or related field
  • 5+ years of experience in DevOps/SRE roles supporting high-availability SaaS
  • Proven expertise in Kubernetes administration (EKS, GKE, or AKS)
  • Strong experience with Terraform, Helm, and GitOps pipelines
  • Skilled in CI/CD pipeline design and maintenance
  • Knowledge of monitoring, alerting, and logging (Prometheus, Grafana, OpenTelemetry)
  • Strong fundamentals in cloud networking and security
  • Calm, methodical, and automation-first mindset
Job Responsibility:
  • Design and maintain CI/CD pipelines with progressive delivery
  • Operate and scale EKS, GKE, or AKS clusters with strong multi-tenancy
  • Instrument systems using Prometheus, Grafana, and OpenTelemetry
  • Run incident response, postmortems, and capacity planning
  • Harden networking, IAM, and secret management
  • Deliver automated, repeatable environments using GitOps and IaC
  • Ensure clear SLOs with meaningful alerting and manage error budgets
  • Drive cloud cost efficiency per transaction while maintaining reliability
  • Fulltime

Hadoop SRE Engineer - VP

The Engineering Lead Analyst is a senior level position responsible for leading ...
Location:
United States, Tampa, Florida; Irving, Texas
Salary:
113,840 - 170,760 USD / year
Citi
Expiration Date:
Until further notice
Requirements:
  • 6-10 years of relevant experience in an Engineering role
  • Experience working in Financial Services or a large complex and/or global environment
  • Project Management experience
  • Comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • Demonstrated analytic/diagnostic skills
  • Ability to work in a matrix environment and partner with virtual teams
  • Ability to work independently, multi-task, and take ownership of various parts of a project or initiative
  • Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • Proven track record of operational process change and improvement
  • Proven track record of designing highly available platforms and services supporting various types of workloads
Job Responsibility:
  • Serve as a technology subject matter expert for internal and external stakeholders
  • Provide direction for all firm-mandated controls and compliance initiatives
  • Create a technology domain roadmap for Cloudera Hadoop Platform and Hadoop on Google Cloud Platform
  • Ensure that all integrations of functions meet business goals
  • Define necessary system enhancements to deploy new products and process enhancements
  • Recommend product customization for system integration
  • Identify problem causality, business impact and root causes
  • Exhibit knowledge of how own specialty area contributes to the business and apply knowledge of competitors, products and services
  • Advise or mentor junior team members
  • Impact the engineering function by influencing decisions through advice, counsel or facilitating services
What we offer:
  • medical, dental & vision coverage
  • 401(k)
  • life, accident, and disability insurance
  • wellness programs
  • paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays
  • Fulltime

Expert Site Reliability Engineer

Expert Site Reliability Engineer provides technical expertise and strategic guid...
Location:
India
Salary:
Not provided
Altera Digital Health Inc. UK
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree (Preferred)
  • 8+ years of relevant work experience
  • 5–7 years of expert-level experience providing systems engineering for the assigned product
  • 8+ years of experience with healthcare products in a support, development or consultancy environment
  • Experience with Windows Server and IIS
  • Experience with SQL
  • Experience in Application support
Job Responsibility:
  • Provide ongoing technical guidance and support to the client
  • Collaborate with the internal technical teams to ensure successful implementation and integration of the proposed solutions
  • Collaborate with business stakeholders and TAM to understand business requirements and objectives
  • Design solutions that align with Hosting best practices, industry standards, and organizational business priorities
  • Develop and document overall technical architecture for the client
  • Design and document integration of various systems, components, and third-party services
  • Create architectural diagrams and documentation
  • Identify potential technical risks and provide mitigation strategies
  • Proactively address the challenges related to project deliverables and client environments
  • Review control systems for your assigned client on a weekly basis and take appropriate action to mitigate issues
  • Fulltime

Staff Site Reliability Engineer

At Ledger, we are looking for an experienced Reliability Engineer to join our SR...
Location:
France, Paris
Salary:
Not provided
Ledger
Expiration Date:
Until further notice
Requirements:
  • 8+ years of cloud engineering at scale in organizations operating SaaS solutions
  • Proficiency in Unix/Linux environments, Git, Python, Terraform, Kubernetes, AWS cloud solutions and architectures, CI/CD tools, Argo CD, Ansible, configuration management, etc.
  • Strong knowledge of observability practices, with experience implementing and managing logging, monitoring and alerting frameworks using solutions such as Datadog or Prometheus/Grafana/Loki.
  • Experience in cross-functional work and the ability to demonstrate a collaborative approach to building key relationships across the organization and defining project scope, goals, plans and deliverables
  • Customer focused, with the ability to identify and understand the needs of both internal and external customers
  • Creative problem-solving and analysis skills with an ability to identify, develop, and implement solutions to meet the needs of the business
  • Excellent presentation and written communication
  • Ability to deal with ambiguity, high level of pressure and rapidly changing environments
  • Engineering degree.
Job Responsibility:
  • Participate in building a DevOps / SRE culture and enable the transition to modern infrastructure management and deployment practices
  • Participate in building the SRE team roadmap (vision and delivery accountability). Anticipate stakeholder needs and the emergence of game-changing technologies, and challenge scope and deadlines
  • Perform integration of platform software components
  • Participate in designing and delivering solutions to improve the availability, scalability, latency, and efficiency of systems
  • Influence and create standards & best practices in support of service level objectives
  • Automate key SRE metrics including SLOs/SLAs and error budgets
  • Provide expert support to our level-2/application support team, to troubleshoot priority incidents, and conduct post-mortems
  • Apply analytics on past incidents and usage patterns to predict issues and take proactive actions
  • Ensure control of technical debt and promote quality practices
  • Follow SRE and chaos engineering approaches across all strategic systems to predict and prevent outages, in coordination with Service Design, and improve solution availability
What we offer:
  • Equity: Employees are the foundation of our success, and we award stock options so you can share in that success as we grow
  • Flexibility: A hybrid work policy
  • Social: Annual company outing for Ledgerdary Days, plus frequent social events, snacks and drinks
  • Medical: Comprehensive health insurance policy offering extensive medical, dental and vision care coverage
  • Well-being: Personal development, coaching & fitness with our dedicated partners
  • Vacation: Five weeks of paid leave per year, in addition to national holidays and rest & relaxation (RTT) days
  • High tech: Access to high performance office equipment and gadgets, including Apple products
  • Transport: Ledger reimburses part of your preferred means of transportation
  • Discounts: Employee discount on all our products.
  • Fulltime

Engineering Manager, Infrastructure

As an Engineering Manager for the Infrastructure team, you’ll lead the engineers...
Location:
Canada; United States
Salary:
195,000 - 285,000 USD / year
Apollo.io
Expiration Date:
Until further notice
Requirements:
  • 5+ years of hands-on software or infrastructure engineering experience
  • 2+ years of experience leading teams of senior and staff-level engineers in platform, SRE, or infrastructure domains
  • Proven ability to design and operate large-scale distributed systems in cloud environments (preferably GCP or AWS)
  • Expertise with Kubernetes, Docker, Terraform, Ubuntu, and CI/CD pipelines
  • Familiarity with observability tools (Grafana, Prometheus, ELK, Datadog, New Relic) and performance tuning
  • Strong grounding in networking, security, and reliability principles
  • Experience managing infrastructure costs, availability SLAs, and high-throughput systems at scale
Job Responsibility:
  • Lead, coach, and grow a distributed team of high-impact Infrastructure Engineers
  • Partner with senior engineering leadership on strategic initiatives such as cloud migration, infrastructure scaling, platform reliability, and cost efficiency
  • Define and implement modern operational excellence practices, including SLOs, error budgets, incident reviews, and performance monitoring
  • Guide technical decision-making across key areas like Kubernetes, GCP, observability, networking, CI/CD, and IaC (Terraform, Ansible)
  • Collaborate with AI, Data, and Product Engineering teams to ensure infrastructure scalability for ML and AI-native workloads
  • Run effective 1:1s, career development conversations, and quarterly performance reviews
  • Support recruiting efforts to attract top engineering talent across time zones
What we offer:
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits
  • Fulltime

Senior Software Engineer, Infrastructure

The InfraOps team’s primary goal is to enable and empower Kiddom’s engineering b...
Location:
United States, New York City
Salary:
160,000 - 200,000 USD / year
Kiddom
Expiration Date:
Until further notice
Requirements:
  • BS or MS in Computer Science or a related field
  • 5+ years professional software engineering experience
  • Experience with Java, Python, Go, or Clojure in a production environment
  • Experience designing and building REST APIs
  • Exposure to authorization technologies (OAuth)
  • Experience with continuous integration and automation tools and processes
  • Strong knowledge of design patterns and software engineering best practices
  • Excellent problem solving and debugging skills
  • Strong acumen or exposure to DevOps or SRE methodologies
  • Keen sense for SecOps.
Job Responsibility:
  • Evangelizing and fostering a healthy DevOps culture here at Kiddom, working with teams to establish best practices and help guide new and existing services.
  • Practicing Infrastructure as Code (IaC) wherever possible, giving us the confidence in repeatable processes that can be automated.
  • Grow our DevOps efforts from small scale to large scale multi-region
  • Share ownership of the entire infrastructure architecture
  • Aim for high availability, high resiliency
  • Support the engineering team with tools to evaluate the performance of their code in production environments, speed up CI/CD pipelines, and accelerate feature verification
  • Design and build a scalable, generalized framework for third-party API integrations
  • Leverage existing infrastructure and components to build RESTful web services
  • Build APIs and robust testing environments for internal and external developers
What we offer:
  • Competitive salary
  • Meaningful equity
  • Health insurance benefits: medical (various PPO/HMO/HSA plans), dental, vision, disability and life insurance
  • One Medical membership (in participating locations)
  • Flexible vacation time policy (subject to internal approval). Average use 4 weeks off per year.
  • 10 paid sick days per year (prorated depending on start date)
  • Paid holidays
  • Paid bereavement leave
  • Paid family leave after birth/adoption: a minimum of 16 paid weeks for birthing parents and 10 weeks for caretaker parents, meant to supplement benefits offered by the state.
  • Commuter and FSA plans
  • Fulltime

Staff Site Reliability Engineer

Site Reliability Engineering at Affirm is a small, yet crucial, team that helps ...
Location:
Poland
Salary:
358,000 - 458,000 PLN / year
Affirm
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience designing, developing, and launching backend systems at scale, and acting as a subject-matter point of reference for them, using scripting and development languages like Bash, Python or Kotlin
  • Extensive track record of developing highly available distributed systems using technologies like AWS, MySQL, Spark and Kubernetes
  • Track record of managing, driving and improving the Incident Lifecycle process, from live incident management through retrospective and post-incident analysis, to provide actionable insights that enhance overall system reliability, resilience, and performance
  • 7+ years experience in Site Reliability or Production Engineering teams
  • Demonstrate curiosity with empathy, and strong opinions loosely held
  • Experience delivering major features, system components or deprecating existing functionality in a system through the definition of a technical and execution plan
  • Write high quality code that is easily understood and used by others
  • Thrive in ambiguity and be comfortable moving from low-level language idioms all the way to the architecture of large systems to understand how they work
  • A growth and impact trajectory demonstrating that you have mastered gathering and iterating on feedback from your engineering and cross-functional peers
  • Strong verbal and written communication skills that support effective collaboration with our global engineering team and key stakeholders of an organization
Job Responsibility:
  • Set the technical strategy and vision for your team on a multi-year time scale, and help your team tie it together with critical, business-impacting projects
  • Work across teams in the product development lifecycle, collaborating with infrastructure, product management, developer experience & analytics to ensure technical sustainability, risks and trade-offs are well understood and managed
  • Act as a force-multiplier for your team through your definition and advocacy of technical solutions and operational processes
  • Take ownership of your team’s operations and availability by ensuring you have the right monitoring, triage rotations, playbooks, policies, testing and alerting in place to support “keep the lights on” & on-call efforts
  • Foster a culture of quality and ownership on your team by setting code review and design standards for your team, and advocating for them beyond your team through your writing and tech talks
  • Help develop talent on your team by providing feedback and guidance, and leading by example
What we offer:
  • Flexible Spending Wallets for tech, food and lifestyle
  • Away Days - wellness days to take off work and recharge
  • Learning & Development programs
  • Parental leave
  • Employee Resource & Community Groups
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount
  • Fulltime

Senior Site Reliability Engineer

Site Reliability Engineering at Affirm is a small, yet crucial, team that helps ...
Location:
Poland
Salary:
301,000 - 401,000 PLN / year
Affirm
Expiration Date:
Until further notice
Requirements:
  • 4+ years of experience designing, developing and launching backend systems at scale using scripting and development languages like Bash, Python or Kotlin
  • A track record of developing highly available distributed systems using technologies like AWS, MySQL and Kubernetes
  • Meaningful experience contributing in or driving parts of the Incident Lifecycle process, enabling actionable insights that improve the quality culture, reliability, resilience, and system performance
  • 4+ years working in a Site Reliability or Production Engineering team
  • Experience defining a technical plan for the delivery of a significant feature or system component with an elegant, simple and extensible design
  • Experience making impactful changes in a large codebase, having developed a suite of tools and practices that enable you and your team to do so safely
  • Strong verbal and written communication skills that support effective collaboration with our global engineering team
Job Responsibility:
  • Own and deliver quarterly goals for your team, lead engineers on your team through ambiguity to solve open-ended problems, and ensure that everyone is supported throughout delivery
  • Support your peers and stakeholders in the product development lifecycle by collaborating with infrastructure, product management, developer experience & analytics by participating in ideation, articulating technical constraints, and partnering on decisions that properly consider risks and trade-offs
  • Proactively identify technical solutions and operational processes that strengthen incident readiness, response, and post-incident analysis
  • Support the operations and availability of your team’s artifacts by creating and monitoring metrics, escalating when needed, and supporting “keep the lights on” & on-call efforts
  • Foster a culture of quality and ownership on your team by setting or improving code review and design standards for your team, and advocating for them beyond your team through your writing and tech talks
  • Help develop talent on your team by providing feedback and guidance, and leading by example
What we offer:
  • Flexible Spending Wallets for tech, food and lifestyle
  • Away Days - wellness days to take off work and recharge
  • Learning & Development programs
  • Parental benefits
  • Employee Resource & Community Groups
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount
  • Fulltime