Site Reliability Engineer - Vice President

India, Pune · Job Posted May 16, 2026

Job Description

The Site Reliability Engineer (SRE) is a strategic professional accountable for the daily operations, architectural resilience, and overall implementation of SRE principles in a complex, critical, large-scale, multi-disciplinary environment. This role requires a comprehensive understanding of multiple technology domains and how they interact to achieve business objectives. As a recognized technical authority, you will apply an in-depth understanding of the business impact of technical contributions and provide advice and counsel on strategic solutions.

We are seeking a passionate and experienced SRE to join our Production Management team. In this role, you will be instrumental in enhancing the reliability, performance, and efficiency of our applications and services. You will drive our strategy for end-to-end observability and resiliency, collaborating across the organization to ensure our services are stable, scalable, and fault-tolerant. This is a key role that will influence strategic decisions and foster a culture of technical excellence and accountability.

Job Responsibility

  • Foster a culture of transparency, innovation, and accountability that encourages continuous improvement
  • Communicate the progress and impact of SRE initiatives to stakeholders at all levels
  • Operate effectively within a highly regulated environment, ensuring compliance with all relevant requirements
  • Ensure critical business applications meet stringent operational resilience requirements, including adherence to defined impact tolerances
  • Oversee advanced recovery testing, including Production Swing Tests, Data Recovery Tests, and chaos engineering practices
  • Drive the adoption and development of automation, such as One Touch Recovery solutions, to minimize recovery time
  • Partner with development teams to leverage cloud native services and established resiliency patterns to enhance application reliability
  • Collaborate across the organization to develop and scale observability solutions using modern tools for metrics, logging, and tracing
  • Partner with development teams to effectively instrument applications, providing deep insights into system health and performance

Requirements

  • 13+ years of experience, with a deep understanding of SRE concepts, including SLOs, SLIs, error budgets, and toil reduction
  • Demonstrable experience with Disaster Recovery planning, resiliency testing, and fault-tolerant distributed system design
  • Proficiency in deploying, managing, and troubleshooting applications on OpenShift/Kubernetes
  • Hands-on experience with modern observability tools (e.g., Prometheus, Grafana, Loki, Mimir, Tempo, AppDynamics)
  • Experience with Infrastructure as Code (IaC), configuration management, and automation tools (e.g., Ansible, Terraform)
  • Experience creating, modifying, and managing Helm charts for application deployment
  • Significant professional experience in production management, software development, or an equivalent field, with a strong focus on Site Reliability Engineering
  • Expertise in analyzing complex application, database, network, and OS issues within large-scale, customer-facing systems
  • A service-oriented attitude combined with excellent problem-solving and strategic thinking skills
  • Strong communication and diplomacy skills, with a proven ability to work effectively across multiple business and technical teams

Nice to have

  • Experience with major public cloud providers (e.g., Google Cloud, AWS, Azure)
  • Proven experience delivering software and infrastructure using Agile frameworks
  • Experience presenting technical strategy to senior and executive-level audiences
  • Experience writing or maintaining code in Java, Python, Go, or similar languages

Similar Jobs for Site Reliability Engineer - Vice President

8 matching positions

Site Reliability Engineering Analyst - Assistant Vice President

The Engineer Sr Analyst is an intermediate level position responsible for a vari...
Location: India, Pune
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 5-8 years of relevant experience in an Engineering role
  • Experience working in Financial Services or a large complex and/or global environment
  • Project Management experience
  • Consistently demonstrates clear and concise written and verbal communication
  • Comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • Demonstrated analytic/diagnostic skills
  • Ability to work in a matrix environment and partner with virtual teams
  • Ability to work independently, prioritize, and take ownership of various parts of a project or initiative
  • Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • Proven track record of operational process change and improvement
Job Responsibility
  • Contribute to the budgetary requirement definition for assigned product area, develop functional specifications, and create project plans and software release schedules
  • Partner with business and development teams to identify engineering requirements and assist in defining application and system requirements and processes and maintain engineering relationships with the end user/client
  • Ensure requirements/tasks from technology departments and/or end users are communicated to stakeholders
  • Provide solutions and processes in accordance with audit initiatives and requirements and consult with Business Information Security officers (BISOs) and TISOs
  • Exhibit in-depth understanding of engineering concepts and principles
  • Assist with training activities and mentor junior team members
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
  • Automate Core Processes: Design, develop, and implement automation solutions to replace manual activities, repetitive processes, to support migrations to new infrastructure
  • Continuous Improvement: Proactively identify opportunities for process improvements and efficiency gains across the service lifecycle
  • Support AI Integration: Collaborate with development and data science teams to support the seamless integration of services with AI solutions
Employment Type: Full-time

National Vice President of Coffee Services

American Food & Vending is seeking an experienced National Vice President of Cof...
Location: United States, Liverpool
Salary: Not provided
Company: American Food & Vending (afvusa.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in operations, engineering, business, or related field (MBA preferred)
  • 5–10+ years of leadership experience in coffee services, OCS, beverage operations, or equipment-intensive environments
  • Hands-on knowledge of commercial coffee equipment, including installation, maintenance, troubleshooting, and lifecycle management
  • Proven experience leading multi-site or national operations teams
  • Strong process orientation with experience building SOPs, KPIs, and operational dashboards
  • Ability to partner cross-functionally with Sales, Procurement, and Service organizations
  • Willingness to travel nationally 75% of the time
Job Responsibility
  • Own the end-to-end operational performance of the coffee services program, including equipment standards, service levels, and field execution
  • Establish and maintain commercial coffee equipment standards across all office coffee environments, including bean-to-cup, single-cup, batch brew, and specialty systems
  • Partner with service, operations, and vendors to ensure equipment reliability, uptime, maintenance protocols, and lifecycle management
  • Evaluate, test, and approve new coffee equipment and technologies for operational scalability and customer experience
  • Develop and oversee equipment deployment, replacement, and upgrade strategies nationwide
  • Lead teams responsible for installation, maintenance, repair, and ongoing service of coffee equipment
  • Define and monitor service-level agreements (SLAs), response times, and performance metrics
  • Ensure consistent quality, cleanliness, and functionality across all customer locations
  • Identify root causes of service issues and implement corrective action plans
  • Develop standardized operating procedures (SOPs) for coffee service operations and equipment handling
What we offer
  • Weekly pay
  • 401K with company match
  • Employee Assistance Program
  • Eligible employees offered Medical, Prescription, Dental, and Vision Plans, FSA/HSA
  • Ongoing training and development programs
  • Bonus Programs for eligible positions
Employment Type: Full-time

Apps Development Sr Manager - Vice President

A senior-level position responsible for accomplishing results by designing, impl...
Location: Canada, Mississauga
Salary: 120800.00 - 170800.00 USD / Year
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 8+ years of relevant experience in DevOps, Site Reliability Engineering (SRE), or Platform Engineering
  • Hands-on working experience with container orchestration using OpenShift and Kubernetes
  • Strong, demonstrable experience with CI/CD tools, specifically Tekton and Harness
  • Extensive experience with observability and monitoring stacks, including Prometheus and Grafana
  • Proficiency in Infrastructure as Code (IaC) and configuration management tools
  • Experience with scripting and automation
  • Ability to work proactively and independently to address project requirements, and articulate issues/challenges with enough lead time to mitigate project delivery risks
  • A history of conducting code reviews and ensuring high standards for infrastructure and automation code
  • Basic knowledge of industry practices and standards in the DevOps and SRE space
  • Consistently demonstrates clear and concise written and verbal communication
Job Responsibility
  • Design, build, and maintain the CI/CD infrastructure and tools, with a focus on Tekton and Harness
  • Manage, scale, and secure OpenShift container platforms, ensuring high availability and reliability
  • Develop and manage infrastructure as code (IaC) to automate provisioning and configuration of environments
  • Implement and manage a comprehensive observability stack using tools like Prometheus, Grafana, and others to monitor system health, performance, and reliability
  • Collaborate with development teams to create a seamless developer experience and ensure applications are built with scalability, reliability, and security in mind
  • Utilize in-depth knowledge and skills across multiple infrastructure and development areas to provide technical oversight for the platform
  • Contribute to the formulation of strategies for platform engineering and DevOps functional areas
  • Provide evaluative judgment based on the analysis of factual data in complicated and unique situations, including root cause analysis and problem resolution
  • Impact the DevOps and Platform Engineering area through monitoring delivery of end results and ensuring essential procedures are followed and contribute to defining standards
  • Appropriately assess risk when technical decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients, and assets, by driving compliance with applicable laws, rules, and regulations, adhering to Policy, and applying sound ethical judgment
Employment Type: Full-time

Custody Support - Application Support Technical Lead - Vice President

Team/Role Overview: The person would be a part of the Custody Support team suppo...
Location: United Kingdom, Belfast
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • Practical problem solving and strategic thinking skills
  • Demonstrated leadership, interpersonal skills and relationship building skills
  • Service oriented attitude
  • Ability to work in a fast-paced environment
  • Experience working on or leading requirement-gathering efforts for multiple large development projects at one time
  • Proficient using basic technical tools and systems
  • Good interpersonal and communication skills
  • Blockchain and Cryptocurrency Technology Fundamentals: understanding of distributed ledgers, consensus mechanisms, and cryptographic principles as they apply to digital asset custody and settlement workflows
  • Cloud and Distributed Technology Support Experience: experience supporting target technology stacks such as ECS, OpenShift, and microservices at L2/L3 level in distributed, cloud-native environments
  • Site Reliability Engineering (SRE): hands-on experience with modern observability, monitoring tools, automation tools, and resiliency management, including OpenTelemetry, Grafana, and Google Cloud Observability (GCO)
Job Responsibility
  • Lead a team of technical experts for the new product offering - Digital Custody Assets
  • Partner with multiple technology teams to ensure appropriate integration of functions to meet goals
  • Identify and define necessary system enhancements, analyze existing system logic, identify problems, and recommend and implement solutions
  • Performing Incident/Outage Management, pro-actively collaborate with Development and Infrastructure partners to identify and remediate stability risks, and engage in crisis management and cross region handholding of issues
  • Investigation of incidents reported across a range of applications within our Custody Digital Assets and Settlements applications
  • Engagement with ITIL processes including Major Incident management, problem management, change management etc
  • Operate independently to identify process bottlenecks and proactively drive improvements by engaging the appropriate teams
  • Experience in performing resiliency activities such as disaster recovery coordination from Production to Contingency site from an Application Support perspective, and Application component level resiliency tests
What we offer
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
  • Hybrid working model (up to 2 days working at home per week)
Employment Type: Full-time

Java Engineering Lead - Vice President

The Engineering Lead is a key role within Citi’s Payment delivery organization. ...
Location: India, Chennai
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • Strong hands-on lead developer with technical ability having 10+ years of design & development experience
  • Practitioner and Advocate of CLEAN code practice
  • Practitioner of AI tools in driving productivity and efficiencies
  • Experience in design and development of medium to large-scale applications using an open-source tech stack: Spring Boot, Microservices, Kafka
  • Strong Java skills
  • Experience with databases: MongoDB
  • Experience in writing unit tests and integration tests using standard frameworks, ensuring minimized technical debt
  • Experience in building CI/CD pipelines and single-click deployment: Harness, Lightspeed, OpenShift
  • Ability to drive engineering deliveries and handle multiple concurrent initiatives
  • Experience with testing concepts (TDD, BDD) and JUnit
Job Responsibility
  • Provide technical leadership and strategy for solutions, applications, and systems across the platform
  • Define and implement a robust data governance framework, including strategies for data storage, high availability, fault tolerance, and disaster recovery
  • Practice and enforce strong engineering principles and standards to guide the team in achieving automation and delivery goals
  • Provide expertise to identify and translate system requirements into software design artifacts
  • Drive experiments and Proof of Concept (PoC) to assess new solutions and application paths
  • Work proactively & independently to address project requirements, and articulate issues/challenges at the appropriate time to address project delivery risks
  • Follow industry-wide best practices to minimize the technical debt of software deliverables
  • Interface and coordinate tasks with internal and external technical resources, collaborate to provide estimates, develop overall implementation plans, and serve as a lead to implement installation, customization, and integration efforts
  • Apply your skill and experience within a fast-paced operations-centric environment towards developing architecture and design for the regulatory platform at large
Employment Type: Full-time

Digital Software Engineering Lead Analyst – Vice President

The Digital S/W Engineer Lead Analyst is a lead-level professional role. This in...
Location: India, Pune
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 7+ years of progressive software development experience, demonstrating expert-level proficiency in JavaScript and Java frameworks (e.g., React.js, Spring Boot), and databases (e.g., Oracle, MongoDB, PostgreSQL)
  • Expert in Modern Application Architecture: Mastery of modern application architecture principles, including microservices, event-driven architectures, serverless, and cloud-native patterns
  • Deep expertise in Data Structures, Algorithms, and Object-Oriented Design Principles with Java
  • Proven leadership in leveraging and integrating Artificial Intelligence (AI) and Machine Learning (ML) tools to optimize development workflows, enhance code quality, and drive intelligent features
  • Extensive experience with Microservices frameworks (e.g., Spring Boot, Quarkus), Event-Driven Services (e.g., Kafka, RabbitMQ), and advanced Cloud-Native Application Development (AWS, Azure, GCP)
  • Multiple years of experience leading the design and implementation of Service-Oriented and Microservices architectures, including advanced REST, GraphQL, and gRPC implementations
  • Full Stack Architecture & Leadership: Demonstrated ability to architect, design, develop, and maintain complex, enterprise-grade full-stack solutions, encompassing both front-end and back-end components of robust web applications, with an emphasis on scalability and performance
  • Front-End Expertise: Expert-level proficiency in designing and developing highly intuitive, performant, and accessible user interfaces using cutting-edge JavaScript frameworks (e.g., React, Angular, Vue), advanced HTML5, and CSS (e.g., SASS/LESS, CSS-in-JS)
  • Back-End Mastery: Extensive experience in architecting and developing scalable server-side logic and sophisticated APIs using languages such as Java, Python, or similar, with a focus on high-throughput and low-latency systems
  • Advanced Database & Data Architecture Expertise: Comprehensive knowledge of SQL and PL/SQL, with a deep understanding of Relational Database Management Systems (RDBMS), particularly Oracle, including advanced database design, performance tuning, data warehousing, and NoSQL databases
Job Responsibility
  • Strategic Technical Leadership: Provide expert guidance and strategic oversight across the entire software development lifecycle, partnering continuously with senior stakeholders to align technical solutions with business objectives
  • Architectural Stewardship: Lead the design and evolution of robust, scalable, and secure enterprise applications, defining architectural patterns and ensuring adherence to best practices in cutting-edge technologies and software design patterns
  • Team & Project Leadership: Drive complex engineering initiatives within Agile delivery teams, fostering a culture of collaboration, excellence, and continuous improvement. Lead sprint goal achievement, oversee code quality, and actively participate in and lead broader Citi technical communities and advanced Agile/Scrum processes
  • Mentorship & Coaching: Act as a technical mentor and coach for junior and intermediate engineers, fostering their growth, critical thinking, and advanced problem-solving capabilities
  • Advanced Problem Solving & Troubleshooting: Exhibit mastery in analyzing and resolving intricate coding, application performance, and design challenges. Lead cross-functional efforts to diagnose and troubleshoot complex system issues
  • Proactive Root Cause Analysis: Spearhead thorough investigations to identify systemic root causes of development and performance bottlenecks, leading the implementation of comprehensive, long-term defect resolutions and preventative measures
  • Technical Vision & Acumen: Demonstrate a profound and forward-looking understanding of technical requirements, emerging trends, and their strategic implications for solutions under development, ensuring future-proof designs
  • Containerization, Orchestration & Cloud Strategy: Drive the strategic adoption and optimization of Docker for application containerization, Kubernetes for efficient service orchestration, and other cloud-native technologies to build resilient and scalable infrastructure
  • Communication, Risk & Stakeholder Management: Master effective communication of progress, proactively anticipate and mitigate technical and project bottlenecks, provide expert escalation management, and adeptly identify, assess, track, and manage issues and risks at strategic and operational levels
  • Process and System Optimization: Champion and lead initiatives to streamline, automate, and eliminate redundant processes within architecture, build, delivery, production operations, and across various business areas, driving significant efficiency gains and innovation
Employment Type: Full-time

Strategic Initiatives Program Manager – Vice President

Engineer the future of global finance. At Citi, our Tech team doesn’t just suppo...
Location: United Kingdom, Belfast
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • Experience in software engineering, site reliability engineering (SRE), or technology risk and controls
  • Experience in a program or project management role, delivering complex, cross-functional technology initiatives
  • Proven expertise in analyzing complex application, database, network, and OS issues across distributed, large-scale, customer-facing systems
  • Strong understanding of resiliency principles, including disaster recovery, data recovery, and high-availability architecture
  • Excellent communication skills and a proven ability to work effectively across multiple business and technical teams
  • Bachelor's degree in Computer Science, Engineering, or an equivalent field.
Job Responsibility
  • Implement Enhanced Testing and Recovery
  • Design and Architecture
  • Proactive Vulnerability Management
  • Operational Resilience Adherence
  • Performance Measurement and Reporting
What we offer
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment Type: Full-time

Vice President, Applications Support Technology Lead Analyst

The Apps Support Lead Analyst is a seasoned professional role providing Level 2 ...
Location: Japan, Chiyoda, Tokyo
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 5–10 years’ experience in L2 application production support in a securities/investment bank or financial services trading environment
  • Demonstrated experience providing trade floor support to Front Office users in Equities or capital markets
  • Excellent business-level English communication (written/verbal)
  • Japanese language capability desirable
  • Proven ability to prioritize and multi-task effectively under extreme time pressure in a real-time trading environment
  • Strong diagnostic skills including analysis of application/server logs, GC logs, thread/heap dumps, and traces
  • Hands-on experience with monitoring/alerting platforms (e.g., ITRS Geneos, Grafana, or equivalents)
  • Working knowledge of Change Management and deployment practices, including CI/CD pipelines and rollback procedures
  • Experience with middleware messaging technologies (IBM MQ, Solace, Kafka, Tibco EMS, or similar)
  • Familiarity with incident/problem management tooling (e.g., ServiceNow/JIRA) and structured RCA/problem management
Job Responsibility
  • Provide Level 2 production support for Equities trading applications, acting as the primary technical escalation point for trading-impacting incidents
  • Respond to critical incidents during market hours, executing rapid diagnosis and restoration activities to minimize business disruption
  • Maintain hands-on trade floor coverage, delivering direct support to Front Office users in a high-pressure, real-time environment
  • Serve as a key liaison between business users and Technology (development, infrastructure, vendors), ensuring timely triage, escalation, and resolution
  • Perform deep technical troubleshooting across applications and environments, including analysis of logs and runtime evidence to identify root cause and remediation paths
  • Proactively monitor production using enterprise tooling (e.g., ITRS Geneos) to detect anomalies and prevent outages
  • Execute operational routines including start-of-day checks, continuous monitoring, and regional handover to support global coverage
  • Support production integrity activities, including same-day risk reconciliations and data consistency validation across trading systems
  • Manage change, deployment, and release execution using CI/CD and Change Management controls, including rollback readiness and zero-impact implementation practices
  • Drive service stability through post-incident review, problem management input, and continuous improvement initiatives across stability/efficiency/effectiveness
Employment Type: Full-time