Site Reliability Engineer - Vice President

Citi

Location:
India, Pune

Contract Type:
Not provided

Salary:

Not provided

Job Description:

The Site Reliability Engineer (SRE) is a strategic professional accountable for the daily operations, architectural resilience, and overall implementation of SRE principles in a complex, critical, and large-scale multi-disciplinary environment. This role requires a comprehensive understanding of multiple technology domains and their interaction to achieve business objectives. As a recognized technical authority, you will apply an in-depth understanding of the business impact of technical contributions and provide advice and counsel on strategic solutions.

We are seeking a passionate and experienced SRE to join our Production Management team. In this role, you will be instrumental in enhancing the reliability, performance, and efficiency of our applications and services. You will drive our strategy for end-to-end observability and resiliency, collaborating across the organization to ensure our services are stable, scalable, and fault-tolerant. This is a key role that will influence strategic decisions and foster a culture of technical excellence and accountability.

Job Responsibility:

  • Foster a culture of transparency, innovation, and accountability that encourages continuous improvement
  • Communicate the progress and impact of SRE initiatives to stakeholders at all levels
  • Operate effectively within a highly regulated environment, ensuring compliance with all relevant requirements
  • Ensure critical business applications meet stringent operational resilience requirements, including adherence to defined impact tolerances
  • Oversee advanced recovery testing, including Production Swing Tests, Data Recovery Tests, and chaos engineering practices
  • Drive the adoption and development of automation, such as One Touch Recovery solutions, to minimize recovery time
  • Partner with development teams to leverage cloud-native services and established resiliency patterns to enhance application reliability
  • Collaborate across the organization to develop and scale observability solutions using modern tools for metrics, logging, and tracing
  • Partner with development teams to effectively instrument applications, providing deep insights into system health and performance

Requirements:

  • 13+ years of experience, with a deep understanding of SRE concepts, including SLOs, SLIs, error budgets, and toil reduction
  • Demonstrable experience with Disaster Recovery planning, resiliency testing, and fault-tolerant distributed system design
  • Proficiency in deploying, managing, and troubleshooting applications on OpenShift/Kubernetes
  • Hands-on experience with modern observability tools (e.g., Prometheus, Grafana, Loki, Mimir, Tempo, AppDynamics)
  • Experience with Infrastructure as Code (IaC), configuration management, and automation tools (e.g., Ansible, Terraform)
  • Experience creating, modifying, and managing Helm charts for application deployment
  • Significant professional experience in production management, software development, or an equivalent field, with a strong focus on Site Reliability Engineering
  • Expertise in analyzing complex application, database, network, and OS issues within large-scale, customer-facing systems
  • A service-oriented attitude combined with excellent problem-solving and strategic thinking skills
  • Strong communication and diplomacy skills, with a proven ability to work effectively across multiple business and technical teams

Nice to have:

  • Experience with major public cloud providers (e.g., Google Cloud, AWS, Azure)
  • Proven experience delivering software and infrastructure using Agile frameworks
  • Experience presenting technical strategy to senior and executive-level audiences
  • Experience writing or maintaining code in Java, Python, Go, or similar languages

Additional Information:

Job Posted:
May 16, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Site Reliability Engineer - Vice President

Senior Vice President, Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team which is responsible for Private a...
Location:
Singapore, Singapore
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree or equivalent work experience
  • 8+ years of relevant work experience
  • Highly motivated self-starter with excellent interpersonal and communication skills. Able to communicate efficiently at multiple levels of seniority
  • Certification or formal training in site reliability engineering concepts and practices
  • Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
  • 5+ years experience in Python (preferable) or Java, on large scale systems alongside Linux based scripting languages
  • Experience working on observability, logging and metrics toolsets
  • Experience with Kubernetes (k8s) and container technologies such as Docker, OpenShift, and EKS
  • Experience with public cloud technologies such as AWS, GCP or Azure
  • Experience with Secrets products such as HashiCorp Vault or CyberArk
Job Responsibility:
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organization
  • Actively owning production level incidents till resolution.
Employment Type:
Full-time

Vice President - Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team which is responsible for Private a...
Location:
Singapore, Singapore
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree or equivalent work experience
  • 6+ years of relevant work experience
  • Highly motivated self-starter with excellent interpersonal and communication skills. Able to communicate efficiently at multiple levels of seniority
  • Certification or formal training in site reliability engineering concepts and practices
  • Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
  • 4+ years experience in Python (preferable) or Java, on large scale systems alongside Linux based scripting languages
  • Experience working on observability, logging and metrics toolsets
  • Experience with Kubernetes (k8s) and container technologies such as Docker, OpenShift, and EKS
  • Experience with public cloud technologies such as AWS, GCP or Azure
  • Experience with Secrets products such as HashiCorp Vault or CyberArk
Job Responsibility:
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
  • Actively owning production level incidents till resolution.
Employment Type:
Full-time

DevOps, Site Reliability Engineer, Vice President

The Vice President, Technology (DevOps/SRE) will lead the engineering and operat...
Location:
United States, Jersey City
Salary:
142320.00 - 213480.00 USD / Year
Citi
Expiration Date:
June 15, 2026
Requirements:
  • 6-10 years of experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering, with demonstrated ownership of production platforms and delivery outcomes
  • Hands-on administration and troubleshooting skills across Linux and Windows, including strong command-line diagnostics and log analysis
  • Strong experience with Kubernetes and/or OpenShift, including Helm-based deployments and cluster troubleshooting
  • Experience with automation/configuration management (Ansible and Ansible Tower/Starfleet or equivalent) and a strong bias toward eliminating manual operational work
  • Demonstrated experience driving vulnerability remediation, patching, and platform hardening in partnership with security/compliance teams
  • Proven ability to plan and execute platform migrations and upgrades (OS, middleware, databases), including change management, runbooks, and production readiness
  • Strong communication and stakeholder management skills; able to influence engineering teams and senior leaders while remaining hands-on in critical technical work
Job Responsibility:
  • CI/CD ownership: Architect, implement, and operate scalable CI/CD pipelines and release workflows; define standards for build, test, security scanning, and deployment automation
  • Tooling and platform engineering: Provide deep expertise across Jenkins, UDeploy, Tekton, Harness (or equivalent) including architecture, configuration, upgrades, and governance
  • Incident and pipeline triage: Diagnose and remediate failed pipelines (Jenkins/UDeploy) and deployment issues quickly; drive root-cause analysis and implement preventative controls
  • Hands-on systems administration: Perform command-line troubleshooting and administration across Linux and Windows; partner with infrastructure teams to resolve OS, network, and runtime issues impacting production
  • Platform migrations and upgrades: Lead and execute OS (e.g., RHEL) and platform upgrade initiatives across middleware and databases; plan cutovers, rollback strategies, and production readiness
  • Middleware lifecycle management: Coordinate upgrades for critical runtimes and middleware (Node.js, Python, JDK, Nginx, Tomcat)
What we offer:
  • medical, dental & vision coverage
  • 401(k)
  • life, accident, and disability insurance
  • wellness programs
  • paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays
Employment Type:
Full-time

Site Reliability Engineering Analyst - Assistant Vice President

The Engineer Sr Analyst is an intermediate level position responsible for a vari...
Location:
India, Pune
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5-8 years of relevant experience in an Engineering role
  • Experience working in Financial Services or a large complex and/or global environment
  • Project Management experience
  • Consistently demonstrates clear and concise written and verbal communication
  • Comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • Demonstrated analytic/diagnostic skills
  • Ability to work in a matrix environment and partner with virtual teams
  • Ability to work independently, prioritize, and take ownership of various parts of a project or initiative
  • Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • Proven track record of operational process change and improvement
Job Responsibility:
  • Contribute to the budgetary requirement definition for assigned product area, develop functional specifications, and create project plans and software release schedules
  • Partner with business and development teams to identify engineering requirements and assist in defining application and system requirements and processes and maintain engineering relationships with the end user/client
  • Ensure requirements/tasks from technology departments and/or end users are communicated to stakeholders
  • Provide solutions and processes in accordance with audit initiatives and requirements and consult with Business Information Security officers (BISOs) and TISOs
  • Exhibit in-depth understanding of engineering concepts and principles
  • Assist with training activities and mentor junior team members
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
  • Automate Core Processes: Design, develop, and implement automation solutions to replace manual activities and repetitive processes, and to support migrations to new infrastructure
  • Continuous Improvement: Proactively identify opportunities for process improvements and efficiency gains across the service lifecycle
  • Support AI Integration: Collaborate with development and data science teams to support the seamless integration of services with AI solutions
Employment Type:
Full-time

Strategic Initiatives Program Manager – Vice President

Engineer the future of global finance. At Citi, our Tech team doesn’t just suppo...
Location:
United Kingdom, Belfast
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Experience in software engineering, site reliability engineering (SRE), or technology risk and controls
  • Experience in a program or project management role, delivering complex, cross-functional technology initiatives
  • Proven expertise in analyzing complex application, database, network, and OS issues across distributed, large-scale, customer-facing systems
  • Strong understanding of resiliency principles, including disaster recovery, data recovery, and high-availability architecture
  • Excellent communication skills and a proven ability to work effectively across multiple business and technical teams
  • Bachelor's degree in Computer Science, Engineering, or an equivalent field.
Job Responsibility:
  • Implement Enhanced Testing and Recovery
  • Design and Architecture
  • Proactive Vulnerability Management
  • Operational Resilience Adherence
  • Performance Measurement and Reporting
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment Type:
Full-time

Project Engineer

Mechanical Project Engineer is a vital contributor to the success of our constru...
Location:
United States, Abilene, Texas
Salary:
65000.00 - 80000.00 USD / Year
RK Mechanical, Inc.
Expiration Date:
Until further notice
Requirements:
  • 3-5 years of experience in a related position
  • College/university graduate or equivalent combination of skills and experience generally expected for specified technical roles
  • Fully competent in all conventional aspects of subject matter or functional area
  • Plans and conducts work requiring judgment in independent evaluation, selection and substantial adaptation/modification of standards
  • Devises new solutions to problems encountered
  • Independently performs most assignments with instruction
  • Receives guidance for unusual or complex problems and supervisor approval for changes in standards
Job Responsibility:
  • Manage contractual agreements with owners, contractors, subcontractors, material suppliers, field staff, and within RKMI’s management system
  • Ensure daily corporate documentation is completed and up to date, including time cards, daily reports, additional work authorizations, receiving documents, as-built drawings, etc.
  • Negotiate terms, conditions, and scope of work for contractual agreements issued to RKMI in accordance with corporate policies and procedures, and estimate bid proposal
  • Prepare and distribute initial project budget
  • Coordinate and attend RKMI in-house pre-construction meetings
  • Ensure permits and/or licenses are obtained and current for project
  • Coordinate timely completion and thorough buy-out procedures on materials and equipment, in conjunction with the superintendent and purchasing department, with emphasis on maintaining all buy-outs under the established budget
  • Ensure superintendent’s take-offs are complete, accurate and on time
  • Buy-out, negotiate and issue all lower-tier subcontract(s), with emphasis on complete scopes in compliance with the contract documents and within the established budget
  • Oversee and coordinate the project submittal approval process
What we offer:
  • competitive benefits
  • hands-on training and development opportunities through RK University
  • accredited apprentice program
  • leadership and technical learning opportunities

Digital Software Engineering Lead Analyst – Vice President

The Digital S/W Engineer Lead Analyst is a lead-level professional role. This in...
Location:
India, Pune
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 7+ years of progressive software development experience, demonstrating expert-level proficiency in JavaScript and Java frameworks (e.g., React.js, Spring Boot), and databases (e.g., Oracle, MongoDB, PostgreSQL)
  • Expert in Modern Application Architecture: Mastery of modern application architecture principles, including microservices, event-driven architectures, serverless, and cloud-native patterns
  • Deep expertise in Data Structures, Algorithms, and Object-Oriented Design Principles with Java
  • Proven leadership in leveraging and integrating Artificial Intelligence (AI) and Machine Learning (ML) tools to optimize development workflows, enhance code quality, and drive intelligent features
  • Extensive experience with Microservices frameworks (e.g., Spring Boot, Quarkus), Event-Driven Services (e.g., Kafka, RabbitMQ), and advanced Cloud-Native Application Development (AWS, Azure, GCP)
  • Multiple years of experience leading the design and implementation of Service-Oriented and Microservices architectures, including advanced REST, GraphQL, and gRPC implementations
  • Full Stack Architecture & Leadership: Demonstrated ability to architect, design, develop, and maintain complex, enterprise-grade full-stack solutions, encompassing both front-end and back-end components of robust web applications, with an emphasis on scalability and performance
  • Front-End Expertise: Expert-level proficiency in designing and developing highly intuitive, performant, and accessible user interfaces using cutting-edge JavaScript frameworks (e.g., React, Angular, Vue), advanced HTML5, and CSS (e.g., SASS/LESS, CSS-in-JS)
  • Back-End Mastery: Extensive experience in architecting and developing scalable server-side logic and sophisticated APIs using languages such as Java, Python, or similar, with a focus on high-throughput and low-latency systems
  • Advanced Database & Data Architecture Expertise: Comprehensive knowledge of SQL and PL/SQL, with a deep understanding of Relational Database Management Systems (RDBMS), particularly Oracle, including advanced database design, performance tuning, data warehousing, and NoSQL databases
Job Responsibility:
  • Strategic Technical Leadership: Provide expert guidance and strategic oversight across the entire software development lifecycle, partnering continuously with senior stakeholders to align technical solutions with business objectives
  • Architectural Stewardship: Lead the design and evolution of robust, scalable, and secure enterprise applications, defining architectural patterns and ensuring adherence to best practices in cutting-edge technologies and software design patterns
  • Team & Project Leadership: Drive complex engineering initiatives within Agile delivery teams, fostering a culture of collaboration, excellence, and continuous improvement. Lead sprint goal achievement, oversee code quality, and actively participate in and lead broader Citi technical communities and advanced Agile/Scrum processes
  • Mentorship & Coaching: Act as a technical mentor and coach for junior and intermediate engineers, fostering their growth, critical thinking, and advanced problem-solving capabilities
  • Advanced Problem Solving & Troubleshooting: Exhibit mastery in analyzing and resolving intricate coding, application performance, and design challenges. Lead cross-functional efforts to diagnose and troubleshoot complex system issues
  • Proactive Root Cause Analysis: Spearhead thorough investigations to identify systemic root causes of development and performance bottlenecks, leading the implementation of comprehensive, long-term defect resolutions and preventative measures
  • Technical Vision & Acumen: Demonstrate a profound and forward-looking understanding of technical requirements, emerging trends, and their strategic implications for solutions under development, ensuring future-proof designs
  • Containerization, Orchestration & Cloud Strategy: Drive the strategic adoption and optimization of Docker for application containerization, Kubernetes for efficient service orchestration, and other cloud-native technologies to build resilient and scalable infrastructure
  • Communication, Risk & Stakeholder Management: Master effective communication of progress, proactively anticipate and mitigate technical and project bottlenecks, provide expert escalation management, and adeptly identify, assess, track, and manage issues and risks at strategic and operational levels
  • Process and System Optimization: Champion and lead initiatives to streamline, automate, and eliminate redundant processes within architecture, build, delivery, production operations, and across various business areas, driving significant efficiency gains and innovation
Employment Type:
Full-time

Vice President, Applications Support Technology Lead Analyst

The Apps Support Lead Analyst is a seasoned professional role providing Level 2 ...
Location:
Japan, Chiyoda, Tokyo
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5–10 years’ experience in L2 application production support in a securities/investment bank or financial services trading environment
  • Demonstrated experience providing trade floor support to Front Office users in Equities or capital markets
  • Excellent business-level English communication (written/verbal)
  • Japanese language capability desirable
  • Proven ability to prioritize and multi-task effectively under extreme time pressure in a real-time trading environment
  • Strong diagnostic skills including analysis of application/server logs, GC logs, thread/heap dumps, and traces
  • Hands-on experience with monitoring/alerting platforms (e.g., ITRS Geneos, Grafana, or equivalents)
  • Working knowledge of Change Management and deployment practices, including CI/CD pipelines and rollback procedures
  • Experience with middleware messaging technologies (IBM MQ, Solace, Kafka, Tibco EMS, or similar)
  • Familiarity with incident/problem management tooling (e.g., ServiceNow/JIRA) and structured RCA/problem management
Job Responsibility:
  • Provide Level 2 production support for Equities trading applications, acting as the primary technical escalation point for trading-impacting incidents
  • Respond to critical incidents during market hours, executing rapid diagnosis and restoration activities to minimize business disruption
  • Maintain hands-on trade floor coverage, delivering direct support to Front Office users in a high-pressure, real-time environment
  • Serve as a key liaison between business users and Technology (development, infrastructure, vendors), ensuring timely triage, escalation, and resolution
  • Perform deep technical troubleshooting across applications and environments, including analysis of logs and runtime evidence to identify root cause and remediation paths
  • Proactively monitor production using enterprise tooling (e.g., ITRS Geneos) to detect anomalies and prevent outages
  • Execute operational routines including start-of-day checks, continuous monitoring, and regional handover to support global coverage
  • Support production integrity activities, including same-day risk reconciliations and data consistency validation across trading systems
  • Manage change, deployment, and release execution using CI/CD and Change Management controls, including rollback readiness and zero-impact implementation practices
  • Drive service stability through post-incident review, problem management input, and continuous improvement initiatives across stability/efficiency/effectiveness
Employment Type:
Full-time