Site Reliability Engineering Lead Job at Citi (New York)

Site Reliability Engineering Lead

Citi

Location:
United States, New York

Category:
IT - Administration

Contract Type:
Not provided

Salary:

142,320.00 - 213,480.00 USD / Year

Job Description:

We are seeking an experienced and motivated team member to support our AI and DevOps Platform Support team in North America. This role is responsible for contributing to the stability, reliability, and performance of our critical AI and DevOps platforms. The team supports a wide range of services, including multiple AI applications, developer tools, and CI/CD pipeline technologies used across the organization. The ideal candidate will help lead a team of SRE and Support engineers, facilitate incident and problem resolution, and collaborate with engineering and development teams to enhance platform services and supportability. The role includes short‑term planning and coordination of actions and resources within the team.

Job Responsibilities:

Demonstrate a strong understanding of how application support contributes to the overall technology function and organizational objectives
Assist with vendor relationship management, including coordination with offshore managed services
Support efforts to improve service levels for end users by enhancing operational efficiencies and strengthening incident management, problem management, and knowledge‑sharing practices
Partner with development teams to guide improvements in application stability and supportability
Contribute to frameworks for managing capacity, throughput, and latency
Assist in defining and implementing application onboarding guidelines and standards
Support team members by fostering a collaborative environment and encouraging skill development
Participate in cost‑reduction efforts through Root Cause Analysis reviews, knowledge management, performance tuning, and user training
Participate in business review meetings to help align technology tools and strategies with business requirements
Ensure adherence to support processes and tool standards, and assist in enhancing processes to promote consistency and quality across the support program
Perform other duties and functions as assigned
Support platform leadership in defining the platform roadmap and partnering with engineering teams and business stakeholders
Assist in executing resilience activities such as wargaming scenarios, chaos engineering tests, and disaster recovery drills
Contribute to automation initiatives aimed at reducing manual toil and improving platform efficiency
Support the enterprise‑wide observability strategy, including monitoring, logging, tracing, and alerting
Maintain hands‑on familiarity with platform architecture and services as needed for operational support
Assist in overseeing the operational health of production platforms (including OpenShift, ECS, CI/CD), ensuring SLAs are supported and incident processes are followed
Help implement and operate effective monitoring and observability strategies to support proactive issue detection and system health assessments

Requirements:

6–10 years of relevant experience in a hands‑on technical or support leadership role
Experience contributing to architecture discussions and ensuring solutions align with enterprise standards and long‑term maintainability
Experience working with senior stakeholders or technology partners
Demonstrated experience supporting IT service improvements or platform stability initiatives
Strong communication and presentation skills, with the ability to convey technical concepts clearly
Experience supporting or contributing to technical roadmaps or operational workstreams
Experience participating in resilience‑related activities such as incident simulations, disaster recovery exercises, or stability testing
Ability to collaborate with cross‑functional support teams and technology groups
Strong organizational and workload‑planning skills
Consistently demonstrates clear and concise written and verbal communication skills
Ability to communicate appropriately with relevant stakeholders
Bachelor’s/University degree required
Master’s degree preferred

Nice to have:

Working knowledge of Generative AI concepts
Experience with CI/CD and configuration management tools
Experience with Red Hat OpenShift or similar Kubernetes technologies
Experience working with databases such as Postgres, Oracle, MongoDB, or Redis
Experience writing or maintaining code in Java, Python, Go, or similar languages
Hands‑on experience with modern observability and monitoring tools (e.g., Prometheus, Grafana, Splunk, ELK)

What we offer:

Medical, dental, and vision coverage
401(k)
Life, accident, and disability insurance
Wellness programs
Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays
Discretionary and formulaic incentive and retention awards

Additional Information:

Job Posted:
March 21, 2026

Expiration:
May 15, 2026

Employment Type:

Full-time

Work Type:

Hybrid work
