The Orion Tech SRE Lead is a hands-on leader responsible for shaping and delivering best-in-class Observability for our Premier Customers in Services Technology. This role reports into the Head of SRE Services and sits alongside other members of the SRE enablement team. You will define the long-term vision, build and scale modern Observability & Monitoring, drive toil reduction by building efficiency capabilities across business lines, and lead a small team of SREs. This is a blended leadership and engineering role: the ideal candidate pairs strategic vision with the technical depth to resolve real-world telemetry challenges across on-prem, cloud, and container-based environments (ECS, Kubernetes, etc.).
Job Responsibilities:
Define and own the roadmap of Engineering enablers for the Project Orion team, aligned with enterprise reliability and SRE Services organization goals
Translate organization strategy into an actionable delivery plan in partnership with the Services Products, Operations & Engineering functions, delivering incremental, high-value milestones
Understand the functional scope of Critical Business Services and translate it into End-to-End monitoring solutions
Periodically review and analyze application monitoring toil, and collaborate with stakeholders to remediate it in line with organization goals
Identify manual operations use cases performed by Level 1 functions and create a strategic plan to automate them
Drive reusability and efficiency by tracking problem statements raised by the Orion Level 1 function and providing a milestone delivery plan
Design and build strategic observability dashboards that present key signals such as SLOs, SLIs, latency, and business metrics in a single pane of glass
Lead and mentor SREs, fostering technical growth and an SRE mindset
Work hands-on to troubleshoot telemetry and instrumentation issues across on-prem, cloud (AWS, GCP, etc.), and ECS/Kubernetes-based environments
Use Jira/Agile workflows to track and report on strategic enablers coverage, adoption, and contribution to improved client experience
Support SRE Communities of Practice (CoP) and foster strong relationships with SREs, developers, and platform leads across Services and beyond to accelerate adoption & promote SRE best practices
Evaluate and integrate new technologies and tooling to enhance our Monitoring & Observability capabilities
Remove inefficiencies and provide solutions to enable unified views of consolidated End-to-End client journeys for Payments & other Services critical user journeys
Collaborate closely with the architecture function to support implementation of observability non-functional requirements as part of the SDLC
Foster AI adoption by building use cases from tasks performed by Orion L1 functions and remediating them using the Citi AI tech stack
Lead people management responsibilities for your direct team, including management of headcount, goal setting, performance evaluation, compensation, and hiring
Appropriately assess risk when business decisions are made, demonstrating consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations
Requirements:
16+ years of experience in Observability, SRE, Infrastructure Engineering, or Platform Architecture, including 5+ years in senior leadership roles
Deep expertise in observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms
Strong hands-on experience across hybrid infrastructure, including on-prem, cloud (AWS, Google Cloud), and container platforms (ECS, Kubernetes)
Proven ability to design scalable telemetry and instrumentation strategies, resolve production observability gaps, and integrate them into large-scale systems
Experience leading teams and managing people across geographically distributed locations
Strong ability to influence platform, cloud, and engineering leaders to ensure observability tooling is built for reuse and scale
Deep understanding of SRE fundamentals, including SLIs, SLOs, error budgets, and telemetry-driven operations
Strong collaboration skills and experience working across horizontal infrastructure teams, building consensus and delivering changes
Ability to stay up to date with market trends and apply them to improve internal tooling and design decisions
Good understanding of the AI tech stack, with the ability to create a business case and solve it using Citibank AI solutions
Excellent written and verbal communication skills, with the ability to influence and articulate complex technical concepts to technical and non-technical audiences
Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related technical field