The SRE Observability Lead Engineer is a hands-on leader responsible for shaping and delivering the future of Observability across Services Technology. This role reports into the Head of SRE Services and sits within a small central enablement team. You will define the long-term vision, build and scale modern observability capabilities across business lines, and lead a small team of SREs delivering reusable observability services. This is a blended leadership and engineering role: the ideal candidate pairs strategic vision with the technical depth to resolve real-world telemetry challenges across on-prem, cloud, and container-based environments (ECS, Kubernetes, etc.). You’ll work closely with architecture and other engineering functions, not only to resolve common challenges affecting SREs aligned to lines of business (LoBs), but also to ensure observability is embedded as a non-functional requirement (NFR) for all new services going live. You will collaborate with platform and infrastructure teams to ensure solutions are enterprise-scale rather than siloed. You will also be responsible for managing a small, high-impact team of SREs based in your region. This role requires a comprehensive understanding of observability challenges across Services (Payments, Securities Services, Trade, Digital & Data) and the ability to influence outcomes at the enterprise level. Strong commercial awareness, technical credibility, and excellent communication skills are essential to negotiate internally, influence peers, and drive change. Some external communication may be necessary.
Job Responsibility:
Define and own the strategic vision and multi-year roadmap for Observability across Services Technology, aligned with enterprise reliability and production goals
Translate strategy into an actionable delivery plan in partnership with the Services Architecture & Engineering function, delivering incremental, high-value milestones toward a unified, scalable observability architecture
Lead and mentor SREs across Services, fostering technical growth and an SRE mindset
Build and offer a suite of central observability services across LoBs – including standardized telemetry libraries, onboarding templates, dashboard packs, and alerting standards
Drive reusability and efficiency by creating common patterns and golden paths for observability adoption across critical client flows and platforms
Partner with infrastructure, CTO, and other SMBF tooling teams to ensure observability tooling is scalable and resilient and avoids duplication (“cottage industries”)
Work hands-on to troubleshoot telemetry and instrumentation issues across on-prem, cloud (AWS, GCP, etc.), and ECS/Kubernetes-based environments
Collaborate closely with the architecture function to support implementation of observability NFRs in the SDLC, ensuring new apps go live with sufficient coverage and insight
Support SRE Communities of Practice (CoP) and foster strong relationships with SREs, developers, and platform leads across Services and beyond to accelerate adoption and promote SRE best practices such as SLO adoption and capacity planning
Use Jira/Agile workflows to track and report on observability maturity across Services LoBs – coverage, adoption, and contribution to improved client experience
Remove inefficiencies and provide solutions that enable unified views of consolidated SLOs for critical end-to-end (E2E) client journeys across Payments and other Services
Influence and align senior stakeholders across functions (applications, infrastructure, controls, and audit) to drive observability investment for critical client flows across Services
Represent Services in working groups to influence enterprise observability standards, ensuring feedback from Services is reflected
Lead people management responsibilities for your direct team, including management of headcount, goal setting, performance evaluation, compensation, and hiring
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behaviour, conduct and business practices, and escalating, managing and reporting control issues with transparency, as well as effectively supervising the activity of others and creating accountability with those who fail to maintain these standards
Requirements:
Relevant experience in Observability, SRE, Infrastructure Engineering, or Platform Architecture, including several years in senior leadership roles
Deep expertise in observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms
Strong hands-on experience across hybrid infrastructure, including on-prem, cloud (AWS, GCP, Azure), and container platforms (ECS, Kubernetes)
Proven ability to design scalable telemetry and instrumentation strategies, resolve production observability gaps, and integrate them into large-scale systems
Experience leading teams and managing people across geographically distributed locations
Strong ability to influence platform, cloud, and engineering leaders to ensure observability tooling is built for reuse and scale
Deep understanding of SRE fundamentals, including SLIs, SLOs, error budgets, and telemetry-driven operations
Strong collaboration skills and experience working across federated teams, building consensus and delivering change
Ability to stay up to date with industry trends and apply them to improve internal tooling and design decisions
Excellent written and verbal communication skills, with the ability to influence and articulate complex concepts to technical and non-technical audiences
Education: Bachelor’s or Master’s degree in Computer Science, Engineering, Information Systems, or a related technical field
What we offer:
27 days annual leave (plus bank holidays)
A discretionary annual performance-related bonus
Private Medical Care & Life Insurance
Employee Assistance Program
Pension Plan
Paid Parental Leave
Special discounts for employees, family, and friends
Access to an array of learning and development resources