The SRE Analytics Lead is a strategic professional who thrives at the intersection of engineering, data, and operations. This role reports to the Head of SRE Services and is crucial for building a comprehensive metrics ecosystem for the Services business that reflects the true state of our platforms and progress against Production Engineering goals. The position involves designing Production Engineering dashboards, building data pipelines, driving operational reporting, and collaborating across teams to maintain platform reliability and recoverability.
Job Responsibilities:
Design, build, and own key Production Engineering dashboards and metrics pipelines, with hands-on ownership across enterprise tools like Tableau, Grafana, Jira, and ServiceNow, giving teams the visibility to make smarter, faster decisions in day-to-day operations and incident response
Establish consistent, enterprise-aligned frameworks and guide teams in adopting them, helping to mature how the wider production organization defines, tracks, and acts on engineering health and operational risk
Own the end-to-end data pipeline for SRE and wider Production metrics, from extraction (via APIs or queries) through transformation, validation, and delivery, ensuring full alignment with the bank's Agile workflows
Bring an automation-first mindset: challenge the status quo, collaborate, and contribute innovative solutions to the wider SMBF Production capabilities to improve visibility of key engineering metrics
Track and improve critical production OKRs across Services Production such as MTTR, MTTD, change success rate, recovery automation/Swing tests, alert volume, and toil, by providing actionable insights
Utilise and reuse existing enterprise solutions to create a unified view of reliability and recovery trends within Services
Collaborate with other central Observability, Architecture and Infrastructure teams to ensure the availability, quality, and consistency of engineering data
Build out data models and repositories that support historical analysis, trend forecasting, and anomaly detection
Drive executive and operational reporting that tells a real story of engineering progress, platform health, and critical business impact, enabling LoBs to make data-driven decisions
Support SRE tooling strategy by identifying gaps in telemetry, metrics maturity, and automation opportunities
Define and operationalize SLIs, SLOs, and error budgets in partnership with other SREs and development teams across Services, ensuring continuous refinement
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behaviour, conduct and business practices, and escalating, managing and reporting control issues with transparency
Requirements:
15+ years of experience in SRE, Observability, Engineering Productivity, or Data Engineering roles
Hands-on experience with Tableau and Grafana for visualization and reporting
Strong command of data integration and engineering techniques (e.g., REST APIs, SQL, Python, ETL tools, data modelling)
Experience building metrics pipelines and data workflows across ServiceNow, Jira, Grafana, cloud telemetry, and operational systems
Familiarity with defining and implementing SLIs, SLOs, and error budget-based engineering workflows
Deep understanding of incident response, recovery processes, and engineering operations in enterprise environments, along with the related KPIs
Demonstrated ability to influence enterprise outcomes using data – from post-incident reviews to quarterly engineering OKRs
Strong communication skills with the ability to engage both senior technical and non-technical audiences
Demonstrated sociable, positive, can-do attitude, with the ability to learn quickly and take the initiative to deliver creative and productive solutions
Ability to communicate, network, and influence at all levels
Ability to balance multiple demands and work both independently and as part of a matrix organisation to develop solutions
Bachelor’s degree in Computer Science, Engineering, Data Science, or a related technical field, or equivalent practical experience