Site Reliability Engineer

Citi

Location:
India, Pune

Category:
IT - Software Development

Contract Type:
Not provided

Salary:
Not provided

Job Description:

The SRE Observability Specialist is a hands-on expert responsible for delivering the future of observability across Services Technology. The role is part of a central SRE enablement team within Services Production and works closely with SREs, developers, and platform teams to embed telemetry, implement SLOs, and build meaningful visualizations for key production flows, particularly in the critical Payments business.

Job Responsibilities:

  • Deliver against the observability roadmap for Services Technology by building scalable, reusable telemetry solutions
  • Create and maintain dashboards and visualizations for critical client journeys, including real-time flows across Payments
  • Guide line-of-business teams in implementing SLIs/SLOs, golden signals, and effective alerting to support operational excellence
  • Support integration and adoption of observability tooling across on-prem, public cloud (AWS/GCP), and containerized environments (ECS, Kubernetes)
  • Customize shared dashboards and observability components in partnership with CTI and other central Engineering functions, ensuring usability and flexibility
  • Provide technical support and implementation guidance to SREs and developers facing integration or tooling challenges
  • Manage the observability book of work for Services Technology and drive initiatives to reduce mean time to detect (MTTD) and improve recovery outcomes
  • Serve as a key connection point between line-of-business SREs and central infrastructure functions by gathering tooling feedback, surfacing systemic issues, and influencing platform enhancements via the Services Observability Forum
  • Stay current with observability trends, including AI/ML-driven insights, anomaly detection, and emerging OSS practices, and assess their applicability
  • Maintain strong knowledge of observability platform features and vendor offerings to advise teams and maximize the value of tooling investments

Requirements:

  • 10+ years of experience in SRE, Observability Engineering, or platform infrastructure roles focused on operational telemetry
  • Hands-on experience with observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms
  • Deep understanding of SLIs, SLOs, Error Budgets, and telemetry best practices in high-availability environments
  • Proven ability to troubleshoot integration issues and support observability across hybrid platforms (on-prem, cloud, containers)
  • Experience building dashboards aligned to business outcomes and incident workflows, especially in critical flows like payments
  • Familiarity with modern observability tooling ecosystems, including AI/ML capabilities, trace correlation, baselining, and alert tuning
  • Strong interpersonal and collaboration skills, with the ability to operate across federated engineering teams and central infrastructure groups
  • Experience in enablement or platform teams with a track record of scaling best practices across diverse business units
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience

Additional Information:

Job Posted:
May 03, 2025

Employment Type:
Full-time
Work Type:
On-site work