App Support Analyst, Citi

Citi

Location:
United States, Irving

Category:
IT - Administration

Contract Type:
Employment contract

Salary:

96400.00 - 144600.00 USD / Year

Job Description:

Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.

This role requires a foundational understanding of SRE principles and practices, including monitoring, alerting, incident management, and automation. The ideal candidate possesses a passion for data and statistics, strong analytical problem-solving skills using quantitative approaches, and a desire to learn and apply SRE concepts to improve system reliability.

Job Responsibilities:

Assist in mining and analyzing large sets of data related to system performance, availability, and reliability, focusing on Golden Signals, Service Mapping, and Business Transactions
Support the identification of gaps and opportunities for improvement in system reliability and performance using data-driven insights, learning and applying trend analysis and forecasting techniques
Contribute to the development of strategic roadmaps for improving SRE practices and automation, with a focus on toil reduction
Assist in defining and deploying key SRE metrics (SLIs, SLOs, error budgets) for measuring process improvement and system reliability
Support the implementation of feedback models and automated alerting systems to enable continuous process improvement and proactive incident management
Conduct research and contribute to the creation of artifacts to drive simplification and automation of SRE processes
Participate in root cause analysis, identifying faults in code, proposing solutions, implementing changes, and optimizing code for performance (Big O notation)
Assist in reviewing and authoring operational procedures related to SRE best practices and incident response, incorporating service level concepts (SLI, SLO, SLA, Error Budget)
Support the distillation of complex technical information into executive-level narratives to communicate SRE performance and initiatives
Contribute to foundational and execution work streams for process improvement and automation initiatives within the SRE domain, focusing on cloud-native applications, Docker builds, Kubernetes, Helm, Terraform, and Load Balancing Architecture
Collaborate with Domain, Technical architecture, and SRE teams to drive efficiency and reliability
Assist in developing and maintaining dashboards and reporting to provide visibility into system health and performance using observability stacks (e.g., Grafana, ELK)
Participate in on-call rotation and incident response activities as needed, gaining experience in managing production environments

Requirements:

3-5 years of experience in Data Analysis or a related field, with a demonstrated interest in SRE
Foundational understanding of SRE Fundamentals: Service Level Concepts (SLI, SLO, SLA, Error Budget), Toil reduction, Automation, Observability primers, Chaos Engineering, and Production Management
Basic understanding of Observability Fundamentals: Golden Signals, Service Mapping, Business Transactions, Metrics-Logs correlation, Forecasting, and Trend analysis
Developing skills in Root Cause Analysis: Identifying faults in code, providing a clear path to resolution, implementing the change, and optimizing code for performance (Big O notation)
Experience with at least one statistical programming language such as Python, or an equivalent
Working knowledge of Linux, Docker, Kubernetes, Observability stacks (e.g., Grafana, ELK), Middleware (Kafka), and databases (Oracle)
Excellent problem-solving skills and a willingness to learn and apply new SRE concepts
Strong communication and collaboration skills

Nice to have:

Experience with data visualization tools and creating dashboards for monitoring and reporting on SRE metrics
Familiarity with on-call procedures, incident management tools, and SRE best practices

What we offer:

Medical, dental & vision coverage
401(k)
Life, accident, and disability insurance
Wellness programs
Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays

Additional Information:

Job Posted:
September 05, 2025

Expiration:
September 10, 2025

Employment Type:

Full-time

Work Type:

On-site work
