App Support Analyst

Citi

Location:
United States, Irving

Category:
IT - Administration

Contract Type:
Employment contract

Salary:

96400.00 - 144600.00 USD / Year

Job Description:

Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.

This role requires a foundational understanding of SRE principles and practices, including monitoring, alerting, incident management, and automation. The ideal candidate possesses a passion for data and statistics, strong analytical problem-solving skills using quantitative approaches, and a desire to learn and apply SRE concepts to improve system reliability.

Job Responsibility:

  • Assist in mining and analyzing large sets of data related to system performance, availability, and reliability, focusing on Golden Signals, Service Mapping, and Business Transactions
  • Support the identification of gaps and opportunities for improvement in system reliability and performance using data-driven insights, learning and applying trend analysis and forecasting techniques
  • Contribute to the development of strategic roadmaps for improving SRE practices and automation, with a focus on toil reduction
  • Assist in defining and deploying key SRE metrics (SLIs, SLOs, error budgets) for measuring process improvement and system reliability
  • Support the implementation of feedback models and automated alerting systems to enable continuous process improvement and proactive incident management
  • Conduct research and contribute to the creation of artifacts to drive simplification and automation of SRE processes
  • Participate in root cause analysis, identifying faults in code, proposing solutions, implementing changes, and optimizing code for performance (Big O notation)
  • Assist in reviewing and authoring operational procedures related to SRE best practices and incident response, incorporating service level concepts (SLI, SLO, SLA, Error Budget)
  • Support the distillation of complex technical information into executive-level narratives to communicate SRE performance and initiatives
  • Contribute to foundational and execution work streams for process improvement and automation initiatives within the SRE domain, focusing on cloud-native applications, Docker builds, Kubernetes, Helm, Terraform, and Load Balancing Architecture
  • Collaborate with domain, technical architecture, and SRE teams to drive efficiency and reliability
  • Assist in developing and maintaining dashboards and reporting to provide visibility into system health and performance using observability stacks (e.g., Grafana, ELK)
  • Participate in on-call rotation and incident response activities as needed, gaining experience in managing production environments

Requirements:

  • 3-5 years of experience in Data Analysis or a related field, with a demonstrated interest in SRE
  • Foundational understanding of SRE Fundamentals: Service Level Concepts (SLI, SLO, SLA, Error Budget), Toil reduction, Automation, Observability primers, Chaos Engineering, and Production Management
  • Basic understanding of Observability Fundamentals: Golden Signals, Service Mapping, Business Transactions, Metrics-Logs correlations, Forecasting, and Trend analysis
  • Developing skills in Root Cause Analysis: Identifying faults in code, providing a clear path to resolution, implementing the change, and optimizing code for performance (Big O notation)
  • Experience with at least one statistical programming language, such as Python or an equivalent
  • Working knowledge of Linux, Docker, Kubernetes, Observability stacks (e.g., Grafana, ELK), Middleware (Kafka), and databases (Oracle)
  • Excellent problem-solving skills and a willingness to learn and apply new SRE concepts
  • Strong communication and collaboration skills

Nice to have:

  • Experience with data visualization tools and creating dashboards for monitoring and reporting on SRE metrics
  • Familiarity with on-call procedures, incident management tools, and SRE best practices

What we offer:

  • Medical, dental & vision coverage
  • 401(k)
  • Life, accident, and disability insurance
  • Wellness programs
  • Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays

Additional Information:

Job Posted:
September 05, 2025

Expiration:
September 10, 2025

Employment Type:
Full-time

Work Type:
On-site work