Are you a talented Site Reliability Engineer with a passion for software development and for building resilient, large-scale systems? Do you enjoy building cutting-edge enterprise products and take a hands-on approach to engineering? Join Citi's Production Operation - Continuous Improvement team and be part of our commitment to transforming Citi technology, leveraging game-changing capabilities to drive agility, efficiency, and innovation.
Job Responsibilities:
Design for Reliability: Design, build, and maintain scalable, secure, and highly available infrastructure, using both off-the-shelf and home-grown tooling to provide holistic monitoring and observability across the Production Operations services ecosystem
Innovate and Automate: Develop tools, scripts, and software in Java and/or Python to automate manual operational tasks (toil reduction), improving efficiency and system reliability for yourself and your developer counterparts
Proactive Improvement: Audit existing applications and services for weak points, propose reliability improvements, and drive the implementation of long-term solutions through code and configuration
Champion Observability: Implement comprehensive monitoring, alerting, and logging systems to ensure service health and performance visibility. Define, track, and manage Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
Requirements:
Preferably 5+ years of experience in SRE, DevOps, or Software Engineering, with demonstrated expertise in managing production systems
Strong hands-on Java development experience (Java 17/Spring Boot 3.x)