Act as a Subject Matter Expert (SME), providing authoritative technical and business support for Citi users. Drive the strategy for improving stability, efficiency, and risk management to help the department and overall business succeed
Lead the resolution of critical incidents, including defining strategy for prioritization, timely escalation, and communication to key stakeholders. Drive comprehensive post-mortems and root cause analysis to prevent problem recurrence
Support the planning and execution of system changes, including application releases, infrastructure maintenance, and continuity of business tests, ensuring production stability is maintained
Shape the production monitoring/observability estate by championing new features and leveraging analytics to enhance system visibility and proactive alerting
Partner with development teams to define and implement long-term improvements to application stability, performance, and recoverability
Define and champion the automation strategy to reduce operational toil, improve efficiency, and mitigate risk
Mentor and develop talent within the team, fostering a culture of knowledge sharing and continuous improvement. Provide guidance and clarify the appropriate course of action for junior team members
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Citi’s policies and code of conduct, and escalating, managing and reporting control issues with transparency
Requirements
10+ years in a Production Support role managing enterprise-level applications, with a proven track record of operating at an SME level
SME-level expertise in Unix/Linux, SQL and advanced Shell scripting
Expert-level knowledge of enterprise monitoring tools (e.g., ITRS Geneos, AppDynamics) and log aggregation platforms (e.g., Splunk, ELK)
Proven experience in mentoring junior team members and leading technical initiatives
Extensive experience with containerization platforms (OpenShift, Kubernetes)
Deep experience with relational (Oracle, MSSQL) and NoSQL (MongoDB) databases
Deep knowledge of messaging solutions (TIBCO EMS, MQ, Kafka)
Significant experience working with REST APIs
Expert understanding of distributed application architecture, including networks, load balancers, storage, and authentication (AD/LDAP)
Excellent written and verbal communication skills, with the ability to articulate complex technical issues to both technical and business audiences
Bachelor's/University degree; equivalent experience will be considered
Nice to have
Proficiency in SQL for data requests
Proficiency in a programming language (e.g., Python, Java) for automation
Experience in applying prompt engineering techniques for interacting with Generative AI / Large Language Models (LLMs)
Experience with modern observability standards and tools: OpenTelemetry, Prometheus, Grafana