We are seeking a technically hands-on Application Support Manager to lead and actively participate in the production support function for Citi's Instant Payments application. This role requires a leader who can directly manage and develop a technical support team, troubleshoot production issues, and ensure the operational stability and performance of a modern, cloud-native application deployed across both Citi's Enterprise Cloud and Public Cloud environments. As an Application Support Manager, you will be directly involved in day-to-day operational tasks, implement and maintain observability, resiliency, and recoverability solutions, and actively collaborate with various technology teams to ensure the highest levels of application stability and performance for our Instant Payments platform. This is an opportunity to combine deep technical expertise with team leadership in a high-visibility, high-transaction environment.
Job Responsibilities
Hands-On Operational Leadership: Directly manage, mentor, and develop a technical support team while actively engaging in day-to-day operational tasks, incident response, and problem resolution for the Instant Payments application
Direct Operational Management: Take direct ownership of ensuring the operational stability and performance of the Instant Payments application across diverse cloud environments (Citi's Enterprise Cloud and Public Cloud), including active monitoring and system checks
Technical Implementation & Optimization: Lead the implementation, configuration, and continuous optimization of observability (monitoring, logging, tracing tools), resiliency (designing and implementing auto-healing and retry mechanisms), and recoverability (executing disaster recovery strategies) solutions for the cloud-native Instant Payments application. This includes writing and maintaining scripts for these functions
Service Level Execution & Improvement: Directly contribute to improving service levels by implementing operational efficiencies, performing incident management, problem management, and enhancing knowledge sharing practices for the Instant Payments application
Application Onboarding & Technical Guidance: Actively participate in defining and implementing application onboarding guidelines and standards. Provide direct technical guidance to development teams on stability and supportability improvements for the Instant Payments application
Incident & Problem Resolution: Lead and execute troubleshooting efforts for complex technical issues, perform in-depth root cause analysis, and implement permanent fixes for the Instant Payments application
Cost Efficiency & Automation: Identify and implement opportunities for cost reduction and operational efficiencies through proactive analysis, performance tuning, and the development of automation scripts and tools. Ensure adherence to support process and tool standards
Technical Communication: Effectively communicate technical details, application status, operational risks, and support initiatives to product teams, development teams, and relevant stakeholders
Risk & Compliance: Ensure that operational risk is managed effectively and that the Instant Payments application support function complies with applicable policies, rules, and regulations
Requirements
5+ years of progressive, hands-on experience in application support, Site Reliability Engineering (SRE), or technical operations, specifically for mission-critical, high-volume financial applications
Demonstrable direct experience with cloud-native architectures, including active configuration and management of microservices, containers (e.g., Kubernetes), and serverless technologies
Extensive practical experience with major Public Cloud platforms (e.g., AWS, Azure, GCP) and enterprise private cloud environments
Proven track record in implementing and operating comprehensive observability stacks (e.g., Prometheus, Grafana, ELK stack, Jaeger, distributed tracing)
Deep understanding and direct application of resiliency engineering principles (e.g., circuit breakers, bulkheads, retry mechanisms) and robust disaster recovery strategies
Strong technical background in instant payments or real-time financial transaction processing systems is highly desirable
Expertise in automation, scripting (e.g., Python, Go, Shell), and infrastructure-as-code principles (e.g., Terraform, CloudFormation)
Excellent communication, interpersonal, and team leadership skills, with the ability to manage and motivate a technical team while remaining deeply technical
Proven ability to troubleshoot and resolve complex technical issues independently, prioritize effectively, and make sound decisions under pressure
Bachelor's/University degree in Computer Science, Engineering, or a related technical field is required
6-10 years of experience
Practical problem solving and strategic thinking skills
Demonstrated leadership, interpersonal, and relationship-building skills
Service-oriented attitude
Ability to work in a fast-paced environment
Experience working on or leading requirements-gathering efforts for multiple large development projects at one time
Proficient in using basic technical tools and systems