Engineer the future of global finance. At Citi, our Tech team doesn’t just support finance – we are helping to redefine it. Every day, $5 trillion crosses through our network. We do business in 180+ countries operating at a scale few can match. From deploying advanced AI to helping shape global markets, we build systems that matter. Look to join a team where your work helps influence economies, your ideas can drive innovation and outcomes, and your growth is backed by mentorship, continuous learning and flexibility with potential hybrid work opportunities. Help solve real-world challenges that touch millions and get the opportunity to build the future of finance with Citi Tech.
We are seeking an experienced and motivated Manager to lead our AI and DevOps Platform Support team in EMEA. This role is responsible for ensuring the stability, reliability, and performance of our critical AI and DevOps platforms. The team supports a wide range of services, including multiple AI applications, developer tools, and CI/CD pipeline technologies used by teams across the organization. The ideal candidate will lead a team of support engineers, manage incident and problem resolution, and collaborate with engineering and development teams to improve platform services and supportability. The role also involves short- to medium-term planning of actions and resources for its own area.
Job Responsibility:
Demonstrates an in-depth understanding of how application support integrates within the overall technology function to achieve objectives, which requires a good understanding of the industry
Vendor relationship management, including oversight of all offshore managed services
Improve the service level the team provides to our end users, which includes maximizing operational efficiencies, strengthening incident management, problem management and knowledge sharing practices
Guide development teams on application stability and supportability improvements
Formulate and implement a framework for managing capacity, throughput and latency
Define and implement application on-boarding guidelines and standards
Coach team members to maximize their potential, work better in a highly integrated team environment, and build on their strengths
Drives continued cost reductions and efficiencies across the supported portfolios through root cause analysis reviews, knowledge management, performance tuning, and user training
Evaluates subordinates' performance and makes decisions on pay increases, hiring, terminations and other personnel actions
Participates in business review meetings, relating technology tools strategies to business requirements
Assures adherence to all support process and tool standards and works with management to create new processes and/or enhance existing ones, ensuring consistency and quality in “best practices” across the overall support program
Performs other duties and functions as assigned
Act as the primary point of contact for platform matters, defining the vision and roadmap in partnership with engineering leaders and business stakeholders
Champion the platform's resilience strategy by planning and executing wargaming scenarios, chaos engineering tests, and disaster recovery drills
Drive a comprehensive automation strategy to reduce manual toil, improve deployment velocity, and identify opportunities to leverage AI for operational intelligence
Define and drive the enterprise-wide observability strategy, ensuring the team has the tools and insights needed to guarantee platform health, performance, and cost-effectiveness. This includes overseeing monitoring, logging, tracing, and alerting
Remain hands-on and maintain a deep technical understanding of the platform architecture and services
Oversee the operational health of all production platforms (including OpenShift, ECS, CI/CD), ensuring SLAs are met and a robust incident management process is in place
Implement and manage comprehensive monitoring and observability strategies to ensure proactive issue detection, performance analysis, and system health checks across all supported platforms
Requirements:
Relevant experience in a technical leadership or management role with demonstrated success in building and scaling a high-performing support team
Experience in senior stakeholder management
Project management with demonstrable results in improving IT services
Exceptional communication and presentation skills, with the ability to articulate a technical vision and report on key metrics to senior leadership
A strong track record of developing and executing a strategic roadmap for a technical platform, balancing new features with a dedicated 'book of work' for stability
Demonstrable experience leading resilience initiatives such as wargaming, disaster recovery planning, and incident response simulation
Ability to share information effectively with other support team members and with other technology teams
Ability to plan and organize workload
Consistently demonstrates clear and concise written and verbal communication skills
Ability to communicate appropriately to relevant stakeholders
Hands-on experience with modern observability and monitoring tools (e.g., Prometheus, Grafana, Splunk)