A Monitoring and Observability Engineer is a strategic professional who stays abreast of developments in Observability and contributes to directional strategy by evaluating solutions within their remit. The role is recognized as a technical authority within an area of the business. It requires basic commercial awareness, along with well-developed communication and diplomacy skills to guide, influence, and convince colleagues in other areas and, occasionally, external customers. Through complex deliverables and through advice and counsel on the technology or operations of the business, the role has a significant impact on its area, which in turn affects the overall performance and effectiveness of the sub-function/job family.
Job Responsibilities:
Operating with a global footprint
Collaborating across various organizations within Citi to understand and develop observability solutions for enterprise-wide deployment at scale
Managing the legacy monitoring stack across the Production Management organization within Citi
Driving the strategic delivery of end-to-end Observability solutions in Citi
Providing in-depth analysis with interpretive thinking to define problems and develop innovative solutions
Directly impacting the business by influencing strategic functional decisions through advice, counsel, or provided services
Persuading and influencing others through strong and comprehensive communication and diplomacy skills
Performing other duties and functions as assigned
Requirements:
OpenShift/Kubernetes Administration: Experience deploying, managing, and troubleshooting containerized applications on OpenShift/Kubernetes, including resource management and networking
Grafana & Observability Stack: Proficiency in administering ITRS Geneos at scale
Proficiency in administering Grafana (user management, data sources, dashboards, alerts)
Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces)
Experience with Prometheus for metric collection and PromQL for querying
Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies
Technical Documentation: Ability to create clear and concise documentation for systems and processes
Nice to have:
Application Deployment: Ability to deploy applications using Lightspeed Enterprise
Google Cloud Operations: Experience with Google Cloud operations
Scripting & Automation: Experience with Bash or Python scripting for automating operational tasks
What we offer:
27 days annual leave (plus bank holidays)
A discretionary annual performance-related bonus
Private Medical Care & Life Insurance
Employee Assistance Program
Pension Plan
Paid Parental Leave
Special discounts for employees, family, and friends
Access to an array of learning and development resources