A Monitoring and Observability Engineer is a strategic professional who stays abreast of developments within Observability and contributes to directional strategy by considering strategic solutions within their remit. This role is recognized as a technical authority within an area of the business. The position requires basic commercial awareness and developed communication and diplomacy skills to guide, influence, and convince colleagues in other areas and occasional external customers. This role has a significant impact on its area through complex deliverables, providing advice and counsel related to the technology or operations of the business. The work impacts an entire area, which eventually affects the overall performance and effectiveness of the sub-function/job family.
Job Responsibilities:
Operating with a global footprint
Collaborating across various organizations within Citi to understand and develop observability solutions for enterprise-wide deployment at scale
Managing the legacy monitoring stack across the Production Management organization within Citi
Driving the strategic delivery of end-to-end Observability solutions in Citi
Providing in-depth analysis with interpretive thinking to define problems and develop innovative solutions
Directly impacting the business by influencing strategic functional decisions through advice, counsel, or provided services
Persuading and influencing others through strong and comprehensive communication and diplomacy skills
Performing other duties and functions as assigned
Requirements:
OpenShift/Kubernetes Administration: Experience deploying, managing, and troubleshooting containerized applications on OpenShift/Kubernetes, including resource management and networking
Proficiency in administering ITRS Geneos at scale
Proficiency in administering Grafana (user management, data sources, dashboards, alerts)
Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces)
Experience with Prometheus for metric collection and PromQL for querying
Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies
Technical Documentation: Ability to create clear and concise documentation for systems and processes
6-10 years of experience
Practical problem solving and strategic thinking skills
Demonstrated leadership, interpersonal, and relationship-building skills
Service-oriented attitude
Ability to work in a fast-paced environment
Experience working on or leading requirements-gathering efforts for multiple large development projects at one time
Proficiency with basic technical tools and systems