Monitoring and Observability Engineer Job at Citi (Belfast)

Monitoring and Observability Engineer

A Monitoring and Observability Engineer is a strategic professional who stays ab...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

OpenShift/Kubernetes Administration: Experience deploying, managing, and troubleshooting containerized applications on OpenShift/Kubernetes, including resource management and networking
Proficiency in administering Geneos ITRS at scale
Proficiency in administering Grafana (user management, data sources, dashboards, alerts)
Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces)
Experience with Prometheus for metric collection and PromQL for querying
Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies
Technical Documentation: Ability to create clear and concise documentation for systems and processes
6-10 years of experience
Practical problem-solving and strategic thinking skills
Demonstrated leadership, interpersonal, and relationship-building skills

Job Responsibility

Operating with a global footprint
Collaborating across various organizations within Citi to understand and develop observability solutions for enterprise-wide deployment at scale
Managing the legacy monitoring stack across the Production Management organization within Citi
Driving the strategic delivery of end-to-end Observability solutions in Citi
Providing in-depth analysis with interpretive thinking to define problems and develop innovative solutions
Directly impacting the business by influencing strategic functional decisions through advice, counsel, or provided services
Persuading and influencing others through strong and comprehensive communication and diplomacy skills
Performing other duties and functions as assigned

Fulltime

Observability and Monitoring Engineer

We are seeking a highly skilled Observability and Monitoring Engineer to design,...

Location

United States, Pennington

Salary:

115000.00 USD / Year

Realign

Expiration Date

Until further notice

Requirements

Senior Application Programmer
3–5 years of experience in supporting IT Operations
Strong knowledge of monitoring tools (Dynatrace, Splunk)
Experience with scripting languages (Python, Perl, Unix shell)
Creative problem solver who thrives in a fast-paced environment
Must be a team player and demonstrate ability to communicate effectively with both technical and non-technical individuals
Excellent verbal and written communication skills
Clear oral communication and strong English proficiency
Self-starter, motivated, innovative, capable of handling a team and providing technical solutions
Ability to deal with complex information, processes, and relationships to derive simple solutions

Job Responsibility

Deploy and configure Dynatrace across diverse environments (Windows, Linux, Mainframe)
Onboard applications into Splunk using forwarders, source types, and indexing best practices
Define and implement tagging strategies, dashboards, and alerting policies for Dynatrace and Splunk
Enable full-stack monitoring, including APM, infrastructure, logs, and synthetic monitoring
Implement distributed tracing, anomaly detection, and performance baselining
Develop scripts and workflows for automated onboarding and configuration using APIs
Integrate monitoring solutions with ticketing tools for incident management
Establish retention policies and data governance for logs and metrics
Document onboarding processes, SOPs, and troubleshooting guides
Partner with application teams, infrastructure, and CIO stakeholders to align monitoring strategies

Fulltime

Cloud and Observability Engineer

As a Cloud and Observability Engineer you will play a critical role in ensuring ...

Location

India, Gurugram

Salary:

Not provided

Coralogix

Expiration Date

Until further notice

Requirements

2+ years of experience as a Systems Engineer, DevOps Engineer, or in a similar role, with a focus on monitoring, alerting, and observability solutions
2+ years of hands-on experience with, and understanding of, cloud and container technologies (GCP/Azure/AWS and Kubernetes/EKS/GKE/AKS)
Good knowledge and hands-on experience with 2 or more Observability platforms, including alert creation, dashboard creation, and infrastructure monitoring
Good understanding of CI/CD with at least one deployment and version control tool
Basic understanding and practical experience with PromQL, Prometheus's query language, for querying metrics and creating custom dashboards
Excellent problem-solving and debugging skills
Strong English verbal and written communication skills
Ability to analyze complex systems, identify inefficiencies or gaps, and propose optimized monitoring solutions
Ability to also work across US and European timezones

Job Responsibility

Extension Delivery: Build and enhance quality extension packages for alerts, dashboards, and parsing rules in the Coralogix platform to improve the monitoring experience for key services using our platform
Migration Delivery: Help migrate customer alerts, dashboards and parsing rules from leading competitive observability and security platforms to Coralogix
Knowledge Management: Build, maintain and evolve documentation with respect to all aspects of extensions and migration
Conduct training sessions for internal stakeholders and customers on all aspects of platform functionality (alerts, dashboards, parsing, querying, etc.), migration processes and techniques, and extension content
Collaborate closely with internal stakeholders and customers to understand their specific monitoring needs, gather requirements, and ensure alignment during the extension building process

Fulltime

IT Monitoring & Observability Engineer

We are seeking an experienced IT Monitoring & Observability Engineer to support ...

Location

United States, Washington, DC

Salary:

Not provided

Robert Half

Expiration Date

Until further notice

Requirements

Minimum of 7 years of relevant experience in IT monitoring, observability, or infrastructure operations
Hands‑on experience with OpenText Operations Bridge (OBM) and related tools including: Operations Bridge Manager, SiteScope, AI Operations Management, Optic
Extensive knowledge of multi‑vendor server operating systems
Direct experience with monitoring protocols such as SNMP and WMI
Scripting experience using PowerShell, VBScript, and/or other scripting languages
Experience managing monitoring environments with: 250+ hosts and/or 3,000+ sensors
Experience with additional monitoring platforms such as: Zenoss, PRTG, Zabbix, Nagios
Strong background monitoring: Servers, Storage, Databases, Networks, Applications
Proven ability to engineer monitoring solutions and provide technical leadership

Job Responsibility

Support and manage a unified Configuration Management Database (CMDB), ensuring accuracy and standardization
Collect, aggregate, and analyze monitoring and performance data to support ITIL processes including: Configuration, Event, Capacity, Availability, Demand, Incident and Problem Management
Assess, tune, and optimize monitoring capabilities to deliver accurate, actionable alerts for 24x7 operations teams
Design, create, and maintain intuitive dashboards showing real‑time and historical service health and performance
Configure, maintain, and optimize monitoring dashboards across diverse infrastructure components
Deploy, manage, and update Management Packs, connectors, and monitoring policies
Perform event correlation, suppression, and filtering to reduce alert noise and improve incident triage
Integrate data from third‑party monitoring tools into a centralized event console
Conduct proactive performance and availability monitoring, identify root causes, and implement preventive measures
Support continuous improvement of monitoring strategy, tooling, and operational effectiveness

What we offer

medical, vision, dental, and life and disability insurance
eligible to enroll in our company 401(k) plan
free online training

Monitoring and Observability (M&O) Manager

There are NO limits to your career: come shape the future and be part of a truly...

Location

Salary:

Not provided

OutSystems

Expiration Date

Until further notice

Requirements

STEM degree (BSc, MSc, in Software Engineering/Computer Science or related fields)
7+ years of experience in SRE, DevOps, or Software Engineering roles
Proven track record in building, scaling, and maintaining highly available, distributed systems
Strong understanding of incident management, SLAs/SLOs/SLIs, and service reliability metrics
Excellent communication, stakeholder management, and cross-functional leadership skills
Ability to foster a culture of automation, reliability, and continuous improvement
Deep, hands-on experience with the Prometheus ecosystem, Grafana, FluentBit, Elastic Stack, and OpenTelemetry
Strong, practical expertise in AWS
Deep knowledge of Kubernetes
Proficiency with Terraform (we use Spacelift)

Job Responsibility

Define and execute the M&O strategic vision and roadmap as Platform Engineering
Lead and mentor a team of M&O engineers, fostering innovation and operational excellence
Treat the M&O platform as an internal product: actively engage with engineering 'customers' (R&D) to understand their needs, gather feedback, and define the platform's roadmap
Manage and optimize cloud infrastructure costs for M&O tools and services
Own the full lifecycle of the M&O platform itself, using Infrastructure as Code, CI/CD, and SRE principles to ensure the platform is reliable, scalable, and cost-effective
Act as the primary evangelist for observability, developing 'golden paths,' documentation, and training to help teams effectively monitor their own services
Partner with development teams throughout the product lifecycle to ensure resilient, performant systems
Drive the enablement of Service Level Objectives (SLOs) by providing the tools, templates, and training for teams to define and measure their own SLOs
Develop, manage, and promote a self-service, company-wide observability platform for use by all engineering teams

Fulltime

Senior Systems Engineer – Production Monitoring and Control

The Senior Systems Engineer is responsible for the design, integration, moderniz...

Location

United States, Austin; Warren

Salary:

Not provided

General Motors

Expiration Date

Until further notice

Requirements

Bachelor's Degree or higher in Engineering, Computer Science, or related technical field
8+ years of professional experience in systems engineering and/or software engineering supporting production systems
5+ years of experience with Windows-based application stacks, including one or more of the following: .NET / C# application development or integration, Windows Server administration and configuration, SQL Server database development and administration
Experience with SQL Server and database development (schemas, queries, stored procedures, performance tuning)
Experience in Industrial IT systems and manufacturing environments, including plant floor applications, OT/IT integration, or MES/SCADA-style solutions
Strong problem-solving, communication, and interpersonal skills

Job Responsibility

Design, integrate, and support applications and systems for Plant Floor Monitoring & Control in GM Manufacturing
Lead modernization of the underlying PMC tech stack across Windows Server, SQL Server, and GE CIMPLICITY, including upgrades, refactoring, standardization, and lifecycle management
Plan and execute environment buildouts, migrations, and validations for PMC and related systems (test, pre‑prod, and production)
Build and maintain GitHub Actions and CI/CD pipelines to automate build, test, deployment, and configuration processes for PMC applications and supporting services
Collaborate on environment management, configuration management, and automation efforts to improve reliability and repeatability of deployments
Partner with reporting and data teams to support data integration, operational reporting, and visualization solutions using SQL Server and related technologies
Develop and maintain SQL Server databases, including schema design, performance tuning, stored procedures, and views used by PMC and reporting applications
Configure, integrate, and support GE CIMPLICITY-based solutions as part of the PMC ecosystem, including interfaces to plant floor and upstream/downstream systems
Implement monitoring, alerting, and logging solutions to improve observability and incident response for PMC environments
Troubleshoot and resolve system and software issues across both new and legacy systems, including Windows Server, SQL Server, and GE CIMPLICITY

What we offer

This job may be eligible for relocation benefits

Fulltime

Staff Software Engineer - Authentication and Security Observability

The Login Services team sits within Core Security Engineering and owns Uber’s au...

Location

United States, Sunnyvale

Salary:

232000.00 - 258000.00 USD / Year

Uber

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
8+ years of industry experience building large-scale backend platforms, with deep experience in distributed systems and production infrastructure
Strong programming experience in multiple languages (e.g., Go, Java, Python, Node.js/TypeScript), with a track record of shipping reliable systems
Demonstrated expertise designing and operating scalable distributed services, including reliability engineering and operational excellence (observability, incident response, SLAs)
Strong background in security engineering, preferably in identity/authentication and building or operating security-critical pipelines at scale
Proven ability to own complex systems end-to-end—from architecture and implementation to rollout, monitoring, and long-term maintainability—in large-scale environments

Job Responsibility

Lead architecture and execution of core authentication capabilities for human and non-human identities, delivering secure, resilient, and frictionless login experiences at Uber scale
Own and evolve Uber’s tier-zero authentication and SSO infrastructure, maintaining high availability, security, and performance for core login flows and enabling secure, policy-driven access to internal and third-party applications
Build and evolve platform services (APIs, workflows, policy enforcement) with strong engineering fundamentals: reliability, performance, observability, and safe rollout/rollback
Develop the Security Knowledge Platform, building the data/graph foundations and risk signals to categorize identity + asset risk and power multiple security and product use cases
Build the next generation of automation and intelligence—agentify IAM operations to reduce toil/cost and develop the Security Knowledge Platform to power identity + asset risk insights across Security Engineering
Partner cross-functionally and raise the bar—align stakeholders across Security/IT/Ops/Product, mentor engineers through design reviews and incident learning, and set technical direction for the team

What we offer

Eligible to participate in Uber's bonus program
May be offered an equity award & other types of comp
All full-time employees are eligible to participate in a 401(k) plan
Eligible for various benefits

Fulltime
