CrawlJobs Logo

Monitoring and Observability Engineer

https://www.citi.com/ Logo

Citi

Location Icon

Location:
United Kingdom , Belfast

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

A Monitoring and Observability Engineer is a strategic professional who stays abreast of developments within Observability and contributes to directional strategy by considering strategic solutions within their remit. This role is recognized as a technical authority within an area of the business. The position requires basic commercial awareness and developed communication and diplomacy skills to guide, influence, and convince colleagues in other areas and occasional external customers. This role has a significant impact on its area through complex deliverables, providing advice and counsel related to the technology or operations of the business. The work impacts an entire area, which eventually affects the overall performance and effectiveness of the sub-function/job family.

Job Responsibility:

  • Operating with a global footprint
  • Collaborating across various organizations within Citi to understand and develop observability solutions for enterprise-wide deployment at scale
  • Managing the legacy monitoring stack across the Production Management organization within Citi
  • Driving the strategic delivery of end-to-end Observability solutions in Citi
  • Providing in-depth analysis with interpretive thinking to define problems and develop innovative solutions
  • Directly impacting the business by influencing strategic functional decisions through advice, counsel, or provided services
  • Persuading and influencing others through strong and comprehensive communication and diplomacy skills
  • Performing other duties and functions as assigned

Requirements:

  • OpenShift/Kubernetes Administration: Experience deploying, managing, and troubleshooting containerized applications on OpenShift/Kubernetes, including resource management and networking
  • Grafana & Observability Stack: Proficiency in administering Geneos ITRS at scale
  • Proficiency in administering Grafana (user management, data sources, dashboards, alerts)
  • Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces)
  • Experience with Prometheus for metric collection and PromQL for querying
  • Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies
  • Technical Documentation: Ability to create clear and concise documentation for systems and processes

Nice to have:

  • Application Deployment: Ability to deploy applications using Lightspeed Enterprise
  • Google Cloud Operations: Experience with Google Cloud operations
  • Scripting & Automation: Experience with Bash or Python scripting for automating operational tasks
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretional annual performance related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources

Additional Information:

Job Posted:
February 17, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Monitoring and Observability Engineer

Monitoring & Observability Engineer

The Monitoring & Observability Engineer is a senior level position responsible f...
Location
Location
India , Chennai; Pune
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3-7 years of relevant experience in an Engineering & IT role
  • At least 2+ years of hands-on working experience in: Strong understanding of UI/UX principles and best practices
  • Proficient in JavaScript, TypeScript, HTML, CSS, React, and Node.js
  • Experience with backend technologies and databases (e.g., MongoDB)
  • Experience with Python Programming
  • Experience with version control systems (e.g., Git)
  • Strong problem-solving and analytical skills
  • Excellent communication and collaboration skills
  • Create modular and reusable React components to streamline development and maintain consistency across the application
  • Continuously improve existing applications, addressing bugs, and implementing new features
Job Responsibility
Job Responsibility
  • Drive the best-in-class monitoring using a range of tools across all regions of Global Consumer bank
  • Drive POCs and incubate new features and capabilities
  • Be forward looking and ensure long term strategic success
  • Work closely with the monitoring operations teams, production support, performance test teams, operations, application owners and application owners to deliver best-in-class monitoring
  • Explain complicated performance bottlenecks to stakeholders
  • Understand complicated application architecture, including Java app servers, Web Servers, Cloud (PCF, AWS, Google), Kubernetes, TIBCO, mainframe
  • Build advanced dashboards and queries
  • Be a subject matter expert for the Global Consumer Bank, including conducting brown bags and office hours
  • Recommend product customization for system integration
  • Identify problem causality, business impact and root causes
  • Fulltime
Read More
Arrow Right

Observability engineer

Be the eyes and ears of our platforms with a role that puts you at the heart of ...
Location
Location
Bulgaria , Sofia
Salary
Salary:
Not provided
ebrd.com Logo
European Bank for Reconstruction and Development
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Designing, Implementing and Supporting COTS and Open Source monitoring solutions
  • Understanding of software development principles and troubleshooting application issues
  • Understanding of infrastructure management principles and troubleshooting practices
  • Understanding of performance monitoring approaches
  • Knowledge of Azure monitoring services, container monitoring
  • Understanding of telemetry standards for interoperability
  • Intermediate to advanced technology certification in the given specialism
  • Entry level service management certification such a ITIL Foundation
Job Responsibility
Job Responsibility
  • Design, automate, and optimize observability platforms for logging, metrics, and tracing
  • Expertise in consolidating and analysing application / system logs at enterprise scale, including familiarity with distributed tracing technologies, integrating with ITSM platforms
  • Proficient in scripting languages (Python, Bash, PowerShell) for task automation
  • Experience with Terraform or Ansible for deploying and configuring monitoring / logging infrastructure
  • Strong understanding of protocols (WMI, SSH, SNMP) and methods (API, Traps) for data gathering
What we offer
What we offer
  • Varied, stimulating and engaging work
  • A working culture that embraces inclusion and celebrates diversity
  • An environment that places sustainability, equality and digital transformation at the heart of what we do
  • Flexible working
  • Fulltime
Read More
Arrow Right

Observability Engineer – Splunk Focus

Join our growing Monitoring team! As a Splunk Specialist, you will collaborate c...
Location
Location
Portugal , Lisbon
Salary
Salary:
Not provided
https://www.inetum.com Logo
Inetum
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Proven expertise in Splunk Enterprise
  • Strong experience with Splunk ITSI
  • Knowledge of Cribl
  • Ability to design and implement Splunk dashboards
  • Familiarity with automation tools (e.g., Ansible)
  • Experience working in multi-regional teams is a plus
Job Responsibility
Job Responsibility
  • Provide support for monitoring tools: Splunk (Enterprise & ITSI), OpenTelemetry, Cribl, SolarWinds, Dynatrace
  • Automate daily tasks using Ansible
  • Assist development and production teams in migrating to the new Splunk Enterprise and ITSI platforms
  • Build dashboards and define relevant metrics
  • Propose and implement improvements across tools, processes, and KPIs
  • Fulltime
Read More
Arrow Right

Federal Observability Engineer

You will be part of a larger technical team, working as an Observability Enginee...
Location
Location
United States , HILL AFB
Salary
Salary:
105500.00 - 243000.00 USD / Year
https://www.hpe.com/ Logo
Hewlett Packard Enterprise
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • US Citizenship Required
  • Secret Clearance Required
  • DD8750 - Security Plus or higher Security Certification (CISSP, CASP, etc)
  • Bachelor's degree preferred or Associate degree holder (technical field) with 6-8 years working experience in related fields
  • Strong understanding of cloud computing platforms (AWS, Azure, GCP)
  • Experience with containerization technologies (Docker, Kubernetes)
  • Proficiency in scripting languages (Python, Go, Bash)
  • Experience with SQL and NoSQL databases
  • Knowledge of networking protocols (TCP/IP, HTTP)
  • Proven experience with the OpsRamp platform is a strong plus
Job Responsibility
Job Responsibility
  • Designing, implementing, and maintaining observability infrastructure in an OpsRamp environment
  • Working as part of a larger technical team supporting HPE's PCE environment and Cloud infrastructure for a Federal Customer
  • Configuring and managing data sources, defining and monitoring key performance indicators (KPIs), and analyzing performance trends
  • Configuring log collection, aggregation, and analysis within the OpsRamp platform
  • Creating and managing alerts, defining escalation paths, and integrating with incident management systems
  • Developing and implementing automated workflows and remediation actions within the OpsRamp platform
  • Designing and building custom dashboards and reports to provide key insights into system health and performance
  • Integrating OpsRamp with other monitoring and observability tools as needed
  • Ensuring data quality and integrity within the OpsRamp platform
  • Troubleshooting and resolving performance issues, application errors, and other operational problems
What we offer
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Fulltime
Read More
Arrow Right

Staff Observability Operations Engineer

We are currently seeking several experienced and highly skilled Staff Observabil...
Location
Location
United States , Hartford
Salary
Salary:
130295.00 - 260590.00 USD / Year
https://www.cvshealth.com/ Logo
CVS Health
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ Years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ Years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ Years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
  • Experience developing and administering ServiceNow ITOM event management solutions
  • Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Proficiency in Python and other scripting languages such as Ansible, PowerShell, Bash for automation and configuration
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions
  • Proven ability to troubleshoot and resolve complex technical issues
Job Responsibility
Job Responsibility
  • Deploy and implement modern observability solutions
  • Manage and administer observability and event management platforms
  • Coordinate and manage release cycles for observability platforms
  • Troubleshoot and resolve incidents related to observability platforms
  • Continuously monitor and enhance platform performance
  • Collaborate with cross-functional stakeholders
  • Provide training and mentoring to junior engineers
  • Ensure compliance and security of observability platforms
  • Maintain documentation of observability platform configurations
  • Generate and analyze reports on platform performance and capacity
What we offer
What we offer
  • Affordable medical plan options
  • a 401(k) plan (including matching company contributions)
  • an employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs
  • confidential counseling and financial coaching
  • Paid time off
  • flexible work schedules
  • family leave
  • dependent care resources
  • colleague assistance programs
  • Fulltime
Read More
Arrow Right

Senior Software Engineer, Platform Observability

Everlaw is looking for a Senior Software Engineer that brings experience in buil...
Location
Location
United States , Oakland
Salary
Salary:
164000.00 - 208000.00 USD / Year
everlaw.com Logo
Everlaw
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • BS or MS in Computer Science, or equivalent coursework
  • At least 3 years of experience building logging, metrics, and tracing infrastructure
  • Proficiency in coding in a language such as C, C++, C#, Java, Python, Javascript, Go or Rust
  • Experience with Infrastructure as Code and container solutions to manage cloud environments (ex: Terraform, Ansible, Docker, etc)
  • At least 1 year of experience leading multi-developer efforts, including planning, technical breakdown, and coordination
  • Excellent communication and collaboration skills
  • Please note that at this time, Everlaw is not sponsoring U.S. employment visas for this role. Due to federal contract requirements, Everlaw may only hire US citizens for this position.
Job Responsibility
Job Responsibility
  • Build observability strategies to support application and infrastructure metrics, logs, traces, dashboards, and alerts
  • Develop and maintain infrastructure as code (IAC) using tools such as Terraform and Ansible
  • Monitor usage trends to identify opportunities to optimize efficiency and performance of our metrics database and logging tools
  • Improve our on-call and incident management processes by encouraging deeper understanding, communication, and trust
  • Support developer projects by influencing design and implementation of infrastructure features as well as providing technical guidance
  • Support compliance efforts by promoting continuous documentation of our processes and involvement in audits
  • Provide Technical Mentorship to other engineers by both sharing your technical knowledge and becoming an expert in an area of our code base.
What we offer
What we offer
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
  • Paid parental leave and approximately 10 days (80 hours) per year of sick leave
  • Seventeen paid vacation days plus 11 federal holidays
  • Membership to Modern Health to help employees prioritize mental health and wellness
  • Annual allocation for Learning & Development opportunities and applicable professional membership dues
  • Company-sponsored life and disability insurance
  • Work in Uptown Oakland, just steps from the BART line and dozens of restaurants and walking distance to Lake Merritt
  • Fulltime
Read More
Arrow Right

Site Reliability Engineer

We are seeking a skilled Site Reliability Engineer (SRE) with experience in AWS,...
Location
Location
Spain , Barcelona
Salary
Salary:
Not provided
yokoy.io Logo
Yokoy
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Experience with AWS services such as ECS, S3, RDS, Lambda, CloudFront, etc.
  • Experience with monitoring tools like DataDog, CloudWatch, and Grafana
  • Experience with Docker, ECS, Kubernetes or similar containerisation technologies
  • Knowledge of languages such as Bash, Python, NodeJS
  • Experience with IaC tools such as Terraform, Pulumi, and so on
Job Responsibility
Job Responsibility
  • Design, build and maintain scalable, and reliable cloud infrastructure in AWS
  • Monitor and manage the performance, reliability, and security of our systems
  • Implement, and improve monitoring tools to ensure system health, and availability
  • Work with development teams to build, and maintain scalable, resilient and secure applications
  • Participate in our on-call rotation, and resolve production issues
  • Continuously improve automation, monitoring and deployment processes
What we offer
What we offer
  • Competitive compensation, including equity in the company
  • Generous vacation days so you can rest and recharge
  • Health perks such as private healthcare
  • Fitness perks such as an onsite gym & fitness app subsidy
  • Flexible compensation plan to help you diversify and increase the net salary
  • Unforgettable Perk events, including travel to one of our hubs
  • Spring Health - Get access to 12x therapy & 12x coaching sessions per year
  • Exponential growth opportunities
  • VolunteerPerk - We offer 16 paid hours per year that you can use to give back to society by volunteering for a charity of your choice
  • Work from anywhere in the world allowance of 20 working days per year
Read More
Arrow Right

Solutions Engineering Lead

We are hiring a Solutions Engineering Team Lead for the East region to scale and...
Location
Location
United States , Boston
Salary
Salary:
220000.00 - 300000.00 USD / Year
coralogix.com Logo
Coralogix
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years in customer-facing technical roles (Sales Engineering, Solutions Architecture or similar)
  • 3+ years leading or managing pre-sales technical teams with a record of coaching success
  • Experience supporting or owning team-level quotas within a sales organization
  • Hands-on expertise with the following: Kibana, Grafana, Datadog, New Relic, Splunk, Honeycomb, Jaeger, OpenSearch
  • Proficiency crafting PromQL, Lucene and SQL queries for troubleshooting and dashboards
  • Deep knowledge of cloud services central to observability: AWS: EKS, Fargate, Lambda, CloudFormation, CloudWatch Logs and Metrics
  • Azure Monitor and equivalents in Google Operations Suite
  • Working knowledge of OpenTelemetry, modern DevOps and container platforms (Kubernetes, Docker)
  • Strong ability to communicate with engineers and C-level audiences alike
Job Responsibility
Job Responsibility
  • Own regional SE performance in partnership with Account Executives, ensuring quota attainment and deal velocity
  • Hire, onboard and mentor Solutions Engineers, setting clear KPIs and career paths
  • Maintain a strong personal presence with customers, modeling technical excellence and closing strategic opportunities
  • Improve processes for discovery, POC execution, documentation and knowledge sharing
  • Collaborate with Product, Support and Customer Success to shorten feedback loops and accelerate adoption
  • Architect and deploy reference designs for logs, metrics, traces, SIEM and Kubernetes monitoring across AWS, Azure and GCP
  • Lead white-board deep-dive sessions on ingestion pipelines, index-free querying and cost-optimized retention strategies
  • Provide escalation support during POCs: troubleshoot complex issues, analyze logs, traces, craft PromQL, Lucene or Dataprime queries and isolate root causes
  • Track technical success metrics such as POC win rate, onboarding time-to-value and validation scorecards, converting data insights into process improvements
  • Contribute code or scripts (Python, Go or Java) for custom exporters, automation and synthetic monitoring
What we offer
What we offer
  • Comprehensive and inclusive employee benefits for healthcare, dental, and mental health benefits
  • 401(k) plan and match
  • Paid sick time and paid time off
  • Fulltime
Read More
Arrow Right