Monitoring and Observability Engineer

India, Pune · Job Posted March 19, 2026

Job Description

A Monitoring and Observability Engineer is a strategic professional who stays abreast of developments within Observability and contributes to directional strategy by considering strategic solutions within their remit. This role is recognized as a technical authority within an area of the business. The position requires basic commercial awareness and developed communication and diplomacy skills to guide, influence, and convince colleagues in other areas and occasional external customers. This role has a significant impact on its area through complex deliverables, providing advice and counsel related to the technology or operations of the business. The work impacts an entire area, which eventually affects the overall performance and effectiveness of the sub-function/job family.

Job Responsibility

  • Operating with a global footprint
  • Collaborating across various organizations within Citi to understand and develop observability solutions for enterprise-wide deployment at scale
  • Managing the legacy monitoring stack across the Production Management organization within Citi
  • Driving the strategic delivery of end-to-end Observability solutions in Citi
  • Providing in-depth analysis with interpretive thinking to define problems and develop innovative solutions
  • Directly impacting the business by influencing strategic functional decisions through advice, counsel, or provided services
  • Persuading and influencing others through strong and comprehensive communication and diplomacy skills
  • Performing other duties and functions as assigned

Requirements

  • OpenShift/Kubernetes Administration: Experience deploying, managing, and troubleshooting containerized applications on OpenShift/Kubernetes, including resource management and networking
  • Proficiency in administering Geneos ITRS at scale
  • Proficiency in administering Grafana (user management, data sources, dashboards, alerts)
  • Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces)
  • Experience with Prometheus for metric collection and PromQL for querying
  • Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies
  • Technical Documentation: Ability to create clear and concise documentation for systems and processes
  • 6-10 years of experience
  • Practical problem solving and strategic thinking skills
  • Demonstrated leadership, interpersonal, and relationship-building skills
  • Service-oriented attitude
  • Ability to work in a fast-paced environment
  • Experience working on or leading requirements-gathering efforts for multiple large development projects at one time
  • Proficient using basic technical tools and systems
  • Good interpersonal and communication skills
  • Bachelor’s/University degree, Master’s degree preferred

Nice to have

  • Application Deployment: Ability to deploy applications using Lightspeed Enterprise
  • Google Cloud Operations: Experience with Google Cloud operations
  • Scripting & Automation: Experience with Bash or Python scripting for automating operational tasks

Similar Jobs for Monitoring and Observability Engineer

Monitoring and Observability Engineer

A Monitoring and Observability Engineer is a strategic professional who stays ab...
Location: United Kingdom, Belfast
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • OpenShift/Kubernetes Administration: Experience deploying, managing, and troubleshooting containerized applications on OpenShift/Kubernetes, including resource management and networking
  • Grafana & Observability Stack: Proficiency in administering Geneos ITRS at scale
  • Proficiency in administering Grafana (user management, data sources, dashboards, alerts)
  • Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces)
  • Experience with Prometheus for metric collection and PromQL for querying
  • Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies
  • Technical Documentation: Ability to create clear and concise documentation for systems and processes
Job Responsibility
  • Operating with a global footprint
  • Collaborating across various organizations within Citi to understand and develop observability solutions for enterprise-wide deployment at scale
  • Managing the legacy monitoring stack across the Production Management organization within Citi
  • Driving the strategic delivery of end-to-end Observability solutions in Citi
  • Providing in-depth analysis with interpretive thinking to define problems and develop innovative solutions
  • Directly impacting the business by influencing strategic functional decisions through advice, counsel, or provided services
  • Persuading and influencing others through strong and comprehensive communication and diplomacy skills
  • Performing other duties and functions as assigned
What we offer
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment type: Full-time

Observability and Monitoring Engineer

We are seeking a highly skilled Observability and Monitoring Engineer to design,...
Location: United States, Pennington
Salary: 115000.00 USD / Year
Company: Realign (realign-llc.com)
Expiration Date: Until further notice
Requirements
  • Senior Application Programmer
  • 3–5 years of experience in supporting IT Operations
  • Strong knowledge of monitoring tools (Dynatrace, Splunk)
  • Experience with scripting languages (Python, Perl, Unix shell)
  • Creative problem solver who thrives in a fast-paced environment
  • Must be a team player and demonstrate ability to communicate effectively with both technical and non-technical individuals
  • Excellent verbal and written communication skills
  • Clear oral communication and strong English proficiency
  • Self-starter, motivated, innovative, capable of handling a team and providing technical solutions
  • Ability to deal with complex information, processes, and relationships to derive simple solutions
Job Responsibility
  • Deploy and configure Dynatrace across diverse environments (Windows, Linux, Mainframe)
  • Onboard applications into Splunk using forwarders, source types, and indexing best practices
  • Define and implement tagging strategies, dashboards, and alerting policies for Dynatrace and Splunk
  • Enable full-stack monitoring, including APM, infrastructure, logs, and synthetic monitoring
  • Implement distributed tracing, anomaly detection, and performance baselining
  • Develop scripts and workflows for automated onboarding and configuration using APIs
  • Integrate monitoring solutions with ticketing tools for incident management
  • Establish retention policies and data governance for logs and metrics
  • Document onboarding processes, SOPs, and troubleshooting guides
  • Partner with application teams, infrastructure, and CIO stakeholders to align monitoring strategies
Employment type: Full-time

Cloud and Observability Engineer

As a Cloud and Observability Engineer you will play a critical role in ensuring ...
Location: India, Gurugram
Salary: Not provided
Company: Coralogix (coralogix.com)
Expiration Date: Until further notice
Requirements
  • 2+ years of experience as a Systems Engineer, DevOps Engineer, or in a similar role, with a focus on monitoring, alerting, and observability solutions
  • 2+ years of hands-on experience with and understanding of Cloud and Container technologies (GCP/Azure/AWS + K8s/EKS/GKE/AKS)
  • Good knowledge and hands-on experience with 2 or more Observability platforms, including alert creation, dashboard creation, and infrastructure monitoring
  • Good understanding of CI/CD with at least one deployment and version control tool
  • Basic understanding and practical experience with PromQL, Prometheus's query language, for querying metrics and creating custom dashboards
  • Excellent problem-solving and debugging skills
  • Strong English verbal and written communication skills
  • Ability to analyze complex systems, identify inefficiencies or gaps, and propose optimized monitoring solutions
  • Ability to work across US and European time zones
Job Responsibility
  • Extension Delivery: Build & enhance quality extension packages for alerts, dashboards and parsing rules in Coralogix Platform to improve monitoring experience for key services using our platform
  • Migration Delivery: Help migrate customer alerts, dashboards and parsing rules from leading competitive observability and security platforms to Coralogix
  • Knowledge Management: Build, maintain and evolve documentation with respect to all aspects of extensions and migration
  • Conduct training sessions for internal stakeholders and customers on all aspects of the platform functionality (alerts, dashboards, parsing, querying, etc.), migration processes and techniques, and extensions content
  • Collaborate closely with internal stakeholders and customers to understand their specific monitoring needs, gather requirements, and ensure alignment during the extension building process
Employment type: Full-time

IT Monitoring & Observability Engineer

We are seeking an experienced IT Monitoring & Observability Engineer to support ...
Location: United States, Washington, DC
Salary: Not provided
Company: Robert Half (https://www.roberthalf.com)
Expiration Date: Until further notice
Requirements
  • Minimum of 7 years of relevant experience in IT monitoring, observability, or infrastructure operations
  • Hands‑on experience with OpenText Operations Bridge (OBM) and related tools including: Operations Bridge Manager, SiteScope, AI Operations Management, Optic
  • Extensive knowledge of multi‑vendor server operating systems
  • Direct experience with monitoring protocols such as SNMP and WMI
  • Scripting experience using PowerShell, VBScript, and/or other scripting languages
  • Experience managing monitoring environments with 250+ hosts and/or 3,000+ sensors
  • Experience with additional monitoring platforms such as Zenoss, PRTG, Zabbix, and Nagios
  • Strong background in monitoring servers, storage, databases, networks, and applications
  • Proven ability to engineer monitoring solutions and provide technical leadership
Job Responsibility
  • Support and manage a unified Configuration Management Database (CMDB), ensuring accuracy and standardization
  • Collect, aggregate, and analyze monitoring and performance data to support ITIL processes including: Configuration, Event, Capacity, Availability, Demand, Incident and Problem Management
  • Assess, tune, and optimize monitoring capabilities to deliver accurate, actionable alerts for 24x7 operations teams
  • Design, create, and maintain intuitive dashboards showing real‑time and historical service health and performance
  • Configure, maintain, and optimize monitoring dashboards across diverse infrastructure components
  • Deploy, manage, and update Management Packs, connectors, and monitoring policies
  • Perform event correlation, suppression, and filtering to reduce alert noise and improve incident triage
  • Integrate data from third‑party monitoring tools into a centralized event console
  • Conduct proactive performance and availability monitoring, identify root causes, and implement preventive measures
  • Support continuous improvement of monitoring strategy, tooling, and operational effectiveness
What we offer
  • Medical, vision, dental, and life and disability insurance
  • Eligibility to enroll in the company 401(k) plan
  • Free online training

Monitoring and Observability (M&O) Manager

There are NO limits to your career: come shape the future and be part of a truly...
Salary: Not provided
Company: OutSystems (outsystems.com)
Expiration Date: Until further notice
Requirements
  • STEM degree (BSc, MSc, in Software Engineering/Computer Science or related fields)
  • 7+ years of experience in SRE, DevOps, or Software Engineering roles
  • Proven track record in building, scaling, and maintaining highly available, distributed systems
  • Strong understanding of incident management, SLAs/SLOs/SLIs, and service reliability metrics
  • Excellent communication, stakeholder management, and cross-functional leadership skills
  • Ability to foster a culture of automation, reliability, and continuous improvement
  • Deep, hands-on experience with the Prometheus ecosystem, Grafana, FluentBit, Elastic Stack, and OpenTelemetry
  • Strong, practical expertise in AWS
  • Deep knowledge of Kubernetes
  • Proficiency with Terraform (we use Spacelift)
Job Responsibility
  • Define and execute the M&O strategic vision and roadmap as Platform Engineering
  • Lead and mentor a team of M&O engineers, fostering innovation and operational excellence
  • Treat the M&O platform as an internal product: actively engage with engineering 'customers' (R&D) to understand their needs, gather feedback, and define the platform's roadmap
  • Manage and optimize cloud infrastructure costs for M&O tools and services
  • Own the full lifecycle of the M&O platform itself, using Infrastructure as Code, CI/CD, and SRE principles to ensure the platform is reliable, scalable, and cost-effective
  • Act as the primary evangelist for observability, developing 'golden paths,' documentation, and training to help teams effectively monitor their own services
  • Partner with development teams throughout the product lifecycle to ensure resilient, performant systems
  • Drive the enablement of Service Level Objectives (SLOs) by providing the tools, templates, and training for teams to define and measure their own SLOs
  • Develop, manage, and promote a self-service, company-wide observability platform for use by all engineering teams
Employment type: Full-time

Senior Systems Engineer – Production Monitoring and Control

The Senior Systems Engineer is responsible for the design, integration, moderniz...
Location: United States, Austin; Warren
Salary: Not provided
Company: General Motors (gm.com)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree or higher in Engineering, Computer Science, or related technical field
  • 8+ years of professional experience in systems engineering and/or software engineering supporting production systems
  • 5+ years of experience with Windows-based application stacks, including one or more of the following: .NET / C# application development or integration, Windows Server administration and configuration, SQL Server database development and administration
  • Experience with SQL Server and database development (schemas, queries, stored procedures, performance tuning)
  • Experience in Industrial IT systems and manufacturing environments, including plant floor applications, OT/IT integration, or MES/SCADA-style solutions
  • Strong problem-solving, communication, and interpersonal skills
Job Responsibility
  • Design, integrate, and support applications and systems for Plant Floor Monitoring & Control in GM Manufacturing
  • Lead modernization of the underlying PMC tech stack across Windows Server, SQL Server, and GE CIMPLICITY, including upgrades, refactoring, standardization, and lifecycle management
  • Plan and execute environment buildouts, migrations, and validations for PMC and related systems (test, pre‑prod, and production)
  • Build and maintain GitHub Actions and CI/CD pipelines to automate build, test, deployment, and configuration processes for PMC applications and supporting services
  • Collaborate on environment management, configuration management, and automation efforts to improve reliability and repeatability of deployments
  • Partner with reporting and data teams to support data integration, operational reporting, and visualization solutions using SQL Server and related technologies
  • Develop and maintain SQL Server databases, including schema design, performance tuning, stored procedures, and views used by PMC and reporting applications
  • Configure, integrate, and support GE CIMPLICITY-based solutions as part of the PMC ecosystem, including interfaces to plant floor and upstream/downstream systems
  • Implement monitoring, alerting, and logging solutions to improve observability and incident response for PMC environments
  • Troubleshoot and resolve system and software issues across both new and legacy systems, including Windows Server, SQL Server, and GE CIMPLICITY
What we offer
  • This job may be eligible for relocation benefits
Employment type: Full-time

Staff Software Engineer - Authentication and Security Observability

The Login Services team sits within Core Security Engineering and owns Uber’s au...
Location: United States, Sunnyvale
Salary: 232000.00 - 258000.00 USD / Year
Company: Uber (uber.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 8+ years of industry experience building large-scale backend platforms, with deep experience in distributed systems and production infrastructure
  • Strong programming experience in multiple languages (e.g., Go, Java, Python, Node.js/TypeScript), with a track record of shipping reliable systems
  • Demonstrated expertise designing and operating scalable distributed services, including reliability engineering and operational excellence (observability, incident response, SLAs)
  • Strong background in security engineering, preferably in identity/authentication and building or operating security-critical pipelines at scale
  • Proven ability to own complex systems end-to-end—from architecture and implementation to rollout, monitoring, and long-term maintainability—in large-scale environments
Job Responsibility
  • Lead architecture and execution of core authentication capabilities for human and non-human identities, delivering secure, resilient, and frictionless login experiences at Uber scale
  • Own and evolve Uber’s tier-zero authentication and SSO infrastructure, maintaining high availability, security, and performance for core login flows and enabling secure, policy-driven access to internal and third-party applications
  • Build and evolve platform services (APIs, workflows, policy enforcement) with strong engineering fundamentals: reliability, performance, observability, and safe rollout/rollback
  • Develop the Security Knowledge Platform, building the data/graph foundations and risk signals to categorize identity + asset risk and power multiple security and product use cases
  • Build the next generation of automation and intelligence—agentify IAM operations to reduce toil/cost and develop the Security Knowledge Platform to power identity + asset risk insights across Security Engineering
  • Partner cross-functionally and raise the bar—align stakeholders across Security/IT/Ops/Product, mentor engineers through design reviews and incident learning, and set technical direction for the team
What we offer
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award and other types of compensation
  • All full-time employees are eligible to participate in a 401(k) plan
  • Eligible for various benefits
Employment type: Full-time