Monitoring & Observability Engineer

Citi

Location:
India, Chennai

Contract Type:
Not provided

Salary:

Not provided

Job Description:

The Monitoring & Observability Engineer is a senior-level position requiring expertise in a wide range of monitoring tools, including APM (AppDynamics), Splunk and other tools. The position will drive the monitoring agenda forward for the Global Consumer Bank, delivering best-in-class monitoring across all regions and applications and incubating new capabilities and technologies.

Job Responsibility:

  • Drive best-in-class monitoring using a range of tools across all regions of the Global Consumer Bank
  • Drive POCs and incubate new features and capabilities
  • Be forward-looking and ensure long-term strategic success
  • Work closely with the monitoring operations teams, production support, performance test teams, operations and application owners to deliver best-in-class monitoring
  • Explain complicated performance bottlenecks to stakeholders
  • Understand complicated application architecture, including Java app servers, Web Servers, Cloud (PCF, AWS, Google), Kubernetes, TIBCO, mainframe
  • Build advanced dashboards and queries
  • Be a subject matter expert for the Global Consumer Bank, including conducting brown bags and office hours
  • Recommend product customization for system integration
  • Identify problem causality, business impact and root causes
  • Advise or mentor junior team members
  • Impact the engineering function by influencing decisions through advice, counsel or facilitating services
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency

Requirements:

  • 3-7 years of relevant experience in an Engineering & IT role
  • At least 2 years of hands-on working experience, including a strong understanding of UI/UX principles and best practices
  • Proficient in JavaScript, TypeScript, HTML, CSS, React, and Node.js
  • Experience with backend technologies and databases (e.g., MongoDB)
  • Experience with Python Programming
  • Experience with version control systems (e.g., Git)
  • Strong problem-solving and analytical skills
  • Excellent communication and collaboration skills
  • Create modular and reusable React components to streamline development and maintain consistency across the application
  • Continuously improve existing applications, addressing bugs, and implementing new features
  • Good exposure to microservices/micro front end architecture
  • Experience building applications on a cloud platform
  • Experience with CI/CD tools (e.g. GitHub Actions)
  • Portfolio showcasing UI/UX projects
  • At least 2 years of hands-on working experience with enterprise monitoring systems (such as AppDynamics, Grafana or other APM solutions)
  • Automation scripting (such as Python or PowerShell)
  • Drive the implementation and configuration of the enterprise observability solution (AppDynamics) to meet organizational monitoring needs
  • Plan, design, build and manage observability for applications running in multi-cloud environments
  • Perform regular updates, patches, and upgrades to observability tools to ensure they are up-to-date and secure
  • 2+ years working with Splunk (or an alternative log analytics tool)
  • 2+ years of experience with a range of architecture tech stacks including Java app servers, Web Servers, Cloud (PCF, AWS, Google), Kubernetes, TIBCO, mainframe
  • Experience with synthetic monitoring tools (ideally Micro Focus BSM / APM)
  • Ability to converse with application owners, architects, performance testers to pinpoint application performance bottlenecks via the monitoring & observability tools
  • Big Data / AIOps experience is a strong plus, including experience with the Splunk Machine Learning Toolkit
  • Experience working in Financial Services or a large complex and/or global environment
  • Experience working with diverse stakeholders, including operations, application developers and performance testing
  • Project Management experience
  • Consistently demonstrates clear and concise written and verbal communication
  • Comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • Demonstrated analytic/diagnostic skills
  • Ability to work in a matrix environment and partner with virtual teams
  • Ability to work independently, multi-task, and take ownership of various parts of a project or initiative
  • Ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • Proven track record of operational process change and improvement

Nice to have:

  • Big Data / AIOps experience
  • Experience with the Splunk Machine Learning Toolkit
  • Experience working in Financial Services or a large complex and/or global environment

Additional Information:

Job Posted:
April 29, 2025

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Monitoring & Observability Engineer

Observability engineer

Be the eyes and ears of our platforms with a role that puts you at the heart of ...
Location:
Bulgaria, Sofia
Salary:
Not provided
European Bank for Reconstruction and Development
Expiration Date:
Until further notice
Requirements:
  • Designing, Implementing and Supporting COTS and Open Source monitoring solutions
  • Understanding of software development principles and troubleshooting application issues
  • Understanding of infrastructure management principles and troubleshooting practices
  • Understanding of performance monitoring approaches
  • Knowledge of Azure monitoring services, container monitoring
  • Understanding of telemetry standards for interoperability
  • Intermediate to advanced technology certification in the given specialism
  • Entry-level service management certification such as ITIL Foundation
Job Responsibility:
  • Design, automate, and optimize observability platforms for logging, metrics, and tracing
  • Expertise in consolidating and analysing application / system logs at enterprise scale, including familiarity with distributed tracing technologies, integrating with ITSM platforms
  • Proficient in scripting languages (Python, Bash, PowerShell) for task automation
  • Experience with Terraform or Ansible for deploying and configuring monitoring / logging infrastructure
  • Strong understanding of protocols (WMI, SSH, SNMP) and methods (API, Traps) for data gathering
What we offer:
  • Varied, stimulating and engaging work
  • A working culture that embraces inclusion and celebrates diversity
  • An environment that places sustainability, equality and digital transformation at the heart of what we do
  • Flexible working
Employment Type:
Full-time

Observability Engineer – Splunk Focus

Join our growing Monitoring team! As a Splunk Specialist, you will collaborate c...
Location:
Portugal, Lisbon
Salary:
Not provided
Inetum
Expiration Date:
Until further notice
Requirements:
  • Proven expertise in Splunk Enterprise
  • Strong experience with Splunk ITSI
  • Knowledge of Cribl
  • Ability to design and implement Splunk dashboards
  • Familiarity with automation tools (e.g., Ansible)
  • Experience working in multi-regional teams is a plus
Job Responsibility:
  • Provide support for monitoring tools: Splunk (Enterprise & ITSI), OpenTelemetry, Cribl, SolarWinds, Dynatrace
  • Automate daily tasks using Ansible
  • Assist development and production teams in migrating to the new Splunk Enterprise and ITSI platforms
  • Build dashboards and define relevant metrics
  • Propose and implement improvements across tools, processes, and KPIs
Employment Type:
Full-time

Federal Observability Engineer

You will be part of a larger technical team, working as an Observability Enginee...
Location:
United States, Hill AFB
Salary:
105500.00 - 243000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • US Citizenship Required
  • Secret Clearance Required
  • DD8750 - Security Plus or higher Security Certification (CISSP, CASP, etc)
  • Bachelor's degree preferred or Associate degree holder (technical field) with 6-8 years working experience in related fields
  • Strong understanding of cloud computing platforms (AWS, Azure, GCP)
  • Experience with containerization technologies (Docker, Kubernetes)
  • Proficiency in scripting languages (Python, Go, Bash)
  • Experience with SQL and NoSQL databases
  • Knowledge of networking protocols (TCP/IP, HTTP)
  • Proven experience with the OpsRamp platform is a strong plus
Job Responsibility:
  • Designing, implementing, and maintaining observability infrastructure in an OpsRamp environment
  • Working as part of a larger technical team supporting HPE's PCE environment and Cloud infrastructure for a Federal Customer
  • Configuring and managing data sources, defining and monitoring key performance indicators (KPIs), and analyzing performance trends
  • Configuring log collection, aggregation, and analysis within the OpsRamp platform
  • Creating and managing alerts, defining escalation paths, and integrating with incident management systems
  • Developing and implementing automated workflows and remediation actions within the OpsRamp platform
  • Designing and building custom dashboards and reports to provide key insights into system health and performance
  • Integrating OpsRamp with other monitoring and observability tools as needed
  • Ensuring data quality and integrity within the OpsRamp platform
  • Troubleshooting and resolving performance issues, application errors, and other operational problems
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
Employment Type:
Full-time

Staff Observability Operations Engineer

We are currently seeking several experienced and highly skilled Staff Observabil...
Location:
United States, Hartford
Salary:
130295.00 - 260590.00 USD / Year
CVS Health
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
  • Experience developing and administering ServiceNow ITOM event management solutions
  • Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Proficiency in Python and other scripting languages such as Ansible, PowerShell, Bash for automation and configuration
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions
  • Proven ability to troubleshoot and resolve complex technical issues
Job Responsibility:
  • Deploy and implement modern observability solutions
  • Manage and administer observability and event management platforms
  • Coordinate and manage release cycles for observability platforms
  • Troubleshoot and resolve incidents related to observability platforms
  • Continuously monitor and enhance platform performance
  • Collaborate with cross-functional stakeholders
  • Provide training and mentoring to junior engineers
  • Ensure compliance and security of observability platforms
  • Maintain documentation of observability platform configurations
  • Generate and analyze reports on platform performance and capacity
What we offer:
  • Affordable medical plan options
  • A 401(k) plan (including matching company contributions)
  • An employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs
  • Confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
Employment Type:
Full-time

Senior Software Engineer, Platform Observability

Everlaw is looking for a Senior Software Engineer that brings experience in buil...
Location:
United States, Oakland
Salary:
164000.00 - 208000.00 USD / Year
Everlaw
Expiration Date:
Until further notice
Requirements:
  • BS or MS in Computer Science, or equivalent coursework
  • At least 3 years of experience building logging, metrics, and tracing infrastructure
  • Proficiency in coding in a language such as C, C++, C#, Java, Python, JavaScript, Go or Rust
  • Experience with Infrastructure as Code and container solutions to manage cloud environments (e.g., Terraform, Ansible, Docker)
  • At least 1 year of experience leading multi-developer efforts, including planning, technical breakdown, and coordination
  • Excellent communication and collaboration skills
  • Please note that at this time, Everlaw is not sponsoring U.S. employment visas for this role. Due to federal contract requirements, Everlaw may only hire US citizens for this position.
Job Responsibility:
  • Build observability strategies to support application and infrastructure metrics, logs, traces, dashboards, and alerts
  • Develop and maintain infrastructure as code (IaC) using tools such as Terraform and Ansible
  • Monitor usage trends to identify opportunities to optimize efficiency and performance of our metrics database and logging tools
  • Improve our on-call and incident management processes by encouraging deeper understanding, communication, and trust
  • Support developer projects by influencing design and implementation of infrastructure features as well as providing technical guidance
  • Support compliance efforts by promoting continuous documentation of our processes and involvement in audits
  • Provide technical mentorship to other engineers by both sharing your technical knowledge and becoming an expert in an area of our code base
What we offer:
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
  • Paid parental leave and approximately 10 days (80 hours) per year of sick leave
  • Seventeen paid vacation days plus 11 federal holidays
  • Membership to Modern Health to help employees prioritize mental health and wellness
  • Annual allocation for Learning & Development opportunities and applicable professional membership dues
  • Company-sponsored life and disability insurance
  • Work in Uptown Oakland, just steps from the BART line and dozens of restaurants and walking distance to Lake Merritt
Employment Type:
Full-time

Senior Software Engineer, Infrastructure Observability

We have an opening for a Senior Software Engineer on our Infrastructure Team, wi...
Location:
United States
Salary:
180000.00 - 225000.00 USD / Year
Temporal
Expiration Date:
Until further notice
Requirements:
  • Demonstrated ability to develop horizontally scalable, resilient, and high performance distributed systems in a production environment
  • Experience designing, implementing, deploying, and supporting large scale, geographically distributed observability and/or high throughput data streaming/processing pipelines, or similar
  • Expert in one or more high-level programming languages, preferably Go
  • Expert-level Kubernetes skills
  • Expert-level query development skills, preferably SQL
  • Hands-on experience with one or more cloud providers, preferably AWS or GCP
  • Thorough understanding of computer architecture, operating systems, and networking
  • Familiarity with best practices regarding monitoring, instrumenting, and configuring infrastructure
  • User-first mindset
  • Motivated by impact
Job Responsibility:
  • Lead the end-to-end Software Development Lifecycle: goals & requirements solicitation, design & review, implementation, operationalization & deployment, support & maintenance
  • Formulate feature designs, review with stakeholders, iterate to incorporate feedback and drive consensus
  • Clearly document design choices and operational knowledge to successfully deploy and manage the software you develop
  • Provide appropriate test and production readiness coverage for unit, integration, and performance of your feature ownership area
  • Set a high bar for technical excellence and take pride in the software you develop
  • Design and build multi-component, distributed systems that operate at scale
  • Investigate issues with a methodical approach to identify a root cause
  • Understand the performance and reliability implications of design options at scale, and make related tradeoffs
  • Participate in the team’s on-call rotation
  • Expert-level knowledge of architecture and services of assigned domain. Strong command over all aspects of the Temporal ecosystem
What we offer:
  • Unlimited PTO, 12 Holidays + 2 Floating Holidays
  • 100% Premiums Coverage for Medical, Dental, and Vision
  • AD&D, LT & ST Disability, and Life Insurance (Standard & Supplemental Available)
  • Empower 401K Plan
  • Additional Perks for Learning & Development, Lifestyle Spending, In-Home Office Setup, Professional Memberships, WFH Meals, Internet Stipend and more
  • $3,600 / Year Work from Home Meals
  • $1,500 / Year Career Development & Learning
  • $1,200 / Year Lifestyle Spending Account
  • $1,000 / Year In-Home Office Setup (In addition to Temporal issued equipment)
  • $500 / Year Professional Memberships
Employment Type:
Full-time

Site Reliability Engineer

We are seeking a skilled Site Reliability Engineer (SRE) with experience in AWS,...
Location:
Spain, Barcelona
Salary:
Not provided
Yokoy
Expiration Date:
Until further notice
Requirements:
  • Experience with AWS services such as ECS, S3, RDS, Lambda, CloudFront, etc.
  • Experience with monitoring tools like DataDog, CloudWatch, and Grafana
  • Experience with Docker, ECS, Kubernetes or similar containerisation technologies
  • Knowledge of languages such as Bash, Python, NodeJS
  • Experience with IaC tools such as Terraform, Pulumi, and so on
Job Responsibility:
  • Design, build and maintain scalable and reliable cloud infrastructure in AWS
  • Monitor and manage the performance, reliability, and security of our systems
  • Implement and improve monitoring tools to ensure system health and availability
  • Work with development teams to build and maintain scalable, resilient and secure applications
  • Participate in our on-call rotation and resolve production issues
  • Continuously improve automation, monitoring and deployment processes
What we offer:
  • Competitive compensation, including equity in the company
  • Generous vacation days so you can rest and recharge
  • Health perks such as private healthcare
  • Fitness perks such as an onsite gym & fitness app subsidy
  • Flexible compensation plan to help you diversify and increase the net salary
  • Unforgettable Perk events, including travel to one of our hubs
  • Spring Health - Get access to 12x therapy & 12x coaching sessions per year
  • Exponential growth opportunities
  • VolunteerPerk - We offer 16 paid hours per year that you can use to give back to society by volunteering for a charity of your choice
  • Work from anywhere in the world allowance of 20 working days per year

Solutions Engineering Lead

We are hiring a Solutions Engineering Team Lead for the East region to scale and...
Location:
United States, Boston
Salary:
220000.00 - 300000.00 USD / Year
Coralogix
Expiration Date:
Until further notice
Requirements:
  • 7+ years in customer-facing technical roles (Sales Engineering, Solutions Architecture or similar)
  • 3+ years leading or managing pre-sales technical teams with a record of coaching success
  • Experience supporting or owning team-level quotas within a sales organization
  • Hands-on expertise with the following: Kibana, Grafana, Datadog, New Relic, Splunk, Honeycomb, Jaeger, OpenSearch
  • Proficiency crafting PromQL, Lucene and SQL queries for troubleshooting and dashboards
  • Deep knowledge of cloud services central to observability: AWS (EKS, Fargate, Lambda, CloudFormation, CloudWatch Logs and Metrics), Azure Monitor and equivalents in Google Operations Suite
  • Working knowledge of OpenTelemetry, modern DevOps and container platforms (Kubernetes, Docker)
  • Strong ability to communicate with engineers and C-level audiences alike
Job Responsibility:
  • Own regional SE performance in partnership with Account Executives, ensuring quota attainment and deal velocity
  • Hire, onboard and mentor Solutions Engineers, setting clear KPIs and career paths
  • Maintain a strong personal presence with customers, modeling technical excellence and closing strategic opportunities
  • Improve processes for discovery, POC execution, documentation and knowledge sharing
  • Collaborate with Product, Support and Customer Success to shorten feedback loops and accelerate adoption
  • Architect and deploy reference designs for logs, metrics, traces, SIEM and Kubernetes monitoring across AWS, Azure and GCP
  • Lead white-board deep-dive sessions on ingestion pipelines, index-free querying and cost-optimized retention strategies
  • Provide escalation support during POCs: troubleshoot complex issues, analyze logs and traces, craft PromQL, Lucene or Dataprime queries and isolate root causes
  • Track technical success metrics such as POC win rate, onboarding time-to-value and validation scorecards, converting data insights into process improvements
  • Contribute code or scripts (Python, Go or Java) for custom exporters, automation and synthetic monitoring
What we offer:
  • Comprehensive and inclusive employee benefits for healthcare, dental, and mental health
  • 401(k) plan and match
  • Paid sick time and paid time off
Employment Type:
Full-time