Lead Observability Engineer

Blue Yonder

Location:
India, Hyderabad

Contract Type:
Not provided

Salary:

Not provided

Job Description:

This Lead Observability Engineer role focuses on the Elastic Observability Platform, ensuring end-to-end visibility for infrastructure, cloud services, networks, and business-critical applications. The role involves strategic leadership, platform ownership, and technical expertise across hybrid environments.

Job Responsibility:

  • Receives work assignments through the ticketing system or from senior leadership
  • Provides Tier-4 engineering expertise, platform ownership, and technical leadership for all observability capabilities across hybrid cloud, on-premises, and SaaS environments
  • Leads the design, architecture, and maturity of the enterprise observability ecosystem with a primary focus on the Elastic Observability Platform
  • Drives the enterprise strategy for logging, metrics, traces, synthetics, and alerting, including governance, standardization, and performance optimization
  • Partners closely with Cloud, Infrastructure, Security, Enterprise Applications, and SRE leadership to define observability frameworks
  • Ensures observability platforms meet enterprise requirements for security, performance, availability, compliance, and scalability
  • Oversees monitoring implementations for key SaaS applications including Workday, Salesforce, ServiceNow, and Microsoft 365
  • Provides guidance, mentorship, and direction to observability engineers, SREs, and operational teams
  • Acts as a strategic advisor during major incidents by providing real-time diagnostics, correlation insights, and driving RCA improvements
  • Required to provide on-call support during off-hours on weekdays, weekends, and holidays on a rotating basis
  • Own and lead the architecture and roadmap for the Elastic Observability platform across the enterprise
  • Define and enforce governance standards for logs, metrics, traces, data retention, and alerting quality
  • Lead platform scaling initiatives, including cluster sizing, performance tuning, ILM tiering, and cost optimization
  • Architect, deploy, and maintain advanced Elastic Observability solutions across hybrid environments
  • Design executive-grade dashboards, correlation views, analytics boards, anomaly detection, and ML-based detections
  • Optimize ingestion pipelines, index structures, data flow, and search/query performance at scale
  • Integrate Elastic Observability with Azure, VMware, Kubernetes, network platforms, ServiceNow, and API sources
  • Define and lead enterprise monitoring standards across logs, metrics, traces, and synthetics
  • Drive cloud and on-prem monitoring maturity by improving instrumentation, coverage, and telemetry consistency
  • Establish alert engineering frameworks that reduce noise and improve detection fidelity
  • Lead design of synthetic transactions, user-experience monitoring, and availability baselines for SaaS apps
  • Ensure proactive monitoring of Workday, Salesforce, ServiceNow, and Microsoft 365 integrations
  • Serve as the observability lead during P1/P0 incidents by delivering real-time visibility and correlation insights
  • Drive MTTR/MTTD improvements through enhanced observability patterns and RCA alignment
  • Build and maintain operational runbooks, dashboards, and standard operating procedures
  • Work with engineering, Cloud, Infrastructure, Applications, and Security leadership to improve observability adoption
  • Act as the senior technical advisor in major IT projects, shaping observability-by-design principles
  • Mentor and guide observability engineers, analysts, and SRE teams to uplift operational capabilities
  • Ensure all monitoring pipelines follow enterprise security, compliance, retention, and logging policies
  • Validate that new systems adhere to observability onboarding requirements and telemetry standards
  • Empower partner IT teams, such as Infrastructure and Apps, to self-serve by creating their own monitors within the unified guidance and framework established by the Observability team

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, MIS, or equivalent experience
  • 7–10+ years of experience in observability engineering, SRE, monitoring platform ownership, or infrastructure operations
  • Deep, hands-on expertise with Elastic Stack (Elasticsearch, Kibana, Logstash, Beats/Elastic Agent, APM)
  • Strong architectural knowledge of cloud (Azure/AWS) and hybrid observability patterns
  • Experience leading observability for infrastructure, cloud platforms, network systems, Kubernetes, and Microsoft 365
  • Proven experience designing monitoring for SaaS platforms (Workday, Salesforce, ServiceNow)
  • Advanced scripting/automation experience (Python, PowerShell, Bash)
  • Strong knowledge of API integrations, data pipelines, and log-flow engineering
  • Experience leading incident diagnostics and delivering visibility for RCA and operational improvement
  • Strong analytical, architectural, and troubleshooting skills with a platform-owner mindset
  • Demonstrated ability to influence cross-functional teams and drive enterprise observability adoption
  • Knowledge of ITIL processes, SRE principles, and operational governance
  • Excellent communication, leadership, and stakeholder-management skills

Nice to have:

  • Familiarity with Grafana, Prometheus, Splunk, AppDynamics, Dynatrace
  • Knowledge of Terraform, Ansible, Kubernetes, and infrastructure-as-code tools

Additional Information:

Job Posted:
February 13, 2026

Employment Type:
Full-time

Similar Jobs for Lead Observability Engineer

Solutions Engineering Lead

We are hiring a Solutions Engineering Team Lead for the East region to scale and...
Location:
United States, Boston
Salary:
220000.00 - 300000.00 USD / Year
Coralogix
Expiration Date:
Until further notice
Requirements:
  • 7+ years in customer-facing technical roles (Sales Engineering, Solutions Architecture or similar)
  • 3+ years leading or managing pre-sales technical teams with a record of coaching success
  • Experience supporting or owning team-level quotas within a sales organization
  • Hands-on expertise with the following: Kibana, Grafana, Datadog, New Relic, Splunk, Honeycomb, Jaeger, OpenSearch
  • Proficiency in crafting PromQL, Lucene, and SQL queries for troubleshooting and dashboards
  • Deep knowledge of cloud services central to observability: AWS: EKS, Fargate, Lambda, CloudFormation, CloudWatch Logs and Metrics
  • Azure Monitor and equivalents in Google Operations Suite
  • Working knowledge of OpenTelemetry, modern DevOps and container platforms (Kubernetes, Docker)
  • Strong ability to communicate with engineers and C-level audiences alike
Job Responsibility:
  • Own regional SE performance in partnership with Account Executives, ensuring quota attainment and deal velocity
  • Hire, onboard and mentor Solutions Engineers, setting clear KPIs and career paths
  • Maintain a strong personal presence with customers, modeling technical excellence and closing strategic opportunities
  • Improve processes for discovery, POC execution, documentation and knowledge sharing
  • Collaborate with Product, Support and Customer Success to shorten feedback loops and accelerate adoption
  • Architect and deploy reference designs for logs, metrics, traces, SIEM and Kubernetes monitoring across AWS, Azure and GCP
  • Lead whiteboard deep-dive sessions on ingestion pipelines, index-free querying, and cost-optimized retention strategies
  • Provide escalation support during POCs: troubleshoot complex issues, analyze logs and traces, craft PromQL, Lucene, or Dataprime queries, and isolate root causes
  • Track technical success metrics such as POC win rate, onboarding time-to-value and validation scorecards, converting data insights into process improvements
  • Contribute code or scripts (Python, Go or Java) for custom exporters, automation and synthetic monitoring
What we offer:
  • Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health
  • 401(k) plan and match
  • Paid sick time and paid time off
Employment Type:
Full-time

Solutions Engineering Lead

Coralogix is a modern full-stack observability platform that transforms how busi...
Location:
United States, Dallas
Salary:
Not provided
Coralogix
Expiration Date:
Until further notice
Requirements:
  • 7+ years in customer-facing technical roles (Sales Engineering, Solutions Architecture or similar)
  • 3+ years leading or managing pre-sales technical teams with a record of coaching success
  • Experience supporting or owning team-level quotas within a sales organization
  • Hands-on expertise with the following: Kibana, Grafana, Datadog, New Relic, Splunk, Honeycomb, Jaeger, OpenSearch
  • Proficiency in crafting PromQL, Lucene, and SQL queries for troubleshooting and dashboards
  • Deep knowledge of cloud services central to observability: AWS: EKS, Fargate, Lambda, CloudFormation, CloudWatch Logs and Metrics
  • Azure Monitor and equivalents in Google Operations Suite
  • Working knowledge of OpenTelemetry, modern DevOps and container platforms (Kubernetes, Docker)
  • Strong ability to communicate with engineers and C-level audiences alike
  • Familiarity with structured sales methodologies such as MEDDPIC or Command of the Message (a plus)
Job Responsibility:
  • Own regional SE performance in partnership with Account Executives, ensuring quota attainment and deal velocity
  • Hire, onboard and mentor Solutions Engineers, setting clear KPIs and career paths
  • Maintain a strong personal presence with customers, modeling technical excellence and closing strategic opportunities
  • Improve processes for discovery, POC execution, documentation and knowledge sharing
  • Collaborate with Product, Support and Customer Success to shorten feedback loops and accelerate adoption
  • Architect and deploy reference designs for logs, metrics, traces, SIEM and Kubernetes monitoring across AWS, Azure and GCP
  • Lead whiteboard deep-dive sessions on ingestion pipelines, index-free querying, and cost-optimized retention strategies
  • Provide escalation support during POCs: troubleshoot complex issues, analyze logs and traces, craft PromQL, Lucene, or Dataprime queries, and isolate root causes
  • Track technical success metrics such as POC win rate, onboarding time-to-value and validation scorecards, converting data insights into process improvements
  • Contribute code or scripts (Python, Go or Java) for custom exporters, automation and synthetic monitoring
What we offer:
  • Comprehensive and inclusive employee benefits covering healthcare, dental, and mental health
  • A 401(k) plan and match
  • Paid sick time
  • Paid time off
Employment Type:
Full-time

Senior Software Engineer, Observability

The Observability team at Airtable ensures that engineers have the tools they ne...
Location:
United States, San Francisco; New York; Seattle
Salary:
196000.00 - 270000.00 USD / Year
Airtable
Expiration Date:
Until further notice
Requirements:
  • 6+ years of software engineering experience
  • 3+ years focused on observability or infrastructure at scale
  • Demonstrated success implementing and running production-grade logging, metrics, or tracing systems
  • Proficiency in distributed systems concepts, data streaming pipelines, and container orchestration (Kubernetes)
  • Deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse
  • Comfort with at least one programming language (e.g., Go, Python, Java) to build and maintain observability tooling
  • Experience mentoring engineers and collaborating across multiple teams
  • Strong communication skills
  • Eagerness to own high-impact initiatives
  • Proven ability to balance short-term fixes with long-term strategic vision
Job Responsibility:
  • Architect and scale core observability systems
  • Lead the design and evolution of logging, metrics, and tracing pipelines
  • Evaluate and integrate new technologies (e.g., OpenTelemetry, ClickHouse, ELK Stack)
  • Guide and mentor a growing team of infrastructure engineers
  • Define and uphold coding standards and operational excellence
  • Partner with Deploy Infrastructure, Service Orchestration, and Product teams
  • Align infrastructure decisions with business goals
  • Own end-to-end reliability for observability tools and establish SLAs, SLOs, and error budgets
  • Optimize performance and cost of large-scale data pipelines
  • Shape the observability roadmap
What we offer:
  • Opportunity to receive benefits
  • Restricted stock units
  • May include incentive compensation
  • Comprehensive benefit offerings
Employment Type:
Full-time

Lead Engineering Geologist

The Lead Engineering Geologist will be part of the current DJV (Design Joint Ven...
Location:
Australia, Sydney; Cooma
Salary:
Not provided
Lombardi Engineering
Expiration Date:
Until further notice
Requirements:
  • Should have at least 10-15 years of industry experience in ground investigation, D&B/TBM tunnel excavation and support, caverns, shafts, adits and surface excavation and support
  • Should have a degree in Engineering Geology or equivalent, preferably with professional registration status
  • Preferably familiar with tunneling observational methods (i.e., NATM) as well as TBM tunneling with respect to geological/geotechnical assessment during excavation, instrumentation and monitoring systems, and data interpretation
  • Familiar with advanced tunnel/underground geological software and systems, e.g., Leapfrog, gINT, Shape Matrix, Dips, and Swedge/Unwedge
  • A valid driving license
Job Responsibility:
  • Lead Lombardi geology team on site
  • Perform scheduling and resource planning, and manage the REDST tasks related to geology studies and investigations
  • Liaise with the client (employer) safety officers, tunnel inspectors, security guards, section engineers, and surveyors to ensure the safety of DJV staff/DJV geologists during the geology field works and geological mapping
  • Attend regular internal/external meetings as requested by the Lead Resident Design Engineer or Project Manager and Project Director
  • Attend regular technical meetings with the contractor/Employer as requested by the Lead Resident Design Engineer
  • Oversee and conduct geological mapping of tunnels, shafts, caverns, access adits, and portals
  • Assess ground conditions and ground behavior in response to surface and underground excavations in accordance with the relevant technical specifications and drawings
  • Review geological mapping, geological/geotechnical instrumentation and monitoring documents, geotechnical testing results and other relevant documents as specified in the relevant specifications and drawings
  • Advise on tunnel/cut-slope excavation and support methods, rock/soil support, fissure/consolidation/backfill/cavity grouting, probe drilling, overbreak treatment, geotechnical instrumentation and monitoring, and lab/in-situ testing in accordance with the project technical specifications and drawings
  • Perform geological/geotechnical investigation and data interpretation for D&B and TBM excavations, shafts, caverns, cut slopes, and tunnel portals
Employment Type:
Full-time

Cyber Security Engineering Lead

Join Citi's Cloud Technology Services team to lead and execute critical cyber se...
Location:
Hungary, Budapest
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 10+ years of relevant cybersecurity and/or IT experience
  • Leadership roles across technology or cybersecurity leading large programs or transformational activities
  • Proven track record of delivering security observability platforms, such as performance and/or user-experience telemetry
  • Thorough understanding of industry and corporate technology standards for Cyber Security services
  • Demonstrated ability to take ownership and work with cross-functional teams to manage multiple projects simultaneously under pressure
  • Advanced analytical and problem-solving skills
  • Consistently demonstrates clear and concise written and oral communication as well as strong presentation skills to both technical and non-technical audiences.
  • Bachelor’s degree in relevant subject or equivalent work experience
Job Responsibility:
  • Lead a virtual team of Infrastructure Defense professionals.
  • Lead CTB transformational and RTB activities across NDCS and act as focal point managing cyber security platforms
  • Lead, design, own and deliver Security Observability Enablement on a global scale focusing on all related perimeter technologies – such as Firewall Telemetry.
  • Deliver end-to-end dashboards of critical security-service data (such as firewall performance)
  • Work with Transformation Program Directors, Senior Architects, and Steering Committees on execution of perimeter security and edge security programs
  • Work with global cyber security industry partners on influencing next generation cyber technology, take part in related R&D efforts.
  • Responsible for inventory, accuracy and engineering excellence activities for assigned services and products.
What we offer:
  • Cafeteria Program
  • Home Office Allowance (for colleagues working in hybrid work models)
  • Paid Parental Leave Program (maternity and paternity leave)
  • Private Medical Care Program and onsite medical rooms at our offices
  • Pension Plan Contribution to voluntary pension fund
  • Group Life Insurance
  • Employee Assistance Program
  • Access to a wide variety of learning and development programs, online course libraries and upskilling platforms, such as Udemy and Degreed
  • Flexible work arrangements to support you in managing work-life balance
  • Career progression opportunities across geographies and business lines
Employment Type:
Full-time

Staff Observability Operations Engineer

We are currently seeking several experienced and highly skilled Staff Observabil...
Location:
United States, Hartford
Salary:
130295.00 - 260590.00 USD / Year
CVS Health
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
  • Experience developing and administering ServiceNow ITOM event management solutions
  • Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty)
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Proficiency in Python and other scripting languages (PowerShell, Bash) and tools such as Ansible for automation and configuration
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions
  • Proven ability to troubleshoot and resolve complex technical issues
Job Responsibility:
  • Deploy and implement modern observability solutions
  • Manage and administer observability and event management platforms
  • Coordinate and manage release cycles for observability platforms
  • Troubleshoot and resolve incidents related to observability platforms
  • Continuously monitor and enhance platform performance
  • Collaborate with cross-functional stakeholders
  • Provide training and mentoring to junior engineers
  • Ensure compliance and security of observability platforms
  • Maintain documentation of observability platform configurations
  • Generate and analyze reports on platform performance and capacity
What we offer:
  • Affordable medical plan options
  • A 401(k) plan (including matching company contributions)
  • An employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs
  • Confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
Employment Type:
Full-time

Automation Engineering Lead Analyst

Enterprise Analytics Services (EAS) team (Part of Cloud Technology Services Orga...
Location:
India, Pune; Chennai
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 10+ years of total IT experience
  • Solid hands-on experience of 6+ years with Linux operating systems, including extensive experience with VMware environments
  • Solid proficiency cultivated over 6+ years in Unix Shell scripting, Ansible, Terraform, Python scripting, and Java for robust automation and development
  • Proven experience of 5+ years in automation, architecting, deploying, and managing solutions utilizing containerization technologies, including Docker, Kubernetes, OpenShift, and Helm, in production environments
  • Extensive expertise with 5+ years of hands-on experience in SQL language, complemented by a strong command of multiple Relational Database Management Systems (RDBMS), HiveQL, and Spark for data manipulation and analysis
  • Experience working with AWS/GCP. Certification is preferred
  • Demonstrated DevOps skills with a minimum of 4 years of experience, including proficiency in version control (GitHub) and practical application of CI/CD tools (e.g., Jenkins, Tekton, Harness)
  • Experience with observability tools such as Grafana, Elastic Kibana, and Splunk is preferred
  • Bachelor's degree/University degree or equivalent experience
  • Master's degree preferred
Job Responsibility:
  • Design, develop, and maintain automation solutions to streamline the deployment, configuration, and lifecycle management of complex systems
  • Identify repetitive, time-consuming tasks and strategically implement automation to optimize resource capabilities, improve efficiency, and achieve cost savings
  • Engineer and customize enterprise platforms to align with evolving organizational requirements, ensuring scalability, reliability, and maintainability
  • Build and deliver custom solutions, enhancements and extensions that improve system capabilities, operational workflows, and end-user experience
  • Evaluate and certify new product features and releases through structured testing and validation to ensure compatibility, performance, and security within the enterprise environment
  • Collaborate cross-functionally with operations teams to provide technical guidance, develop fit-for-purpose automation tools, and support production needs
  • Engage with external vendors and internal stakeholders to coordinate product features, raise enhancement requests, and resolve technical issues efficiently
  • Liaise with end users and internal customers to gather requirements, provide technical solutions, and deliver a high standard of service and support
  • Contribute to the overall system architecture and engineering strategy, promoting automation-first approaches and reusable design patterns
  • Document processes, solutions, and best practices to ensure transparency, knowledge sharing, and operational excellence
Employment Type:
Full-time

Senior Software Engineer, Platform Observability

Everlaw is looking for a Senior Software Engineer that brings experience in buil...
Location:
United States, Oakland
Salary:
164000.00 - 208000.00 USD / Year
Everlaw
Expiration Date:
Until further notice
Requirements:
  • BS or MS in Computer Science, or equivalent coursework
  • At least 3 years of experience building logging, metrics, and tracing infrastructure
  • Proficiency in coding in a language such as C, C++, C#, Java, Python, JavaScript, Go, or Rust
  • Experience with Infrastructure as Code and container solutions to manage cloud environments (e.g., Terraform, Ansible, Docker)
  • At least 1 year of experience leading multi-developer efforts, including planning, technical breakdown, and coordination
  • Excellent communication and collaboration skills
  • Please note that at this time, Everlaw is not sponsoring U.S. employment visas for this role. Due to federal contract requirements, Everlaw may only hire US citizens for this position.
Job Responsibility:
  • Build observability strategies to support application and infrastructure metrics, logs, traces, dashboards, and alerts
  • Develop and maintain infrastructure as code (IaC) using tools such as Terraform and Ansible
  • Monitor usage trends to identify opportunities to optimize efficiency and performance of our metrics database and logging tools
  • Improve our on-call and incident management processes by encouraging deeper understanding, communication, and trust
  • Support developer projects by influencing design and implementation of infrastructure features as well as providing technical guidance
  • Support compliance efforts by promoting continuous documentation of our processes and involvement in audits
  • Provide technical mentorship to other engineers by sharing your technical knowledge and becoming an expert in an area of our code base
What we offer:
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
  • Paid parental leave and approximately 10 days (80 hours) per year of sick leave
  • Seventeen paid vacation days plus 11 federal holidays
  • Membership to Modern Health to help employees prioritize mental health and wellness
  • Annual allocation for Learning & Development opportunities and applicable professional membership dues
  • Company-sponsored life and disability insurance
  • Work in Uptown Oakland, just steps from the BART line and dozens of restaurants and walking distance to Lake Merritt
Employment Type:
Full-time