Observability Lead Job at Chicago Trading Company (Chicago)

Lead Observability Engineer

We are seeking a Lead Observability Engineer to join the team, and be able to wo...

Location

Salary:

Not provided

N-iX

Expiration Date

Until further notice

Requirements

5+ years of engineering experience in cloud observability platforms, infrastructure, and telemetry systems
Deep experience in alerting, notifications, and monitoring at scale
Advanced expertise with ClickHouse, or similar high-performance analytical databases, for telemetry storage and querying
Hands-on experience migrating telemetry/storage solutions (preferably from Cosmos DB to ClickHouse or equivalent)
Solid understanding of telemetry pipelines, cloud-native monitoring, and best practices
Experience with dashboarding and visualization tools (Grafana, Kibana, or similar)
Strong scripting and automation skills (Python, Bash, Terraform or equivalent)
Proven collaboration and communication skills across cross-functional teams.

Job Responsibility

Lead the migration and transformation of telemetry storage from custom Cosmos DB solutions to ClickHouse, building a scalable and reliable end-to-end observability platform
Architect, implement, and maintain alerting and notification systems integrated with ClickHouse for critical services and applications
Develop, deploy, and operate high-throughput telemetry pipelines, ensuring accurate and actionable monitoring across cloud environments
Collaborate with engineering and product teams to define and champion observability best practices
Work with DevOps and development teams to automate collection, ingestion, and retention policies for logs, metrics, and traces
Drive continuous improvement in system performance, stability, and reliability through effective observability
Participate in on-call rotations, incident response, and root cause analysis to enhance monitoring and alerting capabilities.

What we offer

Flexible working format - remote, office-based or flexible
A competitive salary and good compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
Active tech communities with regular knowledge sharing
Education reimbursement
Memorable anniversary presents
Corporate events and team buildings
Other location-specific benefits

Fulltime

Lead Observability Platform Engineer

Capital One is looking for an Observability Platform Engineer to join our Associ...

Location

United States, Plano; McLean; Richmond

Salary:

149800.00 - 188100.00 USD / Year

Capital One

Expiration Date

Until further notice

Requirements

High School Diploma, GED, or equivalent certification
At least 3 years of experience creating reports and building alert monitors
At least 3 years working with macOS and Windows platforms
Strong analytical and technical skills
Ability to foster collaborative, open, working relationships with technology groups and other stakeholders, including vendor relationships
Demonstrated clear communication skills and ability to interact effectively at all levels of an organization, and to influence senior management and executives
Strong knowledge of syntax structures for reporting languages, such as SQL or Opal, and good familiarity with parsing data.

Job Responsibility

Work with partner teams to update configurations for our log collectors on our Windows and Mac endpoints
Work with stakeholders to identify, discuss and prioritize log ingestion strategies
Build complex dashboards that tell stories about the health of our endpoints, and identify opportunities for improvements
Create monitors that alert platform teams when changes to the environment may be impacting the health of devices and user experiences
Create reports that detail the performance of applications on our endpoints, and applications being considered for future deployment
Assist platform teams with issue triage by providing complex data and log analysis where needed
Use data to tell stories to our senior leaders and help drive vendor and product roadmaps
Help create processes and strategies that can validate changes in performance across operating system and product version updates

What we offer

Performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
A comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being

Fulltime

Lead Observability Engineer

Lead Observability Engineer role focusing on the Elastic Observability Platform,...

Location

India, Hyderabad

Salary:

Not provided

Blue Yonder

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, MIS, or equivalent experience
7–10+ years of experience in observability engineering, SRE, monitoring platform ownership, or infrastructure operations
Deep, hands-on expertise with Elastic Stack (Elasticsearch, Kibana, Logstash, Beats/Elastic Agent, APM)
Strong architectural knowledge of cloud (Azure/AWS) and hybrid observability patterns
Experience leading observability for infrastructure, cloud platforms, network systems, Kubernetes, and Microsoft 365
Proven experience designing monitoring for SaaS platforms (Workday, Salesforce, ServiceNow)
Advanced scripting/automation experience (Python, PowerShell, Bash)
Strong knowledge of API integrations, data pipelines, and log-flow engineering
Experience leading incident diagnostics and delivering visibility for RCA and operational improvement
Strong analytical, architectural, and troubleshooting skills with a platform-owner mindset

Job Responsibility

Receives work assignments through the ticketing system or from senior leadership
Provides Tier-4 engineering expertise, platform ownership, and technical leadership for all observability capabilities across hybrid cloud, on-premises, and SaaS environments
Leads the design, architecture, and maturity of the enterprise observability ecosystem with a primary focus on the Elastic Observability Platform
Drives the enterprise strategy for logging, metrics, traces, synthetics, and alerting—including governance, standardization, and performance optimization
Partners closely with Cloud, Infrastructure, Security, Enterprise Applications, and SRE leadership to define observability frameworks
Ensures observability platforms meet enterprise requirements for security, performance, availability, compliance, and scalability
Oversees monitoring implementations for key SaaS applications including Workday, Salesforce, ServiceNow, and Microsoft 365
Provides guidance, mentorship, and direction to observability engineers, SREs, and operational teams
Acts as a strategic advisor during major incidents by providing real-time diagnostics, correlation insights, and driving RCA improvements
Required to provide on-call support during off-hours on weekdays, weekends, and holidays on a rotating basis

Fulltime

Observability Lead – Elastic (ELK) Stack

We are seeking a highly experienced and visionary Observability Lead to spearhea...

Location

India, Mumbai

Salary:

Not provided

Integra Micro Software Services

Expiration Date

Until further notice

Requirements

Bachelor’s or Master’s degree in Computer Science, Information Technology (IT), or a closely related technical field
Minimum of 8 years of professional experience dedicated to observability, system monitoring, or infrastructure management practices
3+ years of direct, hands-on experience specifically managing and engineering solutions using the full Elastic Stack (Elasticsearch, Kibana, Logstash/Beats, Elastic APM, and Fleet/Elastic Agent)
Strong, practical understanding of fundamental observability concepts, including the collection and analysis of logs, metrics, traces, and synthetic monitoring
Expertise in implementing OpenTelemetry, configuring distributed tracing, and carrying out telemetry instrumentation within complex microservice environments
Proven experience working with complementary modern monitoring and containerization tools such as Kubernetes, Docker, Prometheus, and Grafana
Demonstrated proficiency in managing system configurations using YAML-based configurations
Extensive experience in performance optimization, advanced data visualization, and sophisticated dashboarding using Kibana

Job Responsibility

Spearhead our monitoring and infrastructure management initiatives
Drive the strategy and implementation of robust observability solutions
Ensure system reliability, performance, and insightful data visualization

What we offer

Innovation Focused culture
Collaborative Environment
Professional Development through continuous learning programs, certifications, and mentorship opportunities
Work-Life Integration with competitive benefits and policies

Lead Integration & Observability Specialist

The Lead Integration & Observability Specialist will design and implement observ...

Location

India, Coimbatore

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

7+ years of overall IT experience
5+ years of relevant experience in Observability / Monitoring / Reliability Engineering
Strong hands-on experience with enterprise observability tools, such as: IBM Instana, Dynatrace, AppDynamics, Prometheus, Grafana
Expertise in monitoring and alerting design, log management and analysis, metrics and distributed tracing, and health checks and SLO/SLI concepts
Experience monitoring AWS/Azure workloads
Strong troubleshooting and incident analysis skills
Experience defining operational and non-functional requirements

Job Responsibility

Lead the implementation of enterprise observability for applications, APIs, services, batch jobs, and data pipelines
Design and standardize monitoring, alerting, logging, metrics, and health checks across distributed systems
Integrate observability platforms with incident management and automation tools to support proactive issue detection and remediation
Support reliability and availability of integration platforms built on AWS/Azure
Perform advanced troubleshooting using logs, metrics, and traces to resolve production issues
Define operational readiness standards and non-functional requirements
Mentor engineers on observability best practices and platform usage
Collaborate with product, support, and operations teams to improve service stability and delivery

Site Reliability Engineering (SRE) / Observability Technical Lead

Join a dynamic team as a Site Reliability Engineer, leading observability and re...

Location

United Kingdom, London

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

5+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
Hands-on experience with OpenTelemetry (OTel) for distributed tracing and observability instrumentation
Strong proficiency in Infrastructure as Code (IaC) using Terraform
Solid understanding of cloud platforms including AWS, GCP, or Azure
Experience with automation/configuration management tools like Ansible, Chef, or Puppet
Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
Experience managing Kubernetes and containerized environments (Docker, Helm)
Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
Excellent leadership, communication, and collaboration skills

Job Responsibility

Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements
Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence

What we offer

Tailored benefits that support your physical, emotional, and financial wellbeing
Continuous growth and development opportunities
Flexible work options

Fulltime

Program Lead: Product Operations - AI Observability

The AI Observability Program Leader will own the end-to-end strategy, design, an...

Location

United States, Sunnyvale

Salary:

162000.00 - 180000.00 USD / Year

Uber

Expiration Date

Until further notice

Requirements

5+ years of experience in Technical Program Management, Product Operations, AI Quality, or Observability
Bachelor’s degree in Engineering, Computer Science, Data Science, or a related technical field.

Job Responsibility

Architect Observability Frameworks: Own the strategy for understanding AI agentic reasoning, enabling deep analysis of step-by-step agent decision-making
Drive Autoeval Strategy: Design and roll out automated evaluation systems (LLM-as-a-judge) to provide a scalable, high-confidence "pulse" on AI performance across conversational and voice interfaces
Define Micrometrics: Develop granular signals within agentic activity—identifying latent failures, reasoning loops, or tool-calling inefficiencies—to drive product improvements
Lead Pre-Launch Simulation: Partner with Product & Engineering to build and maintain simulation environments that test AI agents against edge cases before deployment, and democratise these tools with Operations teams
Cross-Functional Technical Partnership: Act as the primary liaison between Product, Engineering, and Data Science to ensure observability tooling is integrated into the development lifecycle and directly informs release "Go/No-Go" decisions
Insight Synthesis: Package complex technical observability data into clear, actionable narratives for leadership, highlighting specific failure patterns and opportunities for CX improvement
Operational Excellence: Establish the standards and tooling for how AI performance is reported globally, ensuring consistency across different regions and support modalities.

What we offer

Eligible to participate in Uber's bonus program
May be offered an equity award & other types of comp
All full-time employees are eligible to participate in a 401(k) plan
Eligible for various benefits (details at link).

Fulltime

Technical Architect

Lead the design, modernization, and implementation of scalable, secure, and resi...

Location

United States, Armonk

Salary:

247319.00 - 250000.00 USD / Year

The New York Times

Expiration Date

Until further notice

Requirements

Bachelor's degree or equivalent in Computer Science, Information Technology, Engineering, or a related field, and five (5) years of experience as a Consultant Architect, Virtualization Architect, Senior Cloud Architect, or in a related role
Five (5) years of experience must include utilizing Hybrid Cloud, AWS, Azure, Red Hat Linux, Terraform, Ansible, Python, VMware Cloud Foundation (VCF) Stack

Job Responsibility

Lead the design, modernization, and implementation of scalable, secure, and resilient hybrid cloud and containerized infrastructure platforms
Define and lead the technical architecture strategy for hybrid cloud, container orchestration (Kubernetes, Red Hat OpenShift, VMware Tanzu), and virtualized environments (VMware, Nutanix, Red Hat)
Architect secure and scalable infrastructure across private, public, and hybrid cloud ecosystems
Evaluate, design, and implement solutions for computing, storage, networking, identity, and availability zones across global regions
Design and implement Kubernetes and Red Hat OpenShift clusters across multi-cloud and on-prem environments, including CI/CD integration, policy enforcement, and workload orchestration
Define governance, observability, and security patterns for containerized workloads
Lead Infrastructure-as-Code (IaC) initiatives using Terraform, Ansible, GitOps, GitHub, PowerShell, and Python
Enable self-service infrastructure capabilities through automation frameworks and developer platforms
Partner with DevSecOps, SRE, Infrastructure Operations, Security, and Datacenter Operation teams to scope, define, size, and execute application onboarding, modernization, and consolidation initiatives
Mentor engineering teams and influence enterprise architecture (EA) roadmaps

Fulltime
