We are seeking a Senior Manager of Kubernetes Observability to provide strategic leadership for the design, standardization, and scaled execution of our enterprise observability ecosystem across Kubernetes and OpenShift platforms, including Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE). This role is responsible for ensuring a robust, unified, and automated observability platform that enables reliability, performance, and operational excellence across all clusters and workloads in hybrid and multi‑cloud environments.
As a senior technology leader, you will define the long‑term vision and operating model for metrics, logging, tracing, eventing, and monitoring standards across on‑prem, cloud‑managed, and hosted Kubernetes platforms. You will guide multiple engineering teams to execute consistently against this strategy, ensuring full instrumentation, proactive issue detection, reduced MTTR, and improved platform stability. Through strong architectural direction, organizational alignment, and focused mentorship, you will elevate engineering maturity and ensure developers and SREs have actionable insights that accelerate innovation and support enterprise growth at scale.
Job Responsibilities:
Define the target‑state vision and multi‑year roadmap for observability across Kubernetes, OpenShift, AKS, and GKE, including metrics, logging, tracing, eventing, and alerting standards
Establish a unified observability operating model that ensures consistency, scalability, and reuse across on‑prem, cloud‑managed, and multi‑cloud Kubernetes environments
Define success metrics and outcomes that measure observability effectiveness, reliability improvements, and reductions in MTTR across all platforms
Set architectural direction for enterprise observability platforms, tooling, and telemetry pipelines across Kubernetes, OpenShift, AKS, and GKE
Establish standardized instrumentation patterns for clusters, workloads, control planes, and platform services, ensuring complete and consistent telemetry coverage regardless of Kubernetes distribution or cloud provider
Drive convergence toward unified observability frameworks that abstract provider‑specific differences while preserving deep platform insight
Drive automation of observability onboarding and telemetry workflows across Kubernetes, AKS, and GKE to reduce manual effort and accelerate adoption
Enable self‑service observability capabilities that allow developers and SREs to easily instrument, monitor, and troubleshoot workloads across cloud and on‑prem clusters
Ensure observability is embedded by default into platform, infrastructure‑as‑code, and application delivery pipelines
Enable proactive issue detection through scalable alerting frameworks, actionable dashboards, and standardized monitoring practices across all Kubernetes platforms
Improve reliability and performance visibility for workloads running on OpenShift, AKS, and GKE, reducing reliance on reactive troubleshooting
Partner with SRE and operations teams to continuously improve incident response, post‑incident learning, and preventative engineering across hybrid and multi‑cloud environments
Lead, mentor, and develop engineering leaders and teams responsible for observability platform components and services
Align platform, SRE, cloud, and application teams around shared observability standards and operational goals across Kubernetes, AKS, and GKE
Strengthen cross‑team collaboration and engineering rigor to raise overall organizational maturity in observability and operations
Requirements:
6+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
3+ years of management or leadership experience
5+ years of experience in platform engineering, reliability engineering, or observability‑focused technical leadership roles, or equivalent demonstrated experience
6+ years of experience with Grafana and Splunk
5+ years of experience with Kubernetes observability concepts, including metrics, logging, tracing, eventing, and monitoring platforms, across OpenShift, AKS, and GKE
Nice to have:
6+ years of people management or senior technical leadership experience guiding multiple engineering teams
Demonstrated success defining and scaling enterprise observability platforms across large, multi‑cloud Kubernetes environments
Strong understanding of SRE, operational excellence, and reliability engineering practices
Experience driving automation and standardization to reduce MTTR and operational toil
Proven ability to influence across platform, infrastructure, cloud, and application teams
Strong executive communication skills, including the ability to articulate strategy, tradeoffs, and outcomes to senior stakeholders