Senior Manager of Kubernetes Observability

Wells Fargo

Location:
United States, Irving

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are seeking a Senior Manager of Kubernetes Observability to provide strategic leadership for the design, standardization, and scaled execution of our enterprise observability ecosystem across Kubernetes and OpenShift platforms, including Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE). This role is responsible for ensuring a robust, unified, and automated observability platform that enables reliability, performance, and operational excellence across all clusters and workloads in hybrid and multi‑cloud environments. As a senior technology leader, you will define the long‑term vision and operating model for metrics, logging, tracing, eventing, and monitoring standards across on‑prem, cloud‑managed, and hosted Kubernetes platforms. You will guide multiple engineering teams to execute consistently against this strategy, ensuring full instrumentation, proactive issue detection, reduced MTTR, and improved platform stability. Through strong architectural direction, organizational alignment, and focused mentorship, you will elevate engineering maturity and ensure developers and SREs have actionable insights that accelerate innovation and support enterprise growth at scale.

Job Responsibility:

  • Define the target‑state vision and multi‑year roadmap for observability across Kubernetes, OpenShift, AKS, and GKE, including metrics, logging, tracing, eventing, and alerting standards
  • Establish a unified observability operating model that ensures consistency, scalability, and reuse across on‑prem, cloud‑managed, and multi‑cloud Kubernetes environments
  • Define success metrics and outcomes that measure observability effectiveness, reliability improvements, and reductions in MTTR across all platforms
  • Set architectural direction for enterprise observability platforms, tooling, and telemetry pipelines across Kubernetes, OpenShift, AKS, and GKE
  • Establish standardized instrumentation patterns for clusters, workloads, control planes, and platform services, ensuring complete and consistent telemetry coverage regardless of Kubernetes distribution or cloud provider
  • Drive convergence toward unified observability frameworks that abstract provider‑specific differences while preserving deep platform insight
  • Drive automation of observability onboarding and telemetry workflows across Kubernetes, AKS, and GKE to reduce manual effort and accelerate adoption
  • Enable self‑service observability capabilities that allow developers and SREs to easily instrument, monitor, and troubleshoot workloads across cloud and on‑prem clusters
  • Ensure observability is embedded by default into platform, infrastructure‑as‑code, and application delivery pipelines
  • Enable proactive issue detection through scalable alerting frameworks, actionable dashboards, and standardized monitoring practices across all Kubernetes platforms
  • Improve reliability and performance visibility for workloads running on OpenShift, AKS, and GKE, reducing reliance on reactive troubleshooting
  • Partner with SRE and operations teams to continuously improve incident response, post‑incident learning, and preventative engineering across hybrid and multi‑cloud environments
  • Lead, mentor, and develop engineering leaders and teams responsible for observability platform components and services
  • Align platform, SRE, cloud, and application teams around shared observability standards and operational goals across Kubernetes, AKS, and GKE
  • Strengthen cross‑team collaboration and engineering rigor to raise overall organizational maturity in observability and operations

Requirements:

  • 6+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 3+ years of management or leadership experience
  • 5+ years of experience in platform engineering, reliability engineering, or observability‑focused technical leadership roles, or equivalent demonstrated experience
  • 6+ years of experience with Grafana and Splunk
  • 5+ years of experience with Kubernetes observability concepts, including metrics, logging, tracing, eventing, and monitoring platforms, across OpenShift, AKS, and GKE

Nice to have:

  • 6+ years of people management or senior technical leadership experience guiding multiple engineering teams
  • Demonstrated success defining and scaling enterprise observability platforms across large, multi‑cloud Kubernetes environments
  • Strong understanding of SRE, operational excellence, and reliability engineering practices
  • Experience driving automation and standardization to reduce MTTR and operational toil
  • Proven ability to influence across platform, infrastructure, cloud, and application teams
  • Strong executive communication skills, including the ability to articulate strategy, tradeoffs, and outcomes to senior stakeholders

Additional Information:

Job Posted:
March 19, 2026

Expiration:
March 21, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Senior Manager of Kubernetes Observability

Senior Product Manager (SaaS)

This software platform is on a mission to make data less of a headache and more ...
Location:
United Kingdom, London
Salary:
100000.00 - 120000.00 GBP / Year
Linux Recruit
Expiration Date
Until further notice
Requirements
  • Worked as a product manager in a SaaS organisation before
  • Knowing how to prioritise
  • Find opportunities in ambiguity
  • Rally teams around a plan
  • Love for user experience
  • Comfort with technical details
  • Experience with containerised platforms using Kubernetes
  • Experience with databases
  • Experience with observability tools such as Prometheus and OpenTelemetry
Job Responsibility
  • Turn customer needs into smart roadmaps and products people actually enjoy using
  • Work with design and engineering to bring ideas to life
  • Keep an eye on the market
  • Make sure strategy always stays sharp
  • Team up with marketing, sales, and support to ensure features launch with impact
  • Solve problems before they become roadblocks
Employment Type:
Fulltime

Senior Observability Engineer

Coralogix is a modern, full-stack observability platform transforming how busine...
Location:
Germany, Berlin
Salary:
Not provided
Coralogix
Expiration Date
Until further notice
Requirements
  • 5+ years of experience in Site Reliability, DevOps, or Platform Engineering with a focus on observability
  • Proven expertise with at least one major observability platform (e.g., Prometheus, VictoriaMetrics, OpenSearch)
  • Hands-on experience with Kubernetes, including deep knowledge of controllers, operators, and Helm
  • Experience writing Kubernetes controllers (controller-runtime, Kubebuilder)
  • Strong programming skills in Go or Python (Rust is a plus)
  • Experience designing, scaling, and operating observability systems at enterprise scale
  • Familiarity with at least one major cloud provider (AWS, Azure, or GCP)
  • Strong understanding of distributed systems, telemetry pipelines, and instrumentation standards (e.g., OpenTelemetry)
  • Excellent communication skills with the ability to explain complex topics to diverse stakeholders
Job Responsibility
  • Design, implement, and maintain observability features such as Alerting, SLOs, Reporting, and Synthetic Tests
  • Manage and scale OpenTelemetry Collectors and other observability agents across Kubernetes environments
  • Write and maintain Kubernetes controllers using frameworks like controller-runtime and Kubebuilder
  • Operate and optimize the internal Coralogix account, ensuring proper usage, cost efficiency, and best practices adoption
  • Define and enforce observability guidelines and standards across the organization
  • Partner with engineering teams to embed observability by default into products and services
  • Control observability-related costs while maximizing performance, visibility, and value
  • Contribute to upstream projects such as OpenTelemetry, helping shape industry standards
  • Explore and implement cutting-edge observability technologies, including eBPF-based approaches
Employment Type:
Fulltime

Senior Engineering Manager, Platform Engineering (Developer Experience)

Everlaw is seeking a Senior Engineering Manager, Platform to lead teams focused ...
Location:
United States, Oakland, California
Salary:
219000.00 - 277000.00 USD / Year
Everlaw
Expiration Date
Until further notice
Requirements
  • At least 5 years as a senior engineer building developer productivity tools and/or highly available platform services (e.g., storage, pub-sub, search, caching, observability) and/or deep experience with infrastructure/cloud technologies (e.g., Terraform, Kubernetes, Docker)
  • 3+ years of experience directly managing software engineers and/or technical leads, including hiring, coaching, performance management, and growing a high-performing team
  • 2+ years of experience building and leading developer experience or platform teams/programs that deliver internal platforms and tooling with measurable productivity outcomes (e.g., faster builds/tests, improved CI/CD lead times, higher deployment frequency)
  • Experience managing scalable database infrastructure (e.g., Postgres, MySQL or equivalent)
  • Can communicate at the right altitude with both technical and non-technical stakeholders, and you’ve led cross-functional roadmaps with Engineering Operations, Security Engineering, DevOps, Product, and Design
  • Authorized to work in the United States. Please note that currently, Everlaw is not sponsoring employment visas.
Job Responsibility
  • Lead platform teams that build and evolve core internal platforms and developer tooling—spanning build/test infrastructure, CI/CD, and developer workflows—to improve engineer productivity and time-to-value
  • Collaborate closely with Engineering Operations, Security Engineering, DevOps, Product, and Design to synthesize requirements and prioritize impactful investments
  • Drive roadmapping, resourcing, and execution for critical platform areas that make it better and cheaper to develop, test, and release software
  • Establish and use developer efficiency metrics (e.g., build/test times, deploy lead time, change failure rate) to identify bottlenecks and plan ambitious improvements to workflows
  • Ensure operational excellence for platform services and tooling with clear SLOs, robust observability, and incident/bug management practices
  • Coach and develop engineers and leads: provide actionable feedback, elevate technical execution, and foster an inclusive, high-accountability culture
  • Partner with Engineering Operations to improve processes for alignment, goal setting, empowerment, and cross-team execution across Engineering
  • Communicate effectively with both technical and non-technical stakeholders, adjusting altitude from strategy to technical deep dives as needed.
What we offer
  • Medical
  • Dental
  • Wellness program
  • Paid parental leave
  • Professional development
  • Fully stocked kitchen
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
Employment Type:
Fulltime

Senior Software Engineer - Portfolio Management

As an experienced Senior Software Engineer you will help build our flagship Clea...
Location:
United States, New York
Salary:
170000.00 - 220000.00 USD / Year
Clear Street
Expiration Date
Until further notice
Requirements
  • At least seven (7) years of professional experience implementing highly scalable services in Java/SpringBoot using both multi-threaded and asynchronous processing patterns
  • Strong SQL skills, query plan analysis and optimization skills/tactics
  • Build JSON parsing/validation/transform pipelines (JSON Schema), including custom adapters/codecs (preference of GSON over Jackson)
  • Fundamental understanding of OLAP/OLTP workflows, and row oriented / column oriented database design choices
  • Model and operate Redis beyond KV: streams, pub/sub, hashes, sorted sets, Lua, eviction & persistence tradeoffs
  • Production debugging instincts: can trace failures across the layers of a system, understand /proc, syscalls, and debug latency related issues
  • Familiar with Kubernetes, Docker, and Linux
  • Strong command over design patterns, data structures, and algorithms
  • Solid with Git; understands branching, rebasing, and dealing with issues
Job Responsibility
  • Help build our flagship Clear Street Portfolio Management platform
  • Tackle non-trivial problems that force you to balance trade-offs while implementing clean and efficient solutions
  • Build core services for our world-class financial platform designed to handle all aspects of client needs while maintaining a high SLA
  • Own and harden the ingestion, validation and persistence of high-volume data products across our Portfolio Management platform
  • Turn ambiguous, cross-team pain into deterministic, observable systems
  • Develop a wide range of services, from user authentication and authorization to client data delivery
  • Solve complex problems that will challenge your system design skills, implement clean and efficient code, and simplify complexity through feature and service design
  • Mentor teammates, evolve our technical standards and best practices, and promote a culture of system design
What we offer
  • Competitive compensation
  • Company equity
  • 401k matching
  • Gender neutral parental leave
  • Full medical, dental and vision insurance
  • Lunch stipends
  • Fully stocked kitchens
  • Happy hours
Employment Type:
Fulltime

Senior Golang Developer

In the HPE Hybrid Cloud, we lead the innovation agenda and technology roadmap fo...
Location:
India, Hyderabad
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 8-10 years of experience, including at least 4 years as a Golang developer
  • Strong proficiency in Golang and a deep understanding of its internals
  • Experience with the full suite of Go frameworks and tools, including dependency management (Godep, Go Modules, and vendoring), Go templating, Go test frameworks, and Go profiling tools
  • Familiarity and hands-on experience with containerization and orchestration tools (e.g., Docker, Kubernetes)
  • Good to have: knowledge of the OpenTelemetry protocol, gRPC, and HTTP/2
  • Experience with microservices architecture
  • Good to have: knowledge of proxies such as Envoy
  • Good to have: knowledge of Kafka, Apache Pulsar
  • Good to have: knowledge of ClickHouse or other big data analytics tools
  • Good understanding of software development best practices, version control tools such as Git and SVN, and continuous integration
Job Responsibility
  • You write scalable, robust, testable, efficient, and easily maintainable code
  • You write code to ingest observability data points such as logs, traces, and metrics, and write APIs to provide data for analytics
  • Responsible for development, support and maintenance of one or more modules related to observability data points
  • Optimize the application for maximum speed and scalability
  • Collaborate with front-end developers to integrate user-facing elements with server-side logic
  • Work closely with DevOps and infrastructure teams to deploy and monitor applications in production environments
  • Participate in full life-cycle project delivery, including requirement analysis, design, build, testing, and deployment
  • Analyse use cases and requirements to design secure and scalable solutions
What we offer
  • Health & Wellbeing
  • Comprehensive suite of benefits that supports their physical, financial and emotional wellbeing
  • Personal & Professional Development
  • Career programs catered to helping you reach career goals
  • Unconditional Inclusion
  • Flexibility to manage work and personal needs
Employment Type:
Fulltime

Technology Outbound Product Manager

Join the innovators of OpsRamp as its technology product management leader, resp...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in marketing, engineering, computer science, or a related field
  • MBA or advanced technical degree preferred
  • 4+ years of experience in technical marketing, product marketing, product management, or pre-sales in the observability, ITOM, log management, SaaS and enterprise software, or IT infrastructure industries
  • Knowledge/experience with SaaS software preferred
  • Public cloud experience is a plus
  • Knowledge of application modernization (e.g., Kubernetes) and automation (Python, pipelines, PowerShell, etc.) is a plus
  • Proven track record of developing and executing successful GTM strategies and campaigns that drive awareness, demand generation, and market leadership
  • Excellent written and verbal communication skills, with the ability to distill complex technical concepts into clear, concise, and compelling messaging and content
  • Strong analytical skills and experience conducting market and competitive analysis to identify key trends, insights, and opportunities
  • Ability to work effectively in a fast-paced, dynamic environment with cross-functional teams and multiple stakeholders
Job Responsibility
  • Develop and execute technical evangelizing strategies to drive awareness, demand generation, and market leadership for OpsRamp solutions
  • Collaborate with product management and engineering teams to deeply understand product features, capabilities, and roadmaps, and translate them into compelling value propositions, messaging, and content
  • Create and maintain a wide range of technical collateral, including whitepapers, solution briefs, presentations, videos, demos, and blog posts
  • Drive the creation and delivery of technical enablement materials to support technical sales, partners, and customers, including training presentations, FAQs, and technical guides
  • Conduct market and competitive analysis to identify key trends, insights, and opportunities to differentiate OpsRamp in the ITOM market
  • Serve as a technical evangelist and spokesperson for OpsRamp at industry events, conferences, webinars, and customer meetings
  • Collaborate with product marketing and corporate marketing teams to develop technical content that drives engagement, leads, and pipeline
  • Gather key customer and target audience insights to inform product positioning and messaging as well as the product roadmap
  • Contribute to GTM strategy and messaging, and help maintain technical accuracy of marketing messages.
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Fulltime

Senior Site Reliability Engineer

As a Site Reliability Engineer, you will focus on ensuring that the Prolific pla...
Location:
United Kingdom
Salary:
Not provided
Prolific
Expiration Date
Until further notice
Requirements
  • 5+ years with Google Cloud Platform, GKE, and the Kubernetes ecosystem, including experience with Terraform and Terragrunt
  • Strong programming skills in Python
  • Strong experience in observability principles and tooling
  • Experience in GitOps flows and platforms for Kubernetes, such as ArgoCD
  • Deep understanding of system architecture and scalability principles
  • Strong collaboration and communication skills to work with cross-functional teams
Job Responsibility
  • Develop and maintain highly available infrastructure using modern infra-as-code techniques, with a focus on Terragrunt and Terraform
  • Manage and optimise Kubernetes clusters and their workloads with a focus on reliability and performance
  • Participate in incident response and remediation, working with relevant product teams and stakeholders to resolve production issues efficiently, including creating and maintaining runbooks
  • Review and optimise other areas of our tooling stack, such as CI/CD or release strategies
  • Foster a culture of continuous improvement, such as enhancing documentation and upskilling teams in cloud architecture and Kubernetes
  • Improve observability and alerting systems across our application and infrastructure, ensuring proactive detection of system degradation
  • Collaborate with Engineering teams to foster an SRE culture, including helping define SLOs, SLAs, and error budgets
  • Design and implement automation strategies to ensure managed services remain up-to-date, secure, and performant
  • Lead and support initiatives that automate processes to improve system efficiency and resilience and reduce toil
  • Organise, support, and respond to on-call incidents
What we offer
  • competitive salary
  • benefits
  • remote working
  • impactful, mission-driven culture
Employment Type:
Fulltime

Senior Principal Cloud Developer

The role involves designing and building innovative Agentic AI applications and ...
Location:
United States, San Jose
Salary:
157500.00 - 361500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 10-15 years of experience in developing highly scalable cloud and cloud-native applications using technology stacks, architecture, design, development, and support
  • at least one year of recent multi-agent Agentic and RAG GenAI Software Development experience applied to Networking and/or Observability domains
  • experience developing Network Observability software for large scale Network Monitoring, Network Performance, Network Configuration or Network Capacity Management products
  • deep understanding and experience in Networking Protocol and Networking Best Practices for Enterprise and Service Provider networks
  • proven skills and programming experience in Golang, scalable concurrent processing, REST, Data Caching Services, DB schema design and data access technologies
  • experience in building, orchestrating, and deploying highly scalable REST based stateless APIs/web services for web applications in Kubernetes environment
  • familiarity with code versioning tools such as Git
  • knowledge of Network and NetFlow Logs processing and indexing
  • ability to communicate with senior Executives and with customers
Job Responsibility
  • design and build large scale distributed systems
  • apply best practices for high availability, scalability, resilience, performance, and security requirements in the cloud
  • transition proof-of-concept implementations into R&D teams to accelerate new product delivery
  • create technical content such as designs, specifications, and initial software implementations
  • mentor less-experienced staff members
  • collect product feedback from field interactions to provide input into Engineering and Product Management
  • maintain knowledge of OpsRamp SaaS product and roadmap, as well as competition
  • collaborate with product team to translate functional requirements into technical solutions
  • develop monitoring solutions using tools and services that are part of the cloud infrastructure
  • facilitate CI/CD by integrating development processes
What we offer
  • comprehensive suite of benefits supporting physical, financial, and emotional wellbeing
  • personal and professional development programs
  • unconditional inclusion and flexibility to manage work and personal needs
Employment Type:
Fulltime