Lead Observability Platform Engineer

CVS Health

Location:
United States

Contract Type:
Not provided

Salary:

106605.00 - 284280.00 USD / Year

Job Description:

We’re building a world of health around every individual, shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger, helping to simplify health care one person, one family and one community at a time.

POSITION SUMMARY

Join CVS Health Enterprise Technology and help evolve observability at Fortune‑6 scale. The Enterprise Observability Platform (EOP) delivers standardized, frictionless instrumentation and telemetry pipelines for engineering teams across all CVS Health application environments, spanning on‑prem, hybrid, and multiple public clouds.

As a Lead Observability Platform Engineer, you will design, build, and operate large‑scale observability services that process billions of logs, metrics, and traces daily. You will develop high‑performance backend services using Go, Java, and Node.js, and lead the adoption of OpenTelemetry-based instrumentation and standards across the enterprise.

In this role, you will partner closely with SRE, Cloud Engineering, CI/CD, Infrastructure, Security, and application teams to shape platform strategy, enhance developer experience, and ensure reliable, secure, and cost‑efficient observability at scale. You will provide senior technical leadership, influence architectural direction, and help deliver a world‑class, self-service observability ecosystem that accelerates engineering productivity and operational excellence.

Job Responsibility:

  • Design, build, and operate core observability platform services using Go, Java (Spring Boot), and Node.js
  • Lead enterprise-wide adoption of OpenTelemetry, including client libraries, semantic conventions, instrumentation patterns, and Collector/agent strategy
  • Architect and scale high-throughput, fault-tolerant telemetry pipelines (logs, metrics, traces) with a focus on performance, reliability, and cost efficiency
  • Develop self-service observability capabilities that simplify onboarding, troubleshooting, and adoption for application teams
  • Implement end-to-end monitoring of the observability platform itself, defining SLOs, health checks, and alerting
  • Collaborate with SRE, Platform, and Cloud teams to establish reliability standards, error budgets, and incident response practices
  • Participate in on-call rotations and lead incident mitigation, root-cause analysis, and post-incident reviews
  • Automate operational workflows and eliminate manual toil through tooling, CI/CD enhancements, and platform automation
  • Ensure secure telemetry pipelines through mTLS, secrets management, and zero-trust design patterns
  • Produce and maintain high-quality technical documentation, standards, and best practices
  • Engage with internal engineering teams to gather requirements, influence roadmap prioritization, and deliver platform improvements
  • Provide technical leadership through mentorship, design reviews, architectural guidance, and cross-team collaboration with principal engineers and engineering leadership

Requirements:

  • 7+ years of experience in Software Engineering, Platform Engineering, or SRE
  • 5+ years of experience with observability practices, including SLIs/SLOs/SLAs, alerting, and incident management
  • 5+ years building production-grade backend services in Go and/or Java
  • 5+ years implementing and operating OpenTelemetry, including OTLP, semantic conventions, and instrumentation patterns
  • 5+ years with cloud-native and containerized platforms (Docker, Kubernetes, Argo CD)
  • 5+ years working with public cloud platforms (AWS, GCP, or Azure)
  • 3+ years designing and scaling distributed, high-volume data pipelines
  • 3+ years working with Grafana OSS or comparable observability backends (e.g., Grafana, Loki, Tempo, Mimir)
  • 3+ years with relational databases (PostgreSQL, MySQL)

Nice to have:

  • Experience with service meshes and networking technologies such as Envoy and Istio
  • Experience integrating or operating commercial observability platforms (Datadog, New Relic, AppDynamics, etc.)
  • Experience with streaming and data platforms such as Kafka, Pulsar, or similar technologies
  • Familiarity with time-series, NoSQL, or analytical databases (ClickHouse, Bigtable, Cassandra, etc.)
  • Experience with Infrastructure as Code tools such as Terraform or CloudFormation
  • Experience with cost optimization and capacity planning for large-scale telemetry systems
  • Experience with chaos engineering, resiliency testing, or fault injection
  • Background in security-aware platform design, including secure service-to-service communication
  • Experience mentoring senior engineers and influencing platform standards across organizations
  • Strong operational experience supporting 24x7 production systems, including on-call responsibilities
  • Strong technical communication and cross-team collaboration skills
  • Experience operating in regulated or compliance-heavy environments (e.g., healthcare, finance)

What we offer:

  • Medical, dental, and vision coverage
  • Paid time off
  • Retirement savings options
  • Wellness programs
  • Bonus, commission or short-term incentive program
  • Equity award program

Additional Information:

Job Posted:
April 24, 2026

Expiration:
April 30, 2026

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for Lead Observability Platform Engineer

Senior Software Engineer, Platform Observability

Everlaw is looking for a Senior Software Engineer that brings experience in buil...
Location:
United States, Oakland
Salary:
164000.00 - 208000.00 USD / Year
Everlaw
Expiration Date
Until further notice
Requirements
  • BS or MS in Computer Science, or equivalent coursework
  • At least 3 years of experience building logging, metrics, and tracing infrastructure
  • Proficiency in coding in a language such as C, C++, C#, Java, Python, Javascript, Go or Rust
  • Experience with Infrastructure as Code and container solutions to manage cloud environments (e.g., Terraform, Ansible, Docker)
  • At least 1 year of experience leading multi-developer efforts, including planning, technical breakdown, and coordination
  • Excellent communication and collaboration skills
  • Please note that at this time, Everlaw is not sponsoring U.S. employment visas for this role. Due to federal contract requirements, Everlaw may only hire US citizens for this position.
Job Responsibility
  • Build observability strategies to support application and infrastructure metrics, logs, traces, dashboards, and alerts
  • Develop and maintain infrastructure as code (IAC) using tools such as Terraform and Ansible
  • Monitor usage trends to identify opportunities to optimize efficiency and performance of our metrics database and logging tools
  • Improve our on-call and incident management processes by encouraging deeper understanding, communication, and trust
  • Support developer projects by influencing design and implementation of infrastructure features as well as providing technical guidance
  • Support compliance efforts by promoting continuous documentation of our processes and involvement in audits
  • Provide technical mentorship to other engineers by sharing your technical knowledge and becoming an expert in an area of our code base
What we offer
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
  • Paid parental leave and approximately 10 days (80 hours) per year of sick leave
  • Seventeen paid vacation days plus 11 federal holidays
  • Membership to Modern Health to help employees prioritize mental health and wellness
  • Annual allocation for Learning & Development opportunities and applicable professional membership dues
  • Company-sponsored life and disability insurance
  • Work in Uptown Oakland, just steps from the BART line and dozens of restaurants and walking distance to Lake Merritt
  • Fulltime

Director of Engineering, Platform Engineering

In your role as ‘Director of Engineering, Platform Engineering’ you will guide t...
Location:
United States, Oakland, California
Salary:
241000.00 - 305000.00 USD / Year
Everlaw
Expiration Date
Until further notice
Requirements
  • At least 4 years of experience managing and leading senior engineers, including technical workstream management and execution support
  • At least 2 years of experience managing and leading managers, coaching them on talent management, strategic planning, and execution, with a focus on platform engineering teams
  • At least 5 years of experience as a senior engineer building developer productivity tools or highly available platform services (e.g., storage systems, pub-sub systems, search systems, caching solutions, observability solutions), and/or expertise and experience with infrastructure and/or cloud technologies (such as Ansible, Terraform, Kubernetes, Docker)
  • You have a good dynamic range that you apply to different situations - you can step back and empower, while also diving deep into the code to understand the details
  • You can communicate at the right altitude with both technical and non-technical stakeholders
  • You have experience working with stakeholder teams (internal and/or external) in setting and collaborating on technical roadmaps
  • You have experience communicating with customers and explaining to them how the platform handles reliability, security, and compliance matters
  • You have a BS/MS or PhD in Computer Science (or equivalent)
  • You have a sound foundational understanding of a wide range of computer science topics and concerns relating to system and software design
  • You are authorized to work in the United States
Job Responsibility
  • Inspire and empower your managers to cultivate high-performing teams, fostering a culture of continuous feedback and professional growth to ensure successful project delivery and career development
  • Use your technical knowledge to align stakeholders across Engineering and Product on the ideal path forward on complex technical decisions and roadmap decisions
  • Strategize, prioritize, resource, and execute against our Engineering roadmap
  • Work with Engineering Operations, cross-functional teams, team members and managers to improve various processes that affect infrastructure growth, support, alignment, collaboration, and accountability
  • Critically observe and understand Everlaw’s platform, tooling, and processes
What we offer
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
  • Paid parental leave and approximately 10 days (80 hours) per year of sick leave
  • Seventeen paid vacation days plus 11 federal holidays
  • Membership to Modern Health to help employees prioritize mental health and wellness
  • Annual allocation for Learning & Development opportunities and applicable professional membership dues
  • Company-sponsored life and disability insurance
  • Work in Downtown Oakland, just steps from the BART line and dozens of restaurants
  • Fulltime

Senior Engineering Manager, Platform Engineering (Developer Experience)

Everlaw is seeking a Senior Engineering Manager, Platform to lead teams focused ...
Location:
United States, Oakland, California
Salary:
219000.00 - 277000.00 USD / Year
Everlaw
Expiration Date
Until further notice
Requirements
  • At least 5 years as a senior engineer building developer productivity tools and/or highly available platform services (e.g., storage, pub-sub, search, caching, observability) and/or deep experience with infrastructure/cloud technologies (e.g., Terraform, Kubernetes, Docker)
  • 3+ years of experience directly managing software engineers and/or technical leads, including hiring, coaching, performance management, and growing a high-performing team
  • 2+ years of experience building and leading developer experience or platform teams/programs that deliver internal platforms and tooling with measurable productivity outcomes (e.g., faster builds/tests, improved CI/CD lead times, higher deployment frequency)
  • Experience managing scalable database infrastructure (e.g., Postgres, MySQL or equivalent)
  • Can communicate at the right altitude with both technical and non-technical stakeholders, and you’ve led cross-functional roadmaps with Engineering Operations, Security Engineering, DevOps, Product, and Design
  • Authorized to work in the United States. Please note that currently, Everlaw is not sponsoring employment visas.
Job Responsibility
  • Lead platform teams that build and evolve core internal platforms and developer tooling—spanning build/test infrastructure, CI/CD, and developer workflows—to improve engineer productivity and time-to-value
  • Collaborate closely with Engineering Operations, Security Engineering, DevOps, Product, and Design to synthesize requirements and prioritize impactful investments
  • Drive roadmapping, resourcing, and execution for critical platform areas that make it better and cheaper to develop, test, and release software
  • Establish and use developer efficiency metrics (e.g., build/test times, deploy lead time, change failure rate) to identify bottlenecks and plan ambitious improvements to workflows
  • Ensure operational excellence for platform services and tooling with clear SLOs, robust observability, and incident/bug management practices
  • Coach and develop engineers and leads; provide actionable feedback, elevate technical execution, and foster an inclusive, high-accountability culture
  • Partner with Engineering Operations to improve processes for alignment, goal setting, empowerment, and cross-team execution across Engineering
  • Communicate effectively with both technical and non-technical stakeholders, adjusting altitude from strategy to technical deep dives as needed.
What we offer
  • Medical
  • Dental
  • Wellness program
  • Paid parental leave
  • Professional development
  • Fully stocked kitchen
  • Equity program
  • 401(k) retirement plan with company matching
  • Health, dental, and vision
  • Flexible Spending Accounts for health and dependent care expenses
  • Fulltime

Platform Engineer, Agent Collaboration Platform

Platform engineering at Hebbia is about excellent, scalable enablement. You are ...
Location:
United States, New York City; San Francisco
Salary:
160000.00 - 300000.00 USD / Year
Hebbia
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Data Science, Statistics, or a related field
  • 5+ years software development experience at a venture-backed startup or top technology firm, with a focus on distributed systems and platform engineering
  • Proficiency in building backend and distributed systems using technologies such as Python, Java, or Go
  • Deep understanding of scalable system design, performance optimization, and resilience engineering
  • Extensive experience with cloud platforms (e.g., AWS)
  • Working experience with one or more of the following: Kafka, ElasticSearch, PostgreSQL, and/or Redis
  • Knowledge of workflow orchestration and execution platforms like Airflow, Temporal or Prefect
  • Proven experience enabling observability patterns
  • Ability to analyze complex problems, propose innovative solutions, and effectively communicate technical concepts
  • Proven experience in leading software development projects and collaborating with cross-functional teams
Job Responsibility
  • Own critical system components: Take complex requirements and turn them into robust, scaled solutions that solve real customer needs
  • Unlock O(1) universal indexing: Build and iterate on our high-scale document build system that enables constant time latency for indexing any content in the world, regardless of data volume
  • Drive performance optimization: Architect and implement performance-tuning solutions to ensure our systems operate efficiently at scale, minimizing latency and maximizing throughput across millions of documents
  • Mentor and guide: Provide technical leadership, mentorship, and guidance to junior engineers, fostering a culture of learning and growth
What we offer
  • PTO: Unlimited
  • Insurance: Medical + Dental + Vision + 401K
  • Eats: Catered lunch daily + DoorDash dinner credit if you ever need to stay late
  • Parental leave policy: 3 months non-birthing parent, 4 months for birthing parent
  • Fertility benefits: $15k lifetime benefit
  • New hire equity grant: competitive equity package with unmatched upside potential
  • Fulltime

Engineering Manager - Product & Platform

Arize AX is the AI & Agent Engineering Platform – one place to develop, evaluate...
Location:
United States
Salary:
180000.00 - 250000.00 USD / Year
Arize
Expiration Date
Until further notice
Requirements
  • You’ve been both an IC and a manager – you know what great engineering looks like at the system and code level, and you know how to build and lead teams
  • Strength across both product/fullstack engineering and backend/infrastructure – comfortable moving between customer-facing workflows and the systems that power them
  • Experience balancing people leadership, project execution, and technical depth
  • Clear, direct communicator who builds trust across teams
Job Responsibility
  • Contribute as an engineer on complex product and infrastructure challenges – building features that customers touch and scaling the systems behind them
  • Lead a team of engineers and tech leads – hiring, mentoring, and creating the conditions for them to do the best work of their careers
  • Drive projects end-to-end – ensuring scope is clear, trade-offs are well understood, and delivery is predictable
  • Work cross-functionally with Product and Design to set direction, and with Solutions and Support to make sure we’re solving the real problems our customers face in production
What we offer
  • Medical, dental, and vision
  • 401(k) plan
  • Unlimited paid time off
  • Generous parental leave plan
  • Other benefits for mental health and wellness support
  • Competitive equity package
  • WFH monthly stipend to pay for co-working spaces
  • Fulltime

Solutions Engineering Lead

We are hiring a Solutions Engineering Team Lead for the East region to scale and...
Location:
United States, Boston
Salary:
220000.00 - 300000.00 USD / Year
Coralogix
Expiration Date
Until further notice
Requirements
  • 7+ years in customer-facing technical roles (Sales Engineering, Solutions Architecture or similar)
  • 3+ years leading or managing pre-sales technical teams with a record of coaching success
  • Experience supporting or owning team-level quotas within a sales organization
  • Hands-on expertise with the following: Kibana, Grafana, Datadog, New Relic, Splunk, Honeycomb, Jaeger, OpenSearch
  • Proficiency crafting PromQL, Lucene and SQL queries for troubleshooting and dashboards
  • Deep knowledge of cloud services central to observability: AWS (EKS, Fargate, Lambda, CloudFormation, CloudWatch Logs and Metrics), Azure Monitor, and equivalents in Google Operations Suite
  • Working knowledge of OpenTelemetry, modern DevOps and container platforms (Kubernetes, Docker)
  • Strong ability to communicate with engineers and C-level audiences alike
Job Responsibility
  • Own regional SE performance in partnership with Account Executives, ensuring quota attainment and deal velocity
  • Hire, onboard and mentor Solutions Engineers, setting clear KPIs and career paths
  • Maintain a strong personal presence with customers, modeling technical excellence and closing strategic opportunities
  • Improve processes for discovery, POC execution, documentation and knowledge sharing
  • Collaborate with Product, Support and Customer Success to shorten feedback loops and accelerate adoption
  • Architect and deploy reference designs for logs, metrics, traces, SIEM and Kubernetes monitoring across AWS, Azure and GCP
  • Lead white-board deep-dive sessions on ingestion pipelines, index-free querying and cost-optimized retention strategies
  • Provide escalation support during POCs: troubleshoot complex issues, analyze logs, traces, craft PromQL, Lucene or Dataprime queries and isolate root causes
  • Track technical success metrics such as POC win rate, onboarding time-to-value and validation scorecards, converting data insights into process improvements
  • Contribute code or scripts (Python, Go or Java) for custom exporters, automation and synthetic monitoring
What we offer
  • Comprehensive and inclusive employee benefits for healthcare, dental, and mental health benefits
  • 401(k) plan and match
  • Paid sick time and paid time off
  • Fulltime

Platform Engineering Manager

The Client Environments team is the bridge between SpotOn’s cloud and the physic...
Location:
United States, Detroit
Salary:
170000.00 - 210000.00 USD / Year
MyTennisLessons
Expiration Date
Until further notice
Requirements
  • Lead and mentor engineers across Network and Android (Elo) systems
  • Drive GitOps adoption for network and device configuration
  • Oversee MDM and device lifecycle management (Elo tablets, Android handhelds)
  • Run the operational loop: stay close to client incidents, analyze recurring issues, and drive root-cause elimination
  • Collaborate with Core Services (Device Registry, MDM, Sidecar) and NOC to improve observability, alerting, and response workflows
  • Standardize configurations and rollout models (base + overlays)
  • Design for resilience: enable cellular failover, LTE monitoring, and automated recovery patterns
  • Own service quality metrics — uptime, response time, issue recurrence
Job Responsibility
  • Lead and mentor engineers across Network and Android (Elo) systems — building a strong culture of ownership and reliability
  • Drive GitOps adoption for network and device configuration, ensuring deployments are consistent, testable, and reversible
  • Oversee MDM and device lifecycle management (Elo tablets, Android handhelds), ensuring clean provisioning and policy enforcement
  • Run the operational loop: stay close to client incidents, analyze recurring issues, and drive root-cause elimination through system changes, automation, and better visibility
  • Collaborate with Core Services (Device Registry, MDM, Sidecar) and NOC to improve observability, alerting, and response workflows
  • Standardize configurations and rollout models (base + overlays) to eliminate variance across restaurant networks
  • Design for resilience: enable cellular failover, LTE monitoring, and automated recovery patterns through controllers
  • Own service quality metrics — uptime, response time, issue recurrence — and report progress on reliability improvements
What we offer
  • Medical, Dental and Vision Insurance
  • 401k with company match
  • RSUs
  • Paid vacation, 10 company holidays, sick time, and volunteer time off
  • Employee Resource Groups to build community and inclusion at work
  • Monthly cell phone and internet stipend
  • Tuition reimbursement for up to $2,000 per calendar year to assist with your professional development
  • Fulltime

Platform Engineering Manager

The Client Environments team is the bridge between SpotOn’s cloud and the physic...
Location:
United States, Austin
Salary:
170000.00 - 210000.00 USD / Year
MyTennisLessons
Expiration Date
Until further notice
Requirements
  • Lead and mentor engineers across Network and Android (Elo) systems
  • Drive GitOps adoption for network and device configuration
  • Oversee MDM and device lifecycle management (Elo tablets, Android handhelds)
  • Run the operational loop: stay close to client incidents, analyze recurring issues, and drive root-cause elimination
  • Collaborate with Core Services (Device Registry, MDM, Sidecar) and NOC to improve observability, alerting, and response workflows
  • Standardize configurations and rollout models (base + overlays)
  • Design for resilience: enable cellular failover, LTE monitoring, and automated recovery patterns
  • Own service quality metrics — uptime, response time, issue recurrence
Job Responsibility
  • Lead and mentor engineers across Network and Android (Elo) systems — building a strong culture of ownership and reliability
  • Drive GitOps adoption for network and device configuration, ensuring deployments are consistent, testable, and reversible
  • Oversee MDM and device lifecycle management (Elo tablets, Android handhelds), ensuring clean provisioning and policy enforcement
  • Run the operational loop: stay close to client incidents, analyze recurring issues, and drive root-cause elimination through system changes, automation, and better visibility
  • Collaborate with Core Services (Device Registry, MDM, Sidecar) and NOC to improve observability, alerting, and response workflows
  • Standardize configurations and rollout models (base + overlays) to eliminate variance across restaurant networks
  • Design for resilience: enable cellular failover, LTE monitoring, and automated recovery patterns through controllers
  • Own service quality metrics — uptime, response time, issue recurrence — and report progress on reliability improvements
What we offer
  • Medical, Dental and Vision Insurance
  • 401k with company match
  • RSUs
  • Paid vacation, 10 company holidays, sick time, and volunteer time off
  • Employee Resource Groups to build community and inclusion at work
  • Monthly cell phone and internet stipend
  • Tuition reimbursement for up to $2,000 per calendar year to assist with your professional development
  • Fulltime