Director of Engineering - SRE & Operations

CVS Health

Location:
Wellesley, United States

Contract Type:
Employment contract

Salary:

144,200 - 288,400 USD / year

Job Description:

As the Director of Platform Engineering - SRE & Operations, you will guide the strategy, implementation, and ongoing maturity of reliability, availability, and operational excellence across key platforms within the DDAT organization. You will oversee the reliability of web, mobile, API, platform, and AI‑enabled systems, ensuring they are resilient, scalable, secure, and cost‑efficient. You will partner closely with other engineering teams across CVS Health to embed SRE best practices and strengthen the resiliency, observability, and performance of our digital ecosystem.

Job Responsibilities:

  • Contribute to and execute the SRE strategy, including definition and management of SLOs, SLIs, and error budgets
  • Establish and operationalize reliability standards across web, mobile, backend services, and data workloads
  • Champion a culture of reliability-by-design and continuous improvement within engineering teams
  • Drive adoption of AIOps capabilities for intelligent alerting, proactive issue detection, and predictive failure mitigation
  • Implement AI-assisted automation: incident triage, runbooks, root-cause analysis, and self-healing workflows
  • Collaborate with the AI Platform team to integrate LLMs and machine learning models into operational processes
  • Lead the observability roadmap spanning metrics, logs, traces, and experience monitoring
  • Define and standardize tooling and operational practices using Datadog, Splunk, Prometheus, Grafana, and OpenTelemetry
  • Deliver actionable dashboards and reporting for availability, performance, latency, and error budget consumption
  • Partner with the DevEx and Cloud Engineering teams to strengthen CI/CD reliability, safety, and automation
  • Promote progressive delivery (canary, blue/green, feature flags) to reduce deployment risk
  • Ensure quality gates, automated rollback, and deployment safeguards are consistently applied
  • Lead major incident response and escalation processes for critical digital platforms
  • Improve MTTD and MTTR, and reduce incident recurrence through preventive engineering and automation
  • Maintain operational readiness through runbooks, on‑call processes, and post‑incident learning
  • Ensure cloud reliability and scalability across on-premises, Azure, and GCP environments
  • Collaborate with Finance and Platform teams to support FinOps practices, cost optimization, and capacity planning
  • Optimize performance and availability across high‑traffic, customer‑facing platforms
  • Lead and develop high-performing SRE teams, including managers, engineers, and technical specialists
  • Support career pathways, skill frameworks, and upskilling initiatives aligned to SRE disciplines
  • Foster a culture centered on ownership, accountability, curiosity, and continuous learning

Requirements:

  • 10+ years of experience in software engineering, platform operations, or site reliability engineering
  • 5+ years in leadership roles managing SRE, DevOps, or platform reliability teams at scale

Nice to have:

  • Experience using AI/ML capabilities in operations (anomaly detection, predictive alerting, log analysis, automated remediation)
  • Hands‑on knowledge of AIOps platforms (e.g., Datadog Watchdog, Dynatrace Davis, Splunk AI, or custom ML/LLM tooling)
  • Deep expertise in cloud infrastructure, distributed systems, and high‑availability architectures
  • Strong understanding of SRE principles, DevOps practices, and modern reliability engineering
  • Experience running mission‑critical digital systems with large-scale user traffic
  • Effective communication and stakeholder influence skills, including with senior technology leaders
  • Experience working in regulated industries (e.g., healthcare, financial services, insurance)
  • Demonstrated success collaborating with platform engineering, AI teams, architecture, and cross-functional technical organizations
  • Master's degree preferred

What we offer:

  • CVS Health bonus, commission, or short-term incentive program
  • Equity award program
  • Medical, dental, and vision coverage
  • Paid time off
  • Retirement savings options
  • Wellness programs

Additional Information:

Job Posted:
May 15, 2026

Expiration:
May 22, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Director of Engineering - SRE & Operations

Director of Engineering Operations

At WHOOP, we’re on a mission to unlock human performance and healthspan. WHOOP e...
Location:
Boston, United States
Salary:
200,000 - 245,000 USD / year
Whoop
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in software operations, technical program management, or engineering leadership within high-growth technology organizations
  • Proven success scaling engineering teams
  • Deep understanding of software development lifecycles (SDLC), Agile/DevOps practices, and incident management frameworks
  • Demonstrated experience defining and operationalizing engineering metrics (e.g., DORA, cycle time, developer experience)
  • Strong analytical and data-driven mindset, with the ability to build dashboards, measure outcomes, and inform decisions through data
  • Exceptional communication and collaboration skills, with experience influencing across all levels of an organization
  • Hands-on operator who thrives in ambiguity and can build systems from the ground up before scaling them
  • Experience managing budgets, vendor relationships, and cross-functional programs
  • Strong commitment to embracing and leveraging AI tools in day-to-day tasks, ensuring AI-assisted work aligns with the same high-quality standards as personal contributions
Job Responsibilities:
  • Engineering Metrics & Continuous Improvement - Define, implement, and manage engineering performance metrics (e.g., DORA, delivery throughput, cycle time). Translate insights into action through reviews, tracking, and reporting to leadership. Partner with Platform and QA teams to ensure data pipelines and dashboards are accurate and actionable
  • Hiring, Onboarding & Growth Enablement - Collaborate with Recruiting and leadership to improve hiring velocity and maintain quality. Build and continuously refine the engineering onboarding experience to reduce ramp-up time and increase new hire satisfaction. Design systems that help new engineers quickly understand WHOOP’s tools, systems, and culture
  • Incident Management & Reliability - Oversee the incident lifecycle — ensuring consistent classification, escalation, postmortems, and follow-through. Build accountability for reducing repeat incidents and improving reliability metrics. Collaborate with Platform and SRE teams on root-cause visibility and operational readiness
  • Operational Excellence & Cadence - Own and facilitate the operational rhythm of the software organization, including leadership syncs, Engineering Manager forums, quarterly reviews, and Software All-Hands. Establish feedback systems that promote alignment, accountability, and clear communication across teams. Identify organizational bottlenecks and develop scalable, data-driven solutions to improve flow and focus
  • Governance, Compliance & Communication - Serve as a bridge between Software and GRC, Legal, and Finance to ensure operational alignment on compliance, documentation, and governance. Own the Builder (Engineering) Blog and internal communications, maintaining transparency through dashboards, documentation, and reports
  • Financial & Vendor Management - Manage budgets for developer tooling, DX platforms, and operational vendors. Partner with Finance and Legal on procurement, ROI analysis, and vendor strategy. Ensure investments deliver measurable improvements in productivity and quality
What we offer:
  • Equity
  • Benefits
Employment Type:
Full-time

Director, Service Reliability Engineering

As Director of SRE, you will lead the team responsible for accelerating and auto...
Location:
Bethesda, United States
Salary:
125,600 - 203,700 USD / year
Marriott Bonvoy
Expiration Date:
Until further notice
Requirements:
  • Undergraduate degree in computer science, software engineering, or a related field (or equivalent experience)
  • 10+ years of experience in SRE, DevSecOps, or IT operations
  • At least 5 years of experience in a previous leadership role within SRE, DevSecOps, or IT operations
  • At least five years of experience with the following technologies: presentation management (HTML, CSS, JS, Backbone, Node.js, Android, iOS); application platforms (NGINX, Java, Akana, Play Framework, Tomcat, Docker, OpenShift); application data (PostgreSQL, Couchbase, Cassandra); integration services (Apache Kafka, Apache Spark, Akana); analytics platforms (Hadoop, dashDB, Cognos, Tableau); security (ForgeRock, OpenID, OAuth, Ping Identity); public cloud (Azure, Google Cloud, AliCloud, Amazon Web Services); CI/CD (Harness)
  • Experience with test automation
  • Working knowledge and proven track record of implementing disaster indifferent architecture
  • Experience with CDN and Akamai tools
  • Linux/Unix system administration experience
  • Proficient in scripting and programming languages (e.g., Python, Go, Bash/shell)
  • Hands-on experience with infrastructure as code (e.g., Terraform), container orchestration (e.g., Kubernetes), and reliability automation
Job Responsibilities:
  • Define and execute Marriott’s SRE vision, aligning with business objectives and technology roadmaps
  • Build, mentor and lead a high-performing SRE team, fostering a culture of collaboration and innovation
  • Establish reliability, observability and automation goals to improve system uptime, performance and scalability
  • Partner with engineering, operations and security teams to drive best practices and continuous improvement
  • Implement reliability-focused engineering practices, including SLAs, SLOs/SLIs and error budgets
  • Design and maintain resilient, scalable and fault-tolerant architectures across cloud and hybrid environments
  • Develop strategies to proactively identify and mitigate risks to system performance and availability
  • Drive root cause analysis (RCA) and post-mortem processes to prevent recurring incidents
  • Champion automation in monitoring, deployment and incident resolution to reduce toil and enhance efficiency
  • Lead and optimize incident response processes, ensuring rapid detection, diagnosis, and resolution of system failures
What we offer:
  • Bonus program
  • Comprehensive health care benefits
  • 401(k) plan with up to 5% company match
  • Employee stock purchase plan at a 15% discount
  • Accrued paid time off (including sick leave where applicable)
  • Life insurance
  • Group disability insurance
  • Travel discounts
  • Adoption assistance
  • Paid parental leave
Employment Type:
Full-time

Director of Engineering

Kiddom is seeking a hands-on Director of Engineering to lead key parts of our en...
Location:
San Francisco, United States
Salary:
Not provided
Kiddom
Expiration Date:
Until further notice
Requirements:
  • 10+ years in software engineering
  • 5+ years leading and scaling teams in a SaaS environment
  • Experience managing multiple teams working on a range of products
  • Strong background in building and operating distributed systems and cloud-native platforms
  • Experience delivering products that leverage AI/ML (for example, recommendations, automation, analytics) or working closely with data science/ML teams
  • Familiarity with modern data pipelines, security best practices, and observability
  • Comfortable diving into technical discussions and guiding teams on trade-offs, even if you are not writing code daily
  • Able to get ‘in the weeds’ with engineers when driving technical decisions
Job Responsibilities:
  • Own delivery for a portfolio of product and platform areas, ensuring your teams ship high-quality software on time
  • Help define and drive adoption of engineering best practices
  • Contribute to technical decision-making on architecture, frameworks, and key platform investments
  • Lead, coach, and grow engineering managers and senior engineers across backend, frontend, DevOps, data, and AI-adjacent teams
  • Implement and refine processes for planning, estimation, documentation, and cross-team collaboration
  • Partner with engineering leadership on career ladders, performance management, and succession planning within engineering
  • Work with Product, Data, and Curriculum teams to deliver AI-powered features that improve personalization, grading, insights, and curriculum recommendations
  • Help teams integrate AI capabilities into existing workflows in ways that are intuitive, reliable, and impactful for educators and students
  • Partner with platform and SRE teams to improve CI/CD pipelines, observability, and cloud cost efficiency
  • Work with data engineering to design and maintain scalable, well-modeled data pipelines that power AI and analytics features
What we offer:
  • Competitive salary
  • Meaningful equity
  • Health insurance benefits: medical (various PPO/HMO/HSA plans), dental, vision, disability, and life insurance
  • One Medical membership (in participating locations)
  • Flexible vacation time policy (subject to internal approval); employees take an average of 4 weeks off per year
  • 10 paid sick days per year (prorated depending on start date)
  • Paid holidays
  • Paid bereavement leave
  • Paid family leave after birth or adoption: a minimum of 16 paid weeks for birthing parents and 10 weeks for caretaker parents, meant to supplement state-provided benefits
  • Commuter and FSA plans
Employment Type:
Full-time

Senior Director of Platform Engineering

Lead the Future of Platform Engineering at Modus Create. As Senior Director of P...
Location:
United States of America
Salary:
Not provided
Modus Create
Expiration Date:
Until further notice
Requirements:
  • 15+ years in Platform Engineering/DevOps
  • 7+ years in senior engineering leadership, ideally in consulting or high-growth tech environments
  • A clear point of view on modern architecture, engineering best practices, and agile delivery
  • Proven experience scaling distributed global teams and platform engineering operations
  • Strong pre-sales and delivery experience, with the ability to shape winning proposals and roadmaps
  • A customer-first mindset and a passion for solving complex problems with elegant, scalable solutions
  • Excellent communication and collaboration skills in cross-functional and cross-cultural environments
  • A history of growing leaders and fostering high-trust, high-performance teams
Job Responsibilities:
  • Lead and scale a high-performing, distributed platform engineering team through strong mentorship and inclusive leadership
  • Define what great looks like through reusable runbooks and technical standards, and nurture a culture grounded in quality, belonging, and continuous learning
  • Help clients modernize platforms, launch new infrastructure, and make better innovation investment decisions
  • Ensure every solution is aligned with client goals and drives measurable value
  • Own and evolve our delivery frameworks, platform engineering standards, and team operations
  • Champion cloud-native development, DevOps and SRE best practices, and scalable architecture
  • Partner with Sales, Partnerships, and Client Executives to shape and win new opportunities
  • Translate client needs into technical solutions, delivery plans, and estimates
  • Lead proposal development, estimation, and pre-sales architecture discussions
  • Develop reusable solution assets, infrastructure templates, and case studies for future engagements
What we offer:
  • Remote work with flexible working hours
  • Modus Global Office Programme: on-demand access to private offices, meeting rooms, coworking spaces and business lounges in locations in over 120 countries
  • Employee Referral Program
  • Client Referral Program
  • Travel according to client or team needs
  • The chance to work side-by-side with thought leaders in emerging tech
  • Access to more than 12,000 courses with a licensed Coursera account
  • Possibility to obtain paid certification/courses if they align with company goals and are relevant to the employee's role
Employment Type:
Full-time

Director SRE & Operations

Director SRE & Operations for E-business / Digital at PUMA in Herzogenaurach, Ge...
Location:
Herzogenaurach, Germany
Salary:
Not provided
Puma Group
Expiration Date:
Until further notice
Requirements:
  • 10–15 years of experience in technology operations, site reliability engineering, or platform engineering within large-scale digital or eCommerce environments
  • Proven track record owning platform reliability, availability, and operational performance for consumer-facing systems
  • Strong experience with cloud infrastructure, incident management, observability, and operational readiness in high-traffic, peak-driven environments
  • Demonstrated ability to embed SRE practices (SLOs, SLIs, incident response, automation) across engineering teams
  • Experienced leader of global operations or SRE teams, comfortable working in on-call and 24/7 operational models
  • Calm, decisive leader with a strong focus on stability, resilience, and continuous operational improvement
Job Responsibilities:
  • Leadership: Responsible for all aspects of the performance management and professional development of the team, including recruitment, development plans, providing constructive feedback, appraisals and exit processes
  • Foster a positive and inclusive team culture by actively engaging team members, promoting open communication, and implementing initiatives that enhance employee satisfaction and well-being
  • Ensure compliance with, and implementation of, legal and operational occupational health and safety requirements within your own area of responsibility
  • Global Site Reliability & Operations Strategy: Define and execute a global Site Reliability Engineering (SRE) and Technology Operations strategy aligned with PUMA’s D2C growth, peak trading demands, and omnichannel ambitions
  • Establish reliability, availability, performance, and scalability targets across all D2C platforms (eCommerce, in-store integrations, APIs, data platforms)
  • Own the end-to-end operational health of consumer-facing and business-critical platforms
  • Platform Reliability, Resilience & Performance: Drive a reliability-first mindset across engineering, embedding SRE principles such as SLIs, SLOs, SLAs, error budgets, and resilience-by-design
  • Ensure platforms are engineered to handle peak events (campaigns, drops, seasonal peaks) with minimal risk and rapid recovery
  • Lead incident management, major incident response, root cause analysis, and post-incident reviews with a strong focus on learning and prevention
  • Continuously improve platform observability, monitoring, alerting, and performance management
Employment Type:
Full-time

Senior Director of Engineering, SRE

We are looking for a Senior Director of Site Reliability Engineering (SRE) to de...
Location:
United States
Salary:
186,000 - 255,000 USD / year
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • Several years of senior leadership experience in a Site Reliability Engineering capacity
  • Deep knowledge of SRE principles and practices (SLIs/SLOs, error budgets, reliability economics)
  • Experience building self-service systems through platform engineering
  • Strong background in distributed systems and microservices
  • Production experience operating Kubernetes-based platforms
  • Solid understanding of cloud-native networking fundamentals
  • Experience running systems in multi-cloud environments (AWS and at least one of GCP or Azure)
  • Proven success scaling SRE practices across large engineering organizations
  • Demonstrated experience building, mentoring, and developing high-performing SRE teams
  • Ability to grow and sustain an inclusive, resilient engineering culture
Job Responsibilities:
  • Lead reliability and operational excellence across AlphaSense’s platforms and products
  • Scale SRE practices in a “you build it, you run it” engineering organization
  • Lead and grow a follow-the-sun SRE team across multiple time zones
  • Build, mentor, and develop high-performing SRE engineers
  • Own incident management, on-call operations, and post-incident learning
  • Cultivate an awareness and culture of reliability throughout the engineering organization
  • Set direction for observability and operational tooling
  • Enable teams to operate production systems safely and confidently
  • Embed reliability into the whole software delivery lifecycle in collaboration with Product, Platform, Cloud, and Security
  • Reduce systemic risk through toil reduction and continuous improvement
What we offer:
  • Equity
  • A generous benefits program
Employment Type:
Full-time

Platform Engineering Director

We are seeking an experienced Platform Engineering Director to manage and lead o...
Location:
Paris, France
Salary:
Not provided
Ledger
Expiration Date:
Until further notice
Requirements:
  • 10+ years’ experience in software and platform/infrastructure engineering, including senior technical leadership
  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience
  • Proven experience leading platform or infrastructure engineering teams, including managers and senior ICs, in distributed and matrixed environments
  • Deep expertise in AWS, Kubernetes, containerisation, infrastructure as code, and CI/CD/deployment automation
  • Experience designing, building, and operating developer platforms or internal PaaS
  • Knowledge of platform engineering patterns (e.g. golden paths, paved roads), service mesh, and API management
  • Strong background in production system architecture, capacity planning, performance optimisation, and incident leadership
  • Excellent communication skills with senior and executive stakeholders
Job Responsibilities:
  • Define and execute the platform engineering strategy aligned with business objectives and Infrastructure & Operations goals
  • Lead and manage the Platform Engineering team to high performance
  • Establish and govern best practices, standards, and architecture for platform services and production cloud infrastructure
  • Partner with engineering leadership and Infrastructure & Operations teams to define platform and production system requirements
  • Act as the final escalation point for complex platform engineering issues beyond Level 2 operations support
  • Oversee the design, development, and maintenance of developer platforms enabling application delivery, CI/CD, and deployment automation
  • Build and maintain platform infrastructure, cloud environments, and tooling in line with architectural standards and requirements
  • Design and operate scalable, highly available, and reliable production systems and services
  • Lead initiatives for infrastructure as code (IaC), observability, and monitoring across development and production environments
  • Provide expert-level technical leadership during complex platform, infrastructure, and production incidents
What we offer:
  • Flexible work options - Our hybrid policy allows employees to work from home up to 3 times per week
  • Health & Wellness support - Health and Life Insurance
  • Financial growth opportunities - Employees can become shareholders in Ledger as well as other financial benefits depending on your country of work
  • Commuter allowance - Ledger offers a commuter allowance to contribute to your preferred means of transportation
  • Learning & Development - A comprehensive suite of training solutions providing a personalised learning experience for every employee
Employment Type:
Full-time

Director of Platform Engineering & Operations

NetApp is seeking a strategic and execution-oriented Director of Platform Engine...
Location:
RTP, United States
Salary:
199,750 - 298,100 USD / year
NetApp
Expiration Date:
Until further notice
Requirements:
  • 12+ years of progressive experience in infrastructure engineering and operations
  • 7+ years of leadership experience managing global, distributed teams at scale
  • Deep expertise in:
      ◦ Hybrid compute platforms (virtualization, containerization, public cloud IaaS/PaaS)
      ◦ Enterprise storage technologies (block, file, object, hybrid architectures)
      ◦ Global DDI services (enterprise DNS, DHCP, IPAM architectures)
  • Demonstrated experience implementing Infrastructure as Code and CI/CD-driven infrastructure delivery
  • Proven track record driving automation at scale across enterprise infrastructure
  • Strong experience with AI-Ops platforms, observability stacks, and operational analytics
  • Experience leading both engineering (build) and operations (run) functions within a unified organization
Job Responsibilities:
  • Define and execute the strategy for enterprise compute, storage, and DDI platforms across hybrid (on-prem and cloud) environments
  • Drive modernization of infrastructure services using IaC, GitOps, CI/CD automation, and policy-as-code frameworks
  • Lead the evolution toward self-service platform models with clear service catalogs, SLOs, and reliability metrics
  • Partner with executive stakeholders across IT, Security, Engineering, and Product to align platform capabilities with business priorities
  • Establish multi-year roadmaps for infrastructure transformation, cost optimization, resilience, and scalability
  • Oversee architecture, engineering, and lifecycle management of:
      ◦ On-prem and cloud-based compute platforms
      ◦ On-prem and cloud-based storage platforms
      ◦ Global DDI services (DNS, DHCP, IPAM)
      ◦ Certificate lifecycle management
  • Standardize infrastructure patterns across data centers and public cloud providers
What we offer:
  • Health insurance
  • Life insurance
  • Retirement or pension plans
  • Paid time off
  • Various leave options
  • Employee stock purchase plan and/or restricted stock units (RSUs)
Employment Type:
Full-time