Engineering Manager, SRE

United States, San Francisco · 220000.00 - 260000.00 USD / Year · Job Posted January 30, 2026

Job Description

Abridge’s services and engineering teams are in hyperscale mode, multiplying rapidly with our customer base and new product launches. We are looking for a seasoned leader who can harness this growth across the organization through reliability and performance engineering, engineering velocity, software replatforming and rearchitecture, and application security. You’ll lead and build an extremely fast-growing organization, iteratively scope and execute a company-wide application reliability roadmap, and lead the development and improvement of SLOs across the entire company, spanning multiple regions and clouds. The combination of security, scale, uptime, and timeline requirements at Abridge has never been executed before in tech. This is a rapidly expanding role that sits at the intersection of AI, reliability engineering, security, and healthcare.

Job Responsibility

  • Visionary leadership: Scope, resource, evangelize, and execute a company-wide reliability and engineering velocity roadmap across environments and clouds, real-time streaming infrastructure under immense scale, compute as well as AI-at-edge infrastructure, and the most ambitious cloud security roadmap in the entire tech industry. Collaborate with department heads across product engineering, security, product management, commercial, and more to develop, align, and execute an extremely ambitious strategic roadmap
  • Gifted tactician: Work at the level of small tiger teams to unblock, enable, and drive execution and solutioning. Juggle several ambiguous and tricky problems at a time
  • Recruiter extraordinaire: Scale out your team to meet this roadmap - both ICs and managers. Attract top talent and hire quickly while maintaining a consistently high bar. Iterate on the hiring process, improve diversity and equity, retain and maximize the effectiveness of an extremely senior team
  • Mentor to the mentors: Develop their careers, create top-of-ladder development opportunities, and continuously raise the bar for your staff as well as your peers and leaders in their abilities and awareness. Earn their trust, lead by example, be a doctor rather than a judge for organizational and people challenges, and help establish and maintain a hivemind, de-siloed culture across all engineering pods

Requirements

  • 3 - 6+ years as a manager in rapidly growing organizations including at least 1 year as a manager of managers
  • Seeking an extremely challenging role that will push you beyond your limits, where failures are inevitable and not to be feared
  • Seeking a senior leadership role to develop people, environments, and impact - not ego, accolades, or ladder climbing
  • Able to ask for help, fail fast, and admit defeat; get yourself and others out of their comfort zone
  • Track record of leading performance engineering including load test and chaos engineering, large scale distributed telemetry implementation, major architectural and software refactors, engineering velocity, and full stack development
  • Experience running production workloads in more than one cloud provider (at a time, or across your experience)
  • Experience managing workloads across containerized solutions, Kubernetes, and CNCF-approved tooling such as Argo, Istio, OTel, and more
  • Thought leader in platform building, with a strong desire to represent Abridge as a reliability engineering leader in the tech industry
  • Genuine passion for Abridge’s mission to improve healthcare in America and across the world

What we offer

  • Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees
  • Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families
  • Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA
  • Paid Parental Leave: Generous paid parental leave for all full-time employees
  • Family Forming Benefits: Resources and financial support to help you build your family
  • 401(k) Matching: Contribution matching to help invest in your future
  • Personal Device Allowance: Tax free funds for personal device usage
  • Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits
  • Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more
  • Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals
  • Sabbatical Leave: Paid Sabbatical Leave after 5 years of employment
  • Compensation and Equity: Competitive compensation and equity grants for full time employees

Similar Jobs for

Engineering Manager, SRE

8 matching positions

Senior Engineering Manager, SRE

Abridge’s services and engineering teams are in hyperscale mode, and multiplying...
Location: United States, San Francisco; New York; Pittsburgh
Salary: 250000.00 - 290000.00 USD / Year
Company: Abridge
Expiration Date: Until further notice
Requirements
  • 6+ years as a manager in rapidly growing organizations including at least 1 year as a manager of managers
  • Seeking an extremely challenging role that will push you beyond your limits, where failures are inevitable and not to be feared
  • Seeking a senior leadership role to develop people, environments, and impact - not ego, accolades, or ladder climbing
  • Able to ask for help, fail fast, and admit defeat; get yourself and others out of their comfort zone
  • Track record of leading performance engineering including load test and chaos engineering, large scale distributed telemetry implementation, major architectural and software refactors, engineering velocity, and full stack development
  • Experience running production workloads in more than one cloud provider (at a time, or across your experience)
  • Experience managing workloads across containerized solutions, Kubernetes, and CNCF-approved tooling such as Argo, Istio, OTel, and more
  • Thought leader in platform building, with a strong desire to represent Abridge as a reliability engineering leader in the tech industry
  • Genuine passion for Abridge’s mission to improve healthcare in America and across the world
Job Responsibility
  • Visionary leadership: Scope, resource, evangelize, and execute a company-wide reliability and engineering velocity roadmap across environments and clouds, real-time streaming infrastructure under immense scale, compute as well as AI-at-edge infrastructure, and the most ambitious cloud security roadmap in the entire tech industry
  • Collaborate with department heads across product engineering, security, product management, commercial, and more to develop, align, and execute an extremely ambitious strategic roadmap
  • Gifted tactician: Work at the level of small tiger teams to unblock, enable, and drive execution and solutioning
  • Juggle several ambiguous and tricky problems at a time
  • Recruiter extraordinaire: Scale out your team to meet this roadmap - both ICs and managers
  • Attract top talent and hire quickly while maintaining a consistently high bar
  • Iterate on the hiring process along with other leaders, improve diversity and equity, retain and maximize the effectiveness of an extremely senior team, and make strategic bets on the people that will take us to the next level
  • Mentor to the mentors: Develop their careers, create top-of-ladder development opportunities, and continuously raise the bar for your staff as well as your peers and leaders in their abilities and awareness
  • Earn their trust, lead by example, be a doctor rather than a judge for organizational and people challenges, and help establish and maintain a hivemind, de-siloed culture across all engineering pods
What we offer
  • Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees
  • Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families
  • Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA
  • Paid Parental Leave: Generous paid parental leave for all full-time employees
  • Family Forming Benefits: Resources and financial support to help you build your family
  • 401(k) Matching: Contribution matching to help invest in your future
  • Personal Device Allowance: Tax free funds for personal device usage
  • Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits
  • Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more
  • Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals
Employment Type: Full-time

Engineering Manager, Storage SRE

Airbnb was born in 2007 when two hosts welcomed three guests to their San Franci...
Location: United States
Salary: 212000.00 - 265000.00 USD / Year
Company: Airbnb
Expiration Date: Until further notice
Requirements
  • 9+ years of relevant industry experience in database infrastructure, storage systems, or site reliability engineering
  • 3+ years of engineering management experience leading SRE, infrastructure, or platform teams
  • Demonstrated track record of building high-performing teams by hiring strong engineers, developing talent, and maintaining team health through periods of change
  • Strong technical foundation with the ability to partner with technical leads on architectural decisions, roadmap tradeoffs, and delivery quality
  • Proven ability to lead a team through a technology transition while maintaining operational rigor on existing systems
  • Solid understanding of distributed systems, cloud infrastructure, and production database operations
  • Strong communicator able to cut through ambiguity and represent the team credibly to senior leadership
Job Responsibility
  • Own the Storage SRE technical roadmap across a 12+ month horizon, setting the direction for how the team deepens its operational model as it takes on new database technologies alongside its existing systems
  • Lead and grow a team of engineers by providing mentorship, timely feedback, and career development support to build a high-performing, inclusive team
  • Drive the generalization of cluster lifecycle, schema management, and observability tooling as the team broadens its database technology support
  • Partner with engineering teams across Airbnb as the primary expert on reliable database adoption, helping them work with mission-critical storage systems safely and efficiently at scale
  • Establish and uphold operational excellence standards covering on-call strategy, incident response, backup and disaster recovery, and systemic reliability improvements
  • Collaborate with storage infrastructure and platform teams to ensure Storage SRE's tooling and observability stay current as the broader storage platform evolves
  • Improve the developer experience for engineers working with high-traffic transactional storage systems
  • Drive performance, security, scalability, and availability initiatives across Airbnb's database systems
  • Communicate technical strategy and trade-offs clearly to engineers and senior leadership
What we offer
  • bonus
  • equity
  • benefits
  • Employee Travel Credits
Employment Type: Full-time

Manager of Site Reliability Engineering (SRE)

The Manager of Site Reliability Engineering leads and develops a team of SRE pra...
Location: United States, Birmingham
Salary: Not provided
Company: Genuine Parts Company
Expiration Date: Until further notice
Requirements
  • Typically requires a bachelor's degree and 7 years of experience in a technology and/or software engineering role or an equivalent combination
  • Proven experience working in large, complex enterprise environments (Fortune 500 or equivalent)
  • Strong understanding and demonstrated implementation of Site Reliability Engineering (SRE) principles at scale
  • Hands-on experience with infrastructure-as-code (IaC) tools such as Terraform and ArgoCD
  • In-depth knowledge and practical experience with CI/CD pipelines and automation of software delivery
  • Championing DevOps practices and embedding reliability early in the SDLC
  • Significant hands-on experience in Site Reliability Engineering or related roles focused on cloud infrastructure reliability
  • Strong software engineering background with proficiency in infrastructure-as-code tools (e.g., Terraform, ArgoCD) and CI/CD automation
  • Deep knowledge of cloud platforms, specifically Google Cloud Platform (GCP), Kubernetes, container orchestration, and cloud-native architecture
  • Familiarity with monitoring and observability tools such as Dynatrace, Datadog, or equivalents
Job Responsibility
  • Lead, mentor, and grow a high-performing team of Site Reliability Engineers, fostering a culture of ownership, continuous improvement, and operational excellence
  • Implement and champion Site Reliability Engineering principles and DevOps best practices within the team to ensure service reliability, availability, and performance
  • Define and track key SRE metrics such as service uptime, incident response and resolution times
  • Drive automation efforts including CI/CD pipeline enhancements, infrastructure-as-code practices, and self-service infrastructure provisioning to increase deployment velocity while reducing manual toil
  • Own and continuously improve observability practices including system monitoring, logging, alerting, and diagnostics to ensure rapid issue detection and resolution
  • Participate in incident response processes including incident management, root cause analysis, post-mortems, and continuous improvement to enhance system resilience
  • Partner closely with software engineering, product management, architecture, and security teams to embed reliability and security early in the software development lifecycle (SDLC)
  • Oversee the management and scalability of cloud infrastructure environments, primarily on Google Cloud Platform (GCP), with a focus on Kubernetes, container orchestration, and hybrid cloud integrations
  • Advocate for and apply best practices in performance tuning, capacity planning, and system design for high availability
  • Develop and execute a long-term roadmap for our hybrid cloud platform, aligning with evolving business objectives and technology trends
What we offer
  • comprehensive benefit plans and programs designed to support your health and wellness, provide income protection and build financial security for your retirement
Employment Type: Full-time

Engineering Manager, Production Engineering

We're looking for a hands-on Engineering Manager to lead our Production Engineer...
Location: United States, San Francisco
Salary: 209000.00 - 253000.00 USD / Year
Company: Crusoe
Expiration Date: Until further notice
Requirements
  • 5+ years of software or infrastructure engineering experience, with at least 1–2 years in an engineering management or tech lead role
  • Strong SRE or production engineering background — hands-on experience with incident management, SLO frameworks, runbooks, and on-call operations
  • Solid coding ability; comfortable writing production-grade code in Go, Python, or similar languages to build tooling and automation
  • Experience working with or embedding into cross-functional product teams, and influencing engineering decisions across organizational boundaries
  • Familiarity with container orchestration and cloud-native infrastructure — Kubernetes, distributed systems, and cloud service architectures
  • Strong communication skills — able to clearly represent technical risk and operational status to both engineering peers and business stakeholders
Job Responsibility
  • Leading and growing a team of SREs embedded within Crusoe's AI product areas, setting technical direction and fostering a culture of ownership and continuous improvement
  • Contributing as an IC — reviewing code, building tooling, and driving automation to reduce toil and improve the reliability and scalability of production services
  • Owning SLA/SLO performance, incident response, and on-call health for service offerings; leading blameless post-mortems and driving systemic remediation
  • Partnering with embedded product and platform engineering teams to influence infrastructure design, observability strategy, and operational readiness for new and existing services
  • Defining and tracking reliability, performance, and operational maturity metrics across the team; translating data into prioritized roadmap investments
  • Serving as a technical escalation point for high-severity production incidents affecting enterprise customers, and collaborating with Cloud Support and Customer Success on resolution and communication
What we offer
  • Industry competitive pay
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
Employment Type: Full-time

Engineering Manager, Infrastructure Engineering

This is not a traditional SRE or DevOps role. Whatnot's Reliability Engineering ...
Location: Poland, Kraków
Salary: Not provided
Company: Whatnot
Expiration Date: Until further notice
Requirements
  • 10+ years of experience in infrastructure or platform engineering
  • 5+ years managing engineering teams
  • Experience leading managers or multiple teams a plus
  • Proven track record building and operating large-scale distributed systems with strong reliability, observability, and incident response practices
  • Deep technical grounding in one or more of: SLO design, monitoring/alerting, incident tooling, traffic control mechanisms, load and chaos testing, or platform engineering
  • Experience leading teams that ship developer-facing platforms, frameworks, or internal tools
  • Strong software engineering fundamentals
  • Demonstrated ability to guide teams through complex system challenges, large-scale migrations, and longer-term reliability initiatives
  • Exceptional communication and leadership skills
  • A passion for enabling teams to build fast while building safely through well-designed tooling and proactive detection mechanisms
Job Responsibility
  • Lead and mentor a team of highly skilled software engineers, supporting their technical growth, execution, and long-term career development
  • Set technical direction and quality standards for the team while empowering senior ICs to own design and architecture decisions
  • Develop and execute the strategic roadmap for reliability engineering at Whatnot
  • Build and operationalize best practices that empower product and platform teams to design and run reliable systems
  • Own the strategic roadmap for reliability tooling, including incident response systems, SLO measurement platforms, and developer-facing reliability libraries
  • Lead the team in designing and building traffic control systems as reusable platform components
  • Lead the design and execution of load testing at scale
  • Drive continuous improvement in incident detection and mitigation
  • Collaborate with cross-functional teams to influence product and architectural decisions that improve overall reliability and customer impact
  • Partner with Infrastructure and Engineering leadership to shape reliability strategy and investment priorities across the organization
Employment Type: Full-time

Manager / Sr Manager, Engineering (AI Posture)

Location: United States, Santa Clara
Salary: 185000.00 - 298000.00 USD / Year
Company: Palo Alto Networks
Expiration Date: Until further notice
Requirements
  • 5-7 years of experience managing software engineering teams within a large-scale organization
  • 5+ years of hands-on software engineering experience with a strong systems-level foundation
  • Proven ability to plan, execute, and deliver complex roadmaps with high predictability, owning distributed cloud products end-to-end
  • Demonstrated experience leading teams through the delivery of complex, data-rich web applications with a focus on performance and usability
  • Demonstrated experience designing and operating large-scale cloud architectures on platforms such as GCP, AWS, or Azure
  • Strong collaboration skills with a track record of aligning cross-disciplinary teams around shared objectives
Job Responsibility
  • Build, mentor, and lead a high-performing software engineering team, fostering a culture of empowerment and driving both individual growth and collective impact
  • Partner closely with Product Management and cross-functional teams (Infrastructure, UX, SRE & QA) to define priorities and shape multi-quarter product roadmaps, ensuring alignment across all stakeholders
  • Own the end-to-end software development lifecycle, translating product strategy into executable plans and ensuring consistent, high-quality, on-time delivery
  • Provide architectural leadership for scalable, distributed systems guiding the design and implementation of high-throughput, cloud-native applications
  • Drive production readiness by enforcing best practices around deployment, observability, reliability, and runtime stability, focusing on the details to ensure operational excellence
  • Align stakeholders across business units through clear communication of technical strategy, trade-offs, priorities, risks, and execution plans
  • Engage directly with strategic customers to lead technical deep dives and architecture reviews, and to influence future product direction
  • Foster a culture of high engineering standards, accountability, and continuous improvement, with a strong emphasis on quality and security
Employment Type: Full-time

Engineering Manager - Observability & Reliability Engineering Obsession

We are looking for an Engineering Manager to join the OREO (Observability Reliab...
Location: France, Paris
Salary: Not provided
Company: Doctolib
Expiration Date: Until further notice
Requirements
  • 5+ years of software engineering or SRE experience, with a strong technical background in cloud-native environments (preferably AWS, GCP, and/or Kubernetes-based)
  • 3+ years of engineering management experience, leading technical teams (ideally SRE, platform, or infrastructure teams)
  • Deep understanding of observability tooling and architecture (Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Prometheus, Thanos, Datadog)
  • Experience with infrastructure as code (Terraform, OpenTofu) and secrets management systems (Vault, AWS Secrets Manager)
  • Proven ability to balance technical depth with people leadership, able to mentor engineers, review technical designs, and guide architectural decisions
Job Responsibility
  • Lead, coach, and grow a team of Site Reliability Engineers, supporting their technical development and career progression
  • Create a culture of operational excellence, continuous improvement, and psychological safety within the team
  • Conduct regular 1:1s, performance reviews, and career development conversations
  • Recruit, onboard, and retain top SRE talent aligned with Doctolib's mission and values
  • Partner with SREs and senior engineers to define and evolve the observability strategy across the platform, focusing on logging, metrics, tracing, and alerting
  • Own the strategy and evolution of critical transversal services including HashiCorp Vault and Terraform Enterprise
  • Drive prioritization and roadmap planning for large-scale reliability and observability initiatives
  • Ensure alignment between team objectives and broader engineering and business goals
  • Advocate for and allocate resources toward reducing technical debt and improving developer experience
  • Own the team's on-call experience and contribute to the incident response processes, ensuring sustainable practices and continuous improvement
What we offer
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Up to 14 days of RTT
  • A subsidy from the work council to refund part of the membership to a sport club or a creative class
  • Lunch voucher with Swile card
Employment Type: Full-time

Engineering Manager - Observability & Reliability Engineering Obsession

We are looking for an Engineering Manager to join the OREO (Observability Reliab...
Location: Germany, Berlin
Salary: Not provided
Company: Doctolib
Expiration Date: Until further notice
Requirements
  • 5+ years of software engineering or SRE experience, with a strong technical background in cloud-native environments (preferably AWS, GCP, and/or Kubernetes-based)
  • 3+ years of engineering management experience, leading technical teams (ideally SRE, platform, or infrastructure teams)
  • Deep understanding of observability tooling and architecture (Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Prometheus, Thanos, Datadog)
  • Experience with infrastructure as code (Terraform, OpenTofu) and secrets management systems (Vault, AWS Secrets Manager)
  • Proven ability to balance technical depth with people leadership, able to mentor engineers, review technical designs, and guide architectural decisions
Job Responsibility
  • Lead, coach, and grow a team of Site Reliability Engineers, supporting their technical development and career progression
  • Create a culture of operational excellence, continuous improvement, and psychological safety within the team
  • Conduct regular 1:1s, performance reviews, and career development conversations
  • Recruit, onboard, and retain top SRE talent aligned with Doctolib's mission and values
  • Partner with SREs and senior engineers to define and evolve the observability strategy across the platform, focusing on logging, metrics, tracing, and alerting
  • Own the strategy and evolution of critical transversal services including HashiCorp Vault and Terraform Enterprise
  • Drive prioritization and roadmap planning for large-scale reliability and observability initiatives
  • Ensure alignment between team objectives and broader engineering and business goals
  • Advocate for and allocate resources toward reducing technical debt and improving developer experience
  • Own the team's on-call experience and contribute to the incident response processes, ensuring sustainable practices and continuous improvement
What we offer
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Up to 14 days of RTT
  • A subsidy from the work council to refund part of the membership to a sport club or a creative class
  • Lunch voucher with Swile card
Employment Type: Full-time