Engineering Manager, SRE Job at Abridge (San Francisco)

Senior Engineering Manager, SRE

Abridge’s services and engineering teams are in hyperscale mode, and multiplying...

Location

United States, San Francisco; New York; Pittsburgh

Salary:

250000.00 - 290000.00 USD / Year

Abridge

Expiration Date

Until further notice

Requirements

6+ years as a manager in rapidly growing organizations including at least 1 year as a manager of managers
Seeking an extremely challenging role that will push you beyond your limits, where failures are inevitable and not to be feared
Seeking a senior leadership role to develop people, environments, and impact - not ego, accolades, or ladder climbing
Able to ask for help, fail fast, and admit defeat; able to get yourself and others out of their comfort zone
Track record of leading performance engineering including load test and chaos engineering, large scale distributed telemetry implementation, major architectural and software refactors, engineering velocity, and full stack development
Experience running production workloads in more than one cloud provider (at a time, or across your experience)
Experience managing workloads across containerized solutions, Kubernetes, and CNCF-approved tooling such as Argo, Istio, OTel, and more
Thought leader in platform building, with a strong desire to represent Abridge as a reliability engineering leader in the tech industry
Genuine passion for Abridge’s mission to improve healthcare in America and across the world

Job Responsibility

Visionary leadership: Scope, resource, evangelize, and execute a company-wide reliability and engineering velocity roadmap across environments and clouds, real-time streaming infrastructure under immense scale, compute as well as AI-at-edge infrastructure, and the most ambitious cloud security roadmap in the entire tech industry
Collaborate with department heads across product engineering, security, product management, commercial, and more to develop, align, and execute an extremely ambitious strategic roadmap
Gifted tactician: Work at the level of small tiger teams to unblock, enable, and drive execution and solutioning
Juggle several ambiguous and tricky problems at a time
Recruiter extraordinaire: Scale out your team to meet this roadmap - both ICs and managers
Attract top talent and hire quickly while maintaining a consistently high bar
Iterate on the hiring process along with other leaders, improve diversity and equity, retain and maximize the effectiveness of an extremely senior team, and make strategic bets on the people that will take us to the next level
Mentor to the mentors: Develop their careers, create top-of-ladder development opportunities, and continuously raise the bar for your staff as well as your peers and leaders in their abilities and awareness
Earn their trust, lead by example, be a doctor rather than a judge for organizational and people challenges, and help establish and maintain a hivemind, de-siloed culture across all engineering pods

What we offer

Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees
Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families
Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA
Paid Parental Leave: Generous paid parental leave for all full-time employees
Family Forming Benefits: Resources and financial support to help you build your family
401(k) Matching: Contribution matching to help invest in your future
Personal Device Allowance: Tax free funds for personal device usage
Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits
Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more
Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals

Fulltime

Engineering Manager, Storage SRE

Airbnb was born in 2007 when two hosts welcomed three guests to their San Franci...

Location

United States

Salary:

212000.00 - 265000.00 USD / Year

Airbnb

Expiration Date

Until further notice

Requirements

9+ years of relevant industry experience in database infrastructure, storage systems, or site reliability engineering
3+ years of engineering management experience leading SRE, infrastructure, or platform teams
Demonstrated track record of building high-performing teams by hiring strong engineers, developing talent, and maintaining team health through periods of change
Strong technical foundation with the ability to partner with technical leads on architectural decisions, roadmap tradeoffs, and delivery quality
Proven ability to lead a team through a technology transition while maintaining operational rigor on existing systems
Solid understanding of distributed systems, cloud infrastructure, and production database operations
Strong communicator able to cut through ambiguity and represent the team credibly to senior leadership

Job Responsibility

Own the Storage SRE technical roadmap across a 12+ month horizon, setting the direction for how the team deepens its operational model as it takes on new database technologies alongside its existing systems
Lead and grow a team of engineers by providing mentorship, timely feedback, and career development support to build a high-performing, inclusive team
Drive the generalization of cluster lifecycle, schema management, and observability tooling as the team broadens its database technology support
Partner with engineering teams across Airbnb as the primary expert on reliable database adoption, helping them work with mission-critical storage systems safely and efficiently at scale
Establish and uphold operational excellence standards covering on-call strategy, incident response, backup and disaster recovery, and systemic reliability improvements
Collaborate with storage infrastructure and platform teams to ensure Storage SRE's tooling and observability stay current as the broader storage platform evolves
Improve the developer experience for engineers working with high-traffic transactional storage systems
Drive performance, security, scalability, and availability initiatives across Airbnb's database systems
Communicate technical strategy and trade-offs clearly to engineers and senior leadership

What we offer

Bonus
Equity
Benefits
Employee Travel Credits

Fulltime

Manager of Site Reliability Engineering (SRE)

The Manager of Site Reliability Engineering leads and develops a team of SRE pra...

Location

United States, Birmingham

Salary:

Not provided

Genuine Parts Company

Expiration Date

Until further notice

Requirements

Typically requires a bachelor's degree and 7 years of experience in a technology and/or software engineering role or an equivalent combination
Proven experience working in large, complex enterprise environments (Fortune 500 or equivalent)
Strong understanding and demonstrated implementation of Site Reliability Engineering (SRE) principles at scale
Hands-on experience with infrastructure-as-code (IaC) tools such as Terraform and ArgoCD
In-depth knowledge and practical experience with CI/CD pipelines and automation of software delivery
Championing DevOps practices and embedding reliability early in the SDLC
Significant hands-on experience in Site Reliability Engineering or related roles focused on cloud infrastructure reliability
Strong software engineering background with proficiency in infrastructure-as-code tools (e.g., Terraform, ArgoCD) and CI/CD automation
Deep knowledge of cloud platforms, specifically Google Cloud Platform (GCP), Kubernetes, container orchestration, and cloud-native architecture
Familiarity with monitoring and observability tools such as Dynatrace, Datadog, or equivalents

Job Responsibility

Lead, mentor, and grow a high-performing team of Site Reliability Engineers, fostering a culture of ownership, continuous improvement, and operational excellence
Implement and champion Site Reliability Engineering principles and DevOps best practices within the team to ensure service reliability, availability, and performance
Define and track key SRE metrics such as service uptime and incident response and resolution times
Drive automation efforts including CI/CD pipeline enhancements, infrastructure-as-code practices, and self-service infrastructure provisioning to increase deployment velocity while reducing manual toil
Own and continuously improve observability practices including system monitoring, logging, alerting, and diagnostics to ensure rapid issue detection and resolution
Participate in incident response processes including incident management, root cause analysis, post-mortems, and continuous improvement to enhance system resilience
Partner closely with software engineering, product management, architecture, and security teams to embed reliability and security early in the software development lifecycle (SDLC)
Oversee the management and scalability of cloud infrastructure environments, primarily on Google Cloud Platform (GCP), with a focus on Kubernetes, container orchestration, and hybrid cloud integrations
Advocate for and apply best practices in performance tuning, capacity planning, and system design for high availability
Develop and execute a long-term roadmap for our hybrid cloud platform, aligning with evolving business objectives and technology trends

What we offer

Comprehensive benefit plans and programs designed to support your health and wellness, provide income protection, and build financial security for your retirement

Fulltime

Engineering Manager, Production Engineering

We're looking for a hands-on Engineering Manager to lead our Production Engineer...

Location

United States, San Francisco

Salary:

209000.00 - 253000.00 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

5+ years of software or infrastructure engineering experience, with at least 1–2 years in an engineering management or tech lead role
Strong SRE or production engineering background — hands-on experience with incident management, SLO frameworks, runbooks, and on-call operations
Solid coding ability; comfortable writing production-grade code in Go, Python, or similar languages to build tooling and automation
Experience working with or embedding into cross-functional product teams, and influencing engineering decisions across organizational boundaries
Familiarity with container orchestration and cloud-native infrastructure — Kubernetes, distributed systems, and cloud service architectures
Strong communication skills — able to clearly represent technical risk and operational status to both engineering peers and business stakeholders

Job Responsibility

Leading and growing a team of SREs embedded within Crusoe's AI product areas, setting technical direction and fostering a culture of ownership and continuous improvement
Contributing as an IC — reviewing code, building tooling, and driving automation to reduce toil and improve the reliability and scalability of production services
Owning SLA/SLO performance, incident response, and on-call health for service offerings; leading blameless post-mortems and driving systemic remediation
Partnering with embedded product and platform engineering teams to influence infrastructure design, observability strategy, and operational readiness for new and existing services
Defining and tracking reliability, performance, and operational maturity metrics across the team; translating data into prioritized roadmap investments
Serving as a technical escalation point for high-severity production incidents affecting enterprise customers, and collaborating with Cloud Support and Customer Success on resolution and communication

What we offer

Industry competitive pay
Restricted Stock Units in a fast growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement

Fulltime

Engineering Manager, Infrastructure Engineering

This is not a traditional SRE or DevOps role. Whatnot's Reliability Engineering ...

Location

Poland, Kraków

Salary:

Not provided

Whatnot

Expiration Date

Until further notice

Requirements

10+ years of experience in infrastructure or platform engineering
5+ years managing engineering teams; experience leading managers or multiple teams a plus
Proven track record building and operating large-scale distributed systems with strong reliability, observability, and incident response practices
Deep technical grounding in one or more of: SLO design, monitoring/alerting, incident tooling, traffic control mechanisms, load and chaos testing, or platform engineering
Experience leading teams that ship developer-facing platforms, frameworks, or internal tools
Strong software engineering fundamentals
Demonstrated ability to guide teams through complex system challenges, large-scale migrations, and longer-term reliability initiatives
Exceptional communication and leadership skills
A passion for enabling teams to build fast while building safely through well-designed tooling and proactive detection mechanisms

Job Responsibility

Lead and mentor a team of highly skilled software engineers, supporting their technical growth, execution, and long-term career development
Set technical direction and quality standards for the team while empowering senior ICs to own design and architecture decisions
Develop and execute the strategic roadmap for reliability engineering at Whatnot
Build and operationalize best practices that empower product and platform teams to design and run reliable systems
Own the strategic roadmap for reliability tooling, including incident response systems, SLO measurement platforms, and developer-facing reliability libraries
Lead the team in designing and building traffic control systems as reusable platform components
Lead the design and execution of load testing at scale
Drive continuous improvement in incident detection and mitigation
Collaborate with cross-functional teams to influence product and architectural decisions that improve overall reliability and customer impact
Partner with Infrastructure and Engineering leadership to shape reliability strategy and investment priorities across the organization

Fulltime

Manager / Sr Manager, Engineering (AI Posture)

Location

United States, Santa Clara

Salary:

185000.00 - 298000.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

5-7 years of experience managing software engineering teams within a large-scale organization
5+ years of hands-on software engineering experience with a strong systems-level foundation
Proven ability to plan, execute, and deliver complex roadmaps with high predictability, owning distributed cloud products end-to-end
Demonstrated experience leading teams through the delivery of complex, data-rich web applications with a focus on performance and usability
Demonstrated experience designing and operating large-scale cloud architectures on platforms such as GCP, AWS, or Azure
Strong collaboration skills with a track record of aligning cross-disciplinary teams around shared objectives

Job Responsibility

Build, mentor, and lead a high-performing software engineering team, fostering a culture of empowerment and driving both individual growth and collective impact
Partner closely with Product Management and cross-functional teams (Infrastructure, UX, SRE & QA) to define priorities and shape multi-quarter product roadmaps, ensuring alignment across all stakeholders
Own the end-to-end software development lifecycle, translating product strategy into executable plans and ensuring consistent, high-quality, on-time delivery
Provide architectural leadership for scalable, distributed systems, guiding the design and implementation of high-throughput, cloud-native applications
Drive production readiness by enforcing best practices around deployment, observability, reliability, and runtime stability, focusing on the details to ensure operational excellence
Align stakeholders across business units through clear communication of technical strategy, trade-offs, priorities, risks, and execution plans
Engage directly with strategic customers to lead technical deep dives and architecture reviews, and to influence future product direction
Foster a culture of high engineering standards, accountability, and continuous improvement, with a strong emphasis on quality and security

Fulltime

Engineering Manager - Observability & Reliability Engineering Obsession

We are looking for an Engineering Manager to join the OREO (Observability Reliab...

Location

France, Paris

Salary:

Not provided

Doctolib

Expiration Date

Until further notice

Requirements

5+ years of software engineering or SRE experience, with a strong technical background in cloud-native environments (preferably AWS, GCP, and/or Kubernetes-based)
3+ years of engineering management experience, leading technical teams (ideally SRE, platform, or infrastructure teams)
Deep understanding of observability tooling and architecture (Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Prometheus, Thanos, Datadog)
Experience with infrastructure as code (Terraform, OpenTofu) and secrets management systems (Vault, AWS Secrets Manager)
Proven ability to balance technical depth with people leadership, able to mentor engineers, review technical designs, and guide architectural decisions

Job Responsibility

Lead, coach, and grow a team of Site Reliability Engineers, supporting their technical development and career progression
Create a culture of operational excellence, continuous improvement, and psychological safety within the team
Conduct regular 1:1s, performance reviews, and career development conversations
Recruit, onboard, and retain top SRE talent aligned with Doctolib's mission and values
Partner with SREs and senior engineers to define and evolve the observability strategy across the platform, focusing on logging, metrics, tracing, and alerting
Own the strategy and evolution of critical transversal services including HashiCorp Vault and Terraform Enterprise
Drive prioritization and roadmap planning for large-scale reliability and observability initiatives
Ensure alignment between team objectives and broader engineering and business goals
Advocate for and allocate resources toward reducing technical debt and improving developer experience
Own the team's on-call experience and contribute to the incident response processes, ensuring sustainable practices and continuous improvement

What we offer

Free comprehensive health insurance for you and your children
Parent Care Program: receive one additional month of leave on top of the legal parental leave
Free mental health and coaching services through our partner Moka.care
For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
Work Council subsidy to refund part of a sports club membership or creative class
Up to 14 days of RTT
Lunch voucher with Swile card

Fulltime

Engineering Manager - Observability & Reliability Engineering Obsession

We are looking for an Engineering Manager to join the OREO (Observability Reliab...

Location

Germany, Berlin

Salary:

Not provided

Doctolib

Expiration Date

Until further notice

Requirements

5+ years of software engineering or SRE experience, with a strong technical background in cloud-native environments (preferably AWS, GCP, and/or Kubernetes-based)
3+ years of engineering management experience, leading technical teams (ideally SRE, platform, or infrastructure teams)
Deep understanding of observability tooling and architecture (Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Prometheus, Thanos, Datadog)
Experience with infrastructure as code (Terraform, OpenTofu) and secrets management systems (Vault, AWS Secrets Manager)
Proven ability to balance technical depth with people leadership, able to mentor engineers, review technical designs, and guide architectural decisions

Job Responsibility

Lead, coach, and grow a team of Site Reliability Engineers, supporting their technical development and career progression
Create a culture of operational excellence, continuous improvement, and psychological safety within the team
Conduct regular 1:1s, performance reviews, and career development conversations
Recruit, onboard, and retain top SRE talent aligned with Doctolib's mission and values
Partner with SREs and senior engineers to define and evolve the observability strategy across the platform, focusing on logging, metrics, tracing, and alerting
Own the strategy and evolution of critical transversal services including HashiCorp Vault and Terraform Enterprise
Drive prioritization and roadmap planning for large-scale reliability and observability initiatives
Ensure alignment between team objectives and broader engineering and business goals
Advocate for and allocate resources toward reducing technical debt and improving developer experience
Own the team's on-call experience and contribute to the incident response processes, ensuring sustainable practices and continuous improvement

What we offer

Free comprehensive health insurance for you and your children
Parent Care Program: receive one additional month of leave on top of the legal parental leave
Free mental health and coaching services through our partner Moka.care
For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
Work Council subsidy to refund part of a sports club membership or creative class
Up to 14 days of RTT
Lunch voucher with Swile card

Fulltime
