Engineering Manager, Storage SRE

United States · 212000.00 - 265000.00 USD / Year · Job Posted June 16, 2026

Job Description

Airbnb was born in 2007 when two hosts welcomed three guests to their San Francisco home, and has since grown to over 5 million hosts who have welcomed over 2 billion guest arrivals in almost every country across the globe. Every day, hosts offer unique stays and experiences that make it possible for guests to connect with communities in a more authentic way.

The Community You Will Join

Airbnb's Storage SRE team owns the "how do you reliably run and operate databases at scale" problem. The team is at the heart of Airbnb's online data systems, building tooling, workflows, and automation to ensure mission-critical data services are reliable, secure, and performant. You will be joining a team that operates at genuine internet scale and has direct influence on the infrastructure decisions that power a global marketplace.

The team has deep operational ownership of Airbnb's primary relational database fleet, covering the full lifecycle from provisioning and capacity planning to schema management, backup and recovery, disaster recovery, and client integration best practices. The team also improves the developer experience for engineers working with transactional data stores at scale, enabling engineers across Airbnb to work with high-traffic storage systems reliably and efficiently.

Looking ahead, the team is actively expanding its tooling and operational model to support a new class of distributed database technology, partnering closely with storage infrastructure and platform teams to enable resilient adoption across Airbnb.

Job Responsibility

  • Own the Storage SRE technical roadmap across a 12+ month horizon, setting the direction for how the team deepens its operational model as it takes on new database technologies alongside its existing systems
  • Lead and grow a team of engineers by providing mentorship, timely feedback, and career development support to build a high-performing, inclusive team
  • Drive the generalization of cluster lifecycle, schema management, and observability tooling as the team broadens its database technology support
  • Partner with engineering teams across Airbnb as the primary expert on reliable database adoption, helping them work with mission-critical storage systems safely and efficiently at scale
  • Establish and uphold operational excellence standards covering on-call strategy, incident response, backup and disaster recovery, and systemic reliability improvements
  • Collaborate with storage infrastructure and platform teams to ensure Storage SRE's tooling and observability stay current as the broader storage platform evolves
  • Improve the developer experience for engineers working with high-traffic transactional storage systems
  • Drive performance, security, scalability, and availability initiatives across Airbnb's database systems
  • Communicate technical strategy and trade-offs clearly to engineers and senior leadership

Requirements

  • 9+ years of relevant industry experience in database infrastructure, storage systems, or site reliability engineering
  • 3+ years of engineering management experience leading SRE, infrastructure, or platform teams
  • Demonstrated track record of building high-performing teams by hiring strong engineers, developing talent, and maintaining team health through periods of change
  • Strong technical foundation with the ability to partner with technical leads on architectural decisions, roadmap tradeoffs, and delivery quality
  • Proven ability to lead a team through a technology transition while maintaining operational rigor on existing systems
  • Solid understanding of distributed systems, cloud infrastructure, and production database operations
  • Strong communicator able to cut through ambiguity and represent the team credibly to senior leadership

What we offer

  • bonus
  • equity
  • benefits
  • Employee Travel Credits

Similar Jobs for Engineering Manager, Storage SRE

8 matching positions

Engineering Manager, Storage

The Online Data organization ensures Airbnb customers are delighted in experienc...
Location: United States
Salary: 204000.00 - 255000.00 USD / Year
Company: Airbnb
Expiration Date: Until further notice
Requirements
  • 3+ years of engineering management experience
  • 6+ years of relevant software development experience in a fast paced tech environment
  • Experience with building and operating distributed databases and services that are long-term and evolvable
  • Experience in organization design for a team that is scaling up
  • Expertise with a public cloud provider (AWS, GCP, Azure) and their Storage, VM, networking, Kubernetes, Security offerings
  • Excellent communication skills and the ability to work well within a team and with teams across the engineering organization
Job Responsibility
  • Lead a team of talented, diverse software engineers to build software to make database operations reliable and automated
  • Make the open-source database well-integrated with Airbnb’s Compute, Networking and Security infrastructure
  • Work with TL and team to define and execute on a vision and 3-year roadmap for the control plane area
  • Stay in touch with technical designs and decisions, be the sounding board
  • Synthesize technical information and represent the team with upper management
  • Align with ORM and SRE teams in Online Data on each team’s charter and how each team’s core capabilities fit together
  • Attract top talent, mentor individual contributors and manage their promotions and career
  • Nurture a culture of rigor and responsibly “moving fast” from design, through code review, to production
  • Represent Airbnb with open source communities and external alliance partners
What we offer
  • bonus
  • equity
  • benefits
  • Employee Travel Credits
Employment Type: Full-time

Storage Engineering Manager

We are seeking a seasoned Storage Engineering Manager with experience in the spe...
Location: United States, San Jose; San Francisco; Bellevue
Salary: 297000.00 - 495000.00 USD / Year
Company: Lambda
Expiration Date: Until further notice
Requirements
  • 10+ years of experience in storage engineering with at least 5+ years in a management or lead role
  • Demonstrated experience leading a team of storage engineers and storage SREs on complex, cross-functional projects in a fast-paced startup environment
  • Extensive hands-on experience in designing, deploying, and maintaining distributed storage solutions in a CSP (Cloud Service Provider), NCP (Neo-Cloud provider), HPC-infrastructure integrator, or AI-infrastructure company
  • Experience with storage solutions serving storage volumes at a scale greater than 20PB
  • Strong project management skills, leading high-confidence planning, project execution, and delivery of team outcomes on schedule
  • Extensive experience with storage site reliability engineering
  • Experience with one or more of the following in an HPC or AI Infrastructure environment: Vast, DDN, Pure Storage, NetApp, Weka
  • Experience deploying CEPH at scale greater than 25PB
  • Experience in serving one or more of the following storage protocols: object storage (e.g., S3), block storage (e.g., iSCSI), or file storage (e.g., NFS, SMB, Lustre)
  • Professional individual contributor experience as a storage engineer or storage SRE
Job Responsibility
  • Grow/Hire, lead, and mentor a top-talent team of high-performing storage engineers delivering HPC, petabyte-scale storage solutions
  • Foster a high-velocity culture of innovation, technical excellence, and collaboration
  • Conduct regular one-on-one meetings, provide constructive feedback, and support career development for team members
  • Drive outcomes by managing project priorities, deadlines, and deliverables using Agile methodologies
  • Drive the technical vision and strategy for Lambda distributed storage solutions
  • Lead storage vendor selection criteria, vendor selection, and vendor relationship management (support, installation, scheduling, specification, procurement)
  • Manage team in storage lifecycle management (installation, cabling, capacity upgrades, service, RMA, updating both hardware and software components as needed)
  • Guide choices around optimization of storage pools, sharding, and tiering/caching strategies
  • Lead team in tasks related to multi-tenant security, tenant provisioning, metering integration, storage protocol interconnection, and customer data-migration
  • Guide Storage SREs in development of scripting and automation tools for configuration management, monitoring, and operational tasks
What we offer
  • Generous cash & equity compensation
  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401k Plan with 2% company match (USA employees)
  • Flexible paid time off plan that we all actually use
Employment Type: Full-time

Platform Engineering Manager

As Platform Engineering Manager at Power Design, you'll lead the buildout of our...
Location: United States, St Petersburg
Salary: Not provided
Company: Power Design
Expiration Date: Until further notice
Requirements
  • Education: Bachelor's degree in Computer Science, Computer Engineering, Information Systems, or a related field; equivalent professional experience considered
  • Experience: 7–10 years of progressive experience in infrastructure engineering, platform engineering, DevOps, or SRE — with meaningful time in both hands-on implementation and technical leadership
  • Preferred certifications: HashiCorp Terraform Associate, AWS/Azure Solutions Architect, CKA/CKAD, or equivalent cloud or platform engineering certifications
  • Hands-on production experience with at least one major cloud platform (Azure, AWS, GCP, or OCI); breadth across multiple platforms strongly preferred
  • Demonstrated history of evaluating infrastructure decisions through a cloud-first lens, identifying when to leverage cloud services rather than defaulting to on-premises solutions
  • Hands-on expertise with Terraform or a comparable IaC framework
  • GitOps pipeline experience (GitHub Actions, Azure DevOps, GitLab CI, or similar)
  • Production experience implementing enterprise observability and AIOps tooling (Datadog, Dynatrace, New Relic, Prometheus/Grafana, or equivalent), including anomaly detection, event correlation, and automated remediation workflows
Job Responsibility
  • Design, build, and maintain automation for infrastructure provisioning, configuration, and lifecycle management — with security controls built in from the start
  • Lead the evaluation, selection, and implementation of Power Design's first enterprise observability and AIOps platform, owning the decision end-to-end from vendor assessment through production rollout
  • Develop and maintain observability tooling, dashboards, and automated remediation workflows covering metrics, logging, tracing, and alerting across cloud and on-premises environments
  • Build and enforce CI/CD pipelines for infrastructure and platform services using GitOps best practices
  • Continuously evaluate the infrastructure footprint and identify workloads where cloud migration would improve resilience, reduce complexity, or lower cost — and build the business case to act on it
  • Apply a security-first lens to every platform decision, including IAM/RBAC design, secrets management, Zero Trust implementation (Zscaler), and policy-as-code
  • Create self-service infrastructure workflows — provisioning automation, access workflows, and internal developer tooling — to reduce ticket volume and enable engineering teams to move faster
  • Leverage AI-assisted tooling for anomaly detection, event correlation, and operational insights to drive a proactive operations model
  • Establish and own design standards, architectural consistency, and IaC strategy across the Platform Engineering function
  • Provide technical leadership and mentorship to platform engineers
Employment Type: Full-time

Technical Architect

Lead the design, modernization, and implementation of scalable, secure, and resi...
Location: United States, Armonk
Salary: 247319.00 - 250000.00 USD / Year
Company: The New York Times
Expiration Date: Until further notice
Requirements
  • Bachelor's degree or equivalent in Computer Science, Information Technology, Engineering or related and five (5) years of experience as a Consultant Architect, Virtualization Architect, Senior Cloud Architect or related
  • Five (5) years of experience must include utilizing Hybrid Cloud, AWS, Azure, Red Hat Linux, Terraform, Ansible, Python, VMware Cloud Foundation (VCF) Stack
Job Responsibility
  • Lead the design, modernization, and implementation of scalable, secure, and resilient hybrid cloud and containerized infrastructure platforms
  • Define and lead the technical architecture strategy for hybrid cloud, container orchestration (Kubernetes, RedHat OpenShift, VMware Tanzu), and virtualized environments (VMware, Nutanix, RedHat)
  • Architect secure and scalable infrastructure across private, public, and hybrid cloud ecosystems
  • Evaluate, design, and implement solutions for computing, storage, networking, identity, and availability zones across global regions
  • Design and implement Kubernetes, RedHat OpenShift clusters across multi-cloud and on-prem environments, including CI/CD integration, policy enforcement, and workload orchestration
  • Define governance, observability, and security patterns for containerized workloads
  • Lead Infrastructure-as-Code (IaC) initiatives using Terraform, Ansible, GitOps, GitHub, PowerShell, and Python
  • Enable self-service infrastructure capabilities through automation frameworks and developer platforms
  • Partner with DevSecOps, SRE, Infrastructure Operations, Security, and Datacenter Operation teams to scope, define, size, and execute application onboarding, modernization, and consolidation initiatives
  • Mentor engineering teams and influence enterprise architecture (EA) roadmaps
Employment Type: Full-time

SVP of Infrastructure & Cloud Operations

Our client is a global game monetisation and payments platform, headquartered in...
Location: United States, US Remote
Salary: 300000.00 - 325000.00 USD / Year
Company: Signify Technology
Expiration Date: Until further notice
Requirements
  • Minimum 10 years of experience leading infrastructure and operations across both private and public cloud platforms — public-only experience will not be considered
  • Strong GCP experience; familiarity with multi-cloud environments essential
  • Deep expertise in SLA and SLO management — the team is measured on availability, stability, and performance
  • Proven leadership of DevOps, SRE, Networking, and Database Management functions with direct cross-disciplinary team responsibility
  • Demonstrated experience reporting directly to a CTO or CIO
  • Strong background in AI-powered automation, with hands-on experience implementing intelligent systems for monitoring, alerting, and incident resolution
  • Experience managing and developing global, internationally distributed teams across different time zones and cultures
  • Flexibility to work across time zones — their teams span up to 15 hours of difference (Malaysia, China, North America)
  • Willingness to travel to international office locations, including Kuala Lumpur and/or Baku
Job Responsibility
  • Define and execute the vision, mission, and strategic roadmap for global infrastructure and operations, aligned with business priorities and technology goals
  • Build and scale high-performing teams across DevOps, SRE, Networking, and Database disciplines
  • Oversee global infrastructure operations across multiple time zones and cultural environments
  • Manage hybrid and multi-cloud environments (GCP preferred), including compute, storage, network, and security
  • Develop and implement robust automation strategies using AI/ML to reduce toil, accelerate issue resolution, and improve system reliability
  • Lead initiatives in observability, CI/CD security, and proactive incident prevention
  • Ensure infrastructure is secure, compliant, and resilient, with robust business continuity and disaster recovery practices
  • Partner with internal stakeholders across Product, Engineering, and Security to enable product velocity and stability
  • Own the hiring, mentoring, and development of global infrastructure teams with a focus on continuous improvement
  • Develop and manage the infrastructure budget, focusing on cost optimisation and resource forecasting
What we offer
  • 100% company-paid medical, dental, and vision plans
  • Unlimited flexible time off
  • Personalised career roadmap and professional development investment
  • High-impact, company-wide scope with genuine leadership visibility
  • A collaborative, globally diverse team culture with a strong focus on job satisfaction and growth
Employment Type: Full-time

Senior Azure Platform Engineer

We are seeking a Senior Azure Platform & Resiliency Engineer to design, build, a...
Location: United States, Tucker
Salary: 99360.00 - 159900.00 USD / Year
Company: Georgia System Operations
Expiration Date: Until further notice
Requirements
  • 6+ years of experience in cloud infrastructure, DevOps, or SRE roles
  • Strong hands-on experience with Microsoft Azure architecture and services
  • Proven experience designing and implementing Azure landing zones
  • Deep expertise in Terraform (or equivalent IaC tools)
  • Strong understanding of Azure networking (VNet, peering, DNS, private access)
  • Experience with CI/CD pipelines (GitHub Actions, Azure DevOps)
  • Proficiency in scripting (Python, PowerShell, or similar)
Job Responsibility
  • Design and implement Azure Landing Zones aligned with Microsoft Cloud Adoption Framework (CAF)
  • Define and manage management group and subscription strategy, RBAC and identity models, Azure Policy and governance controls
  • Build and maintain shared services (networking, logging, identity)
  • Develop and maintain infrastructure using Terraform (preferred), Bicep, or ARM
  • Create reusable modules for networking, compute, storage and platform services
  • Integrate IaC into CI/CD pipelines (GitHub Actions or Azure DevOps)
  • Design high availability and multi-region architectures
  • Define and implement disaster recovery (DR) strategies (RTO/RPO)
  • Conduct failover testing and resilience validation
  • Establish and track SLIs/SLOs and reliability metrics
What we offer
  • Comprehensive medical, dental, and vision coverage
  • Strong retirement program
  • Career development
  • Flexible work schedules
Employment Type: Full-time

Principal Site Reliability Engineer (Sovereign Cloud)

As a Principal Site Reliability Engineer, you will serve as the technical author...
Location: Bulgaria, Sofia
Salary: Not provided
Company: Palo Alto Networks
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in Infrastructure, SRE, or DevOps roles
  • BS or MS in Computer Science, a related field, or equivalent professional experience
  • Kubernetes Mastery: Expert-level experience (6+ years) managing production K8s workloads (preferably within GKE, but will also consider EKS)
  • Deep understanding of Networking, Storage, and RBAC
  • CI/CD & GitOps: Hands-on expertise with ArgoCD and modern pipeline runners (GitHub Actions, GitLab CI, or Jenkins)
  • Programming: Proficient in Python for systems programming and automation
  • Security Mindset: Proven experience integrating security scanning and compliance checks within a containerized environment
  • Modern Workflow: Experience (or strong desire) using AI-pair programming tools like Cursor and Claude to multiply personal and team productivity
  • Excellent written and verbal communication, able to collaborate and rally support
  • Self-disciplined, self-managed, self-motivated, strong sense of ownership, urgency, and drive
Job Responsibility
  • Infrastructure Leadership: Architect and oversee large-scale Kubernetes clusters in GKE, ensuring high availability, performance tuning, and cost optimization
  • GitOps & Orchestration: Design and refine complex CI/CD lifecycles using ArgoCD, moving toward a fully declarative infrastructure-as-code model
  • Security Engineering: Implement and manage security scanning tools (e.g., Prisma Cloud, Snyk, or GKE native security) to ensure container integrity and shift-left security compliance
  • Automation & Tooling: Develop sophisticated automation scripts and internal tools using Python to eliminate manual toil and improve system observability
  • AI-Driven Development: Lean into the future of engineering by utilizing Cursor and Claude to accelerate coding, debugging, and documentation tasks
  • Incident Management: Act as a final escalation point for complex infrastructure outages, conducting blameless post-mortems to drive systemic improvements
  • Participate in on-call rotations to support critical business and production systems
Employment Type: Full-time

Sr. Software Engineer, Cloud Storage

As a Software Engineer, you will play a key role in delivering an enterprise‑cla...
Location: United States, Morrisville
Salary: 170000.00 - 220000.00 USD / Year
Company: NetApp
Expiration Date: Until further notice
Requirements
  • 8+ Years of Software Engineering/Development Experience
  • Strong experience in software design, development, and system-level architecture
  • Proficiency in programming languages such as Go, Python, C++, or C
  • Deep knowledge of Kubernetes; hands-on experience building or deploying micro-services using Docker and Kubernetes
  • Practical experience with public cloud providers such as GCP, Azure, or AWS
  • Solid understanding of data structures, algorithms, multithreading, distributed systems, and modern programming practices
  • Strong collaboration and communication skills (verbal and written)
  • Demonstrated ability to lead features or small teams independently
  • Quick learner with the ability to adapt to new technologies and complex systems
Job Responsibility
  • Design, develop, and test new product features involving complex and interdependent distributed systems
  • Deliver high‑quality, maintainable code across cloud‑native storage components
  • Independently drive feature development from design to completion
  • Participate in technical discussions within the team and across partner groups
  • Collaborate with cloud hyperscalers and internal stakeholders on solutions built for first party cloud native platforms
  • Work closely with SRE, Product Management, and cross-functional engineering teams to align on design, requirements, and execution
  • Contribute to design reviews, architectural discussions, and problem investigations
  • Mentor junior engineers in best practices and technical execution
  • Ensure solutions meet scalability, reliability, and performance goals for enterprise-class cloud storage systems
What we offer
  • Health Insurance
  • Life Insurance
  • Retirement or Pension Plans
  • Paid Time Off
  • various Leave options
  • Performance-Based Incentives
  • employee stock purchase plan
  • restricted stock units (RSUs)
  • Volunteer time off: 40 hours of paid volunteer time each year
  • Well-being: Employee Assistance Program, fitness, and mental health resources
Employment Type: Full-time