Engineering Manager, Storage SRE Job at Airbnb

Engineering Manager, Storage

The Online Data organization ensures Airbnb customers are delighted in experienc...

Location

United States

Salary:

204000.00 - 255000.00 USD / Year

Airbnb

Expiration Date

Until further notice

Requirements

3+ years of engineering management experience
6+ years of relevant software development experience in a fast-paced tech environment
Experience with building and operating distributed databases and services that are long-term and evolvable
Experience in organization design for a team that is scaling up
Expertise with a public cloud provider (AWS, GCP, Azure) and its storage, VM, networking, Kubernetes, and security offerings
Excellent communication skills and the ability to work well within a team and with teams across the engineering organization

Job Responsibility

Lead a team of talented, diverse software engineers to build software to make database operations reliable and automated
Make the open-source database well-integrated with Airbnb’s Compute, Networking and Security infrastructure
Work with TL and team to define and execute on a vision and 3-year roadmap for the control plane area
Stay in touch with technical designs and decisions, and serve as a sounding board
Synthesize technical information and represent the team with upper management
Align with ORM and SRE teams in Online Data on each team’s charter and how each team’s core capabilities fit together
Attract top talent, mentor individual contributors, and manage their promotions and careers
Nurture a culture of rigor and responsibly “moving fast” from design, through code review, to production
Represent Airbnb with open source communities and external alliance partners

What we offer

Bonus
Equity
Benefits
Employee Travel Credits

Full-time

Storage Engineering Manager

We are seeking a seasoned Storage Engineering Manager with experience in the spe...

Location

United States, San Jose; San Francisco; Bellevue

Salary:

297000.00 - 495000.00 USD / Year

Lambda

Expiration Date

Until further notice

Requirements

10+ years of experience in storage engineering, with at least 5 years in a management or lead role
Demonstrated experience leading a team of storage engineers and storage SREs on complex, cross-functional projects in a fast-paced startup environment
Extensive hands-on experience in designing, deploying, and maintaining distributed storage solutions in a CSP (Cloud Service Provider), NCP (Neo-Cloud provider), HPC-infrastructure integrator, or AI-infrastructure company
Experience with storage solutions serving storage volumes at a scale greater than 20PB
Strong project management skills, leading high-confidence planning, project execution, and delivery of team outcomes on schedule
Extensive experience with storage site reliability engineering
Experience with one or more of the following in an HPC or AI Infrastructure environment: Vast, DDN, Pure Storage, NetApp, Weka
Experience deploying Ceph at scale greater than 25PB
Experience in serving one or more of the following storage protocols: object storage (e.g., S3), block storage (e.g., iSCSI), or file storage (e.g., NFS, SMB, Lustre)
Professional individual contributor experience as a storage engineer or storage SRE

Job Responsibility

Hire, grow, lead, and mentor a top-talent team of high-performing storage engineers delivering HPC, petabyte-scale storage solutions
Foster a high-velocity culture of innovation, technical excellence, and collaboration
Conduct regular one-on-one meetings, provide constructive feedback, and support career development for team members
Drive outcomes by managing project priorities, deadlines, and deliverables using Agile methodologies
Drive the technical vision and strategy for Lambda's distributed storage solutions
Lead storage vendor selection criteria, vendor selection, and vendor relationship management (support, installation, scheduling, specification, procurement)
Manage team in storage lifecycle management (installation, cabling, capacity upgrades, service, RMA, updating both hardware and software components as needed)
Guide choices around optimization of storage pools, sharding, and tiering/caching strategies
Lead team in tasks related to multi-tenant security, tenant provisioning, metering integration, storage protocol interconnection, and customer data-migration
Guide Storage SREs in development of scripting and automation tools for configuration management, monitoring, and operational tasks

What we offer

Generous cash & equity compensation
Health, dental, and vision coverage for you and your dependents
Wellness and commuter stipends for select roles
401k Plan with 2% company match (USA employees)
Flexible paid time off plan that we all actually use

Full-time

Platform Engineering Manager

As Platform Engineering Manager at Power Design, you'll lead the buildout of our...

Location

United States, St Petersburg

Salary:

Not provided

Power Design

Expiration Date

Until further notice

Requirements

Education: Bachelor's degree in Computer Science, Computer Engineering, Information Systems, or a related field; equivalent professional experience considered
Experience: 7–10 years of progressive experience in infrastructure engineering, platform engineering, DevOps, or SRE — with meaningful time in both hands-on implementation and technical leadership
Preferred certifications: HashiCorp Terraform Associate, AWS/Azure Solutions Architect, CKA/CKAD, or equivalent cloud or platform engineering certifications
Hands-on production experience with at least one major cloud platform (Azure, AWS, GCP, or OCI); breadth across multiple platforms strongly preferred
Demonstrated history of evaluating infrastructure decisions through a cloud-first lens, identifying when to leverage cloud services rather than defaulting to on-premises solutions
Hands-on expertise with Terraform or a comparable IaC framework
GitOps pipeline experience (GitHub Actions, Azure DevOps, GitLab CI, or similar)
Production experience implementing enterprise observability and AIOps tooling (Datadog, Dynatrace, New Relic, Prometheus/Grafana, or equivalent), including anomaly detection, event correlation, and automated remediation workflows

Job Responsibility

Design, build, and maintain automation for infrastructure provisioning, configuration, and lifecycle management — with security controls built in from the start
Lead the evaluation, selection, and implementation of Power Design's first enterprise observability and AIOps platform, owning the decision end-to-end from vendor assessment through production rollout
Develop and maintain observability tooling, dashboards, and automated remediation workflows covering metrics, logging, tracing, and alerting across cloud and on-premises environments
Build and enforce CI/CD pipelines for infrastructure and platform services using GitOps best practices
Continuously evaluate the infrastructure footprint and identify workloads where cloud migration would improve resilience, reduce complexity, or lower cost — and build the business case to act on it
Apply a security-first lens to every platform decision, including IAM/RBAC design, secrets management, Zero Trust implementation (Zscaler), and policy-as-code
Create self-service infrastructure workflows — provisioning automation, access workflows, and internal developer tooling — to reduce ticket volume and enable engineering teams to move faster
Leverage AI-assisted tooling for anomaly detection, event correlation, and operational insights to drive a proactive operations model
Establish and own design standards, architectural consistency, and IaC strategy across the Platform Engineering function
Provide technical leadership and mentorship to platform engineers

Full-time

Technical Architect

Lead the design, modernization, and implementation of scalable, secure, and resi...

Location

United States, Armonk

Salary:

247319.00 - 250000.00 USD / Year

The New York Times

Expiration Date

Until further notice

Requirements

Bachelor's degree or equivalent in Computer Science, Information Technology, Engineering, or a related field, and five (5) years of experience as a Consultant Architect, Virtualization Architect, Senior Cloud Architect, or in a related role
Five (5) years of experience must include utilizing Hybrid Cloud, AWS, Azure, Red Hat Linux, Terraform, Ansible, Python, VMware Cloud Foundation (VCF) Stack

Job Responsibility

Lead the design, modernization, and implementation of scalable, secure, and resilient hybrid cloud and containerized infrastructure platforms
Define and lead the technical architecture strategy for hybrid cloud, container orchestration (Kubernetes, Red Hat OpenShift, VMware Tanzu), and virtualized environments (VMware, Nutanix, Red Hat)
Architect secure and scalable infrastructure across private, public, and hybrid cloud ecosystems
Evaluate, design, and implement solutions for computing, storage, networking, identity, and availability zones across global regions
Design and implement Kubernetes and Red Hat OpenShift clusters across multi-cloud and on-prem environments, including CI/CD integration, policy enforcement, and workload orchestration
Define governance, observability, and security patterns for containerized workloads
Lead Infrastructure-as-Code (IaC) initiatives using Terraform, Ansible, GitOps, GitHub, PowerShell, and Python
Enable self-service infrastructure capabilities through automation frameworks and developer platforms
Partner with DevSecOps, SRE, Infrastructure Operations, Security, and Datacenter Operations teams to scope, define, size, and execute application onboarding, modernization, and consolidation initiatives
Mentor engineering teams and influence enterprise architecture (EA) roadmaps

Full-time

SVP of Infrastructure & Cloud Operations

Our client is a global game monetisation and payments platform, headquartered in...

Location

United States, US Remote

Salary:

300000.00 - 325000.00 USD / Year

Signify Technology

Expiration Date

Until further notice

Requirements

Minimum 10 years of experience leading infrastructure and operations across both private and public cloud platforms — public-only experience will not be considered
Strong GCP experience; familiarity with multi-cloud environments essential
Deep expertise in SLA and SLO management — the team is measured on availability, stability, and performance
Proven leadership of DevOps, SRE, Networking, and Database Management functions with direct cross-disciplinary team responsibility
Demonstrated experience reporting directly to a CTO or CIO
Strong background in AI-powered automation, with hands-on experience implementing intelligent systems for monitoring, alerting, and incident resolution
Experience managing and developing global, internationally distributed teams across different time zones and cultures
Flexibility to work across time zones — their teams span up to 15 hours of difference (Malaysia, China, North America)
Willingness to travel to international office locations, including Kuala Lumpur and/or Baku

Job Responsibility

Define and execute the vision, mission, and strategic roadmap for global infrastructure and operations, aligned with business priorities and technology goals
Build and scale high-performing teams across DevOps, SRE, Networking, and Database disciplines
Oversee global infrastructure operations across multiple time zones and cultural environments
Manage hybrid and multi-cloud environments (GCP preferred), including compute, storage, network, and security
Develop and implement robust automation strategies using AI/ML to reduce toil, accelerate issue resolution, and improve system reliability
Lead initiatives in observability, CI/CD security, and proactive incident prevention
Ensure infrastructure is secure, compliant, and resilient, with robust business continuity and disaster recovery practices
Partner with internal stakeholders across Product, Engineering, and Security to enable product velocity and stability
Own the hiring, mentoring, and development of global infrastructure teams with a focus on continuous improvement
Develop and manage the infrastructure budget, focusing on cost optimisation and resource forecasting

What we offer

100% company-paid medical, dental, and vision plans
Unlimited flexible time off
Personalised career roadmap and professional development investment
High-impact, company-wide scope with genuine leadership visibility
A collaborative, globally diverse team culture with a strong focus on job satisfaction and growth

Full-time

Senior Azure Platform Engineer

We are seeking a Senior Azure Platform & Resiliency Engineer to design, build, a...

Location

United States, Tucker

Salary:

99360.00 - 159900.00 USD / Year

Georgia System Operations

Expiration Date

Until further notice

Requirements

6+ years of experience in cloud infrastructure, DevOps, or SRE roles
Strong hands-on experience with Microsoft Azure architecture and services
Proven experience designing and implementing Azure landing zones
Deep expertise in Terraform (or equivalent IaC tools)
Strong understanding of Azure networking (VNet, peering, DNS, private access)
Experience with CI/CD pipelines (GitHub Actions, Azure DevOps)
Proficiency in scripting (Python, PowerShell, or similar)

Job Responsibility

Design and implement Azure Landing Zones aligned with Microsoft Cloud Adoption Framework (CAF)
Define and manage management group and subscription strategy, RBAC and identity models, Azure Policy and governance controls
Build and maintain shared services (networking, logging, identity)
Develop and maintain infrastructure using Terraform (preferred), Bicep, or ARM
Create reusable modules for networking, compute, storage and platform services
Integrate IaC into CI/CD pipelines (GitHub Actions or Azure DevOps)
Design high availability and multi-region architectures
Define and implement disaster recovery (DR) strategies (RTO/RPO)
Conduct failover testing and resilience validation
Establish and track SLIs/SLOs and reliability metrics

What we offer

Comprehensive medical, dental, and vision coverage
Strong retirement program
Career development
Flexible work schedules

Full-time

Principal Site Reliability Engineer (Sovereign Cloud)

As a Principal Site Reliability Engineer, you will serve as the technical author...

Location

Bulgaria, Sofia

Salary:

Not provided

Palo Alto Networks

Expiration Date

Until further notice

Requirements

7+ years of experience in Infrastructure, SRE, or DevOps roles
BS or MS in Computer Science, a related field, or equivalent professional experience
Kubernetes Mastery: Expert-level experience (6+ years) managing production K8s workloads (preferably within GKE, but will also consider EKS)
Deep understanding of Networking, Storage, and RBAC
CI/CD & GitOps: Hands-on expertise with ArgoCD and modern pipeline runners (GitHub Actions, GitLab CI, or Jenkins)
Programming: Proficient in Python for systems programming and automation
Security Mindset: Proven experience integrating security scanning and compliance checks within a containerized environment
Modern Workflow: Experience (or strong desire) using AI-pair programming tools like Cursor and Claude to multiply personal and team productivity
Excellent written and verbal communication, able to collaborate and rally support
Self-disciplined, self-managed, self-motivated, strong sense of ownership, urgency, and drive

Job Responsibility

Infrastructure Leadership: Architect and oversee large-scale Kubernetes clusters in GKE, ensuring high availability, performance tuning, and cost optimization
GitOps & Orchestration: Design and refine complex CI/CD lifecycles using ArgoCD, moving toward a fully declarative infrastructure-as-code model
Security Engineering: Implement and manage security scanning tools (e.g., Prisma Cloud, Snyk, or GKE native security) to ensure container integrity and shift-left security compliance
Automation & Tooling: Develop sophisticated automation scripts and internal tools using Python to eliminate manual toil and improve system observability
AI-Driven Development: Lean into the future of engineering by utilizing Cursor and Claude to accelerate coding, debugging, and documentation tasks
Incident Management: Act as a final escalation point for complex infrastructure outages, conducting blameless post-mortems to drive systemic improvements
Participate in on-call rotations to support critical business and production systems

Full-time

Sr. Software Engineer, Cloud Storage

As a Software Engineer, you will play a key role in delivering an enterprise‑cla...

Location

United States, Morrisville

Salary:

170000.00 - 220000.00 USD / Year

NetApp

Expiration Date

Until further notice

Requirements

8+ years of software engineering/development experience
Strong experience in software design, development, and system-level architecture
Proficiency in programming languages such as Go, Python, C++, or C
Deep knowledge of Kubernetes; hands-on experience building or deploying microservices using Docker and Kubernetes
Practical experience with public cloud providers such as GCP, Azure, or AWS
Solid understanding of data structures, algorithms, multithreading, distributed systems, and modern programming practices
Strong collaboration and communication skills (verbal and written)
Demonstrated ability to lead features or small teams independently
Quick learner with the ability to adapt to new technologies and complex systems

Job Responsibility

Design, develop, and test new product features involving complex and interdependent distributed systems
Deliver high‑quality, maintainable code across cloud‑native storage components
Independently drive feature development from design to completion
Participate in technical discussions within the team and across partner groups
Collaborate with cloud hyperscalers and internal stakeholders on solutions built for first-party cloud-native platforms
Work closely with SRE, Product Management, and cross-functional engineering teams to align on design, requirements, and execution
Contribute to design reviews, architectural discussions, and problem investigations
Mentor junior engineers in best practices and technical execution
Ensure solutions meet scalability, reliability, and performance goals for enterprise-class cloud storage systems

What we offer

Health Insurance
Life Insurance
Retirement or Pension Plans
Paid Time Off
Various leave options
Performance-Based Incentives
Employee stock purchase plan
Restricted stock units (RSUs)
Volunteer time off: 40 hours of paid volunteer time each year
Well-being: Employee Assistance Program, fitness, and mental health resources

Full-time
