Senior Infrastructure Software Engineer, Storage

Canada 190400.00 - 257600.00 CAD / Year · Job Posted January 08, 2026

Job Description

As a Senior Software Engineer on the Storage team, you will help design, build, and operate Dropbox’s large-scale storage systems, which provide high durability and scalability for millions of users across all of Dropbox’s products. The Storage team owns the distributed storage infrastructure at the heart of Dropbox: the systems responsible for storing exabytes of user data across multiple data centers worldwide.

Job Responsibility

  • Design, implement, and maintain large-scale distributed storage systems that ensure data durability, availability, and performance
  • Collaborate with peers to evolve the architecture of Dropbox’s core storage infrastructure for improved scalability and efficiency
  • Contribute to the design of replication, erasure coding, and system lifecycle management systems that balance cost, reliability, and performance
  • Write high-quality, performant, and maintainable code in Go and Rust
  • Participate in the on-call rotation, gaining firsthand experience operating Dropbox’s production storage systems
  • Investigate and resolve complex production issues, performing root cause analysis and driving continuous reliability improvements
  • Partner with cross-functional teams (Networking, Hardware, Capacity Planning) to deliver end-to-end reliable and cost-efficient storage solutions
  • Take ownership of scoped projects and demonstrate growth toward leading larger, cross-team technical initiatives

Requirements

  • 8+ years of experience, with a strong understanding of distributed systems principles, including replication, consistency, and fault tolerance
  • Experience developing and debugging production services in C++, Go, or Rust
  • Familiarity with distributed storage systems, file systems, or data infrastructure at scale
  • Demonstrated ability to write efficient, reliable, and maintainable code in mission-critical environments
  • Experience troubleshooting complex systems and participating in on-call or operational rotations
  • Solid communication and collaboration skills, with the ability to work across infrastructure and product teams
  • Eagerness to learn, grow, and contribute to multi-year infrastructure evolution initiatives

Nice to have

  • Experience building and operating large-scale object storage or distributed storage systems (e.g. S3, Ceph, GFS/Colossus)
  • Deep interest in systems performance, profiling, and low-level optimization
  • Familiarity with replication protocols, erasure coding, and data placement algorithms
  • Experience with production monitoring, observability, and incident response workflows
  • Contributions to infrastructure projects, open-source systems, or developer tooling that improved reliability and performance

What we offer

  • Competitive medical, dental and vision coverage
  • Retirement savings through a defined contribution pension or savings plan
  • Flexible Paid Time Off (PTO), paid holidays, Volunteer Time Off, and more
  • Income Protection Plans: Life and disability insurance
  • Business Travel Protection: Travel medical and accident insurance
  • Perks Allowance to be used on what matters most to you
  • Parental benefits, including Parental Leave, Fertility Benefits, Adoption and Surrogacy support, and Lactation support
  • Mental health and wellness benefits

Similar Jobs for Senior Infrastructure Software Engineer, Storage

8 matching positions

Senior Software Engineer - Storage

The Windows Servicing & Delivery (WSD) team investigates and remediates security...
Location: India, Hyderabad
Salary: Not provided
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years of software engineering with deep expertise in C and C++ for Windows kernel-mode development, OR equivalent experience
  • Hands-on experience with Windows storage driver stack: StorPort miniport drivers, storage filter drivers, or file system minifilter drivers — understanding of IRP flow, completion routines, and cancel-safe queue management
  • Solid grounding in Windows kernel fundamentals
  • Demonstrated ability to perform crash dump analysis and live kernel debugging using WinDbg
  • Working knowledge of NTFS on-disk structures: MFT record layout, attribute types, USN journal, and the NTFS log file for crash recovery
  • Familiarity with ReFS (Resilient File System): B+ tree metadata structure, integrity streams, block cloning, and the differences in crash recovery model versus NTFS
  • Experience debugging file system corruption scenarios: cross-linked clusters, orphaned MFT records, directory entry inconsistencies, and reparse point cycles
  • Understanding of Windows file system minifilter architecture: altitude registration, pre/post operation callbacks
  • Hands-on experience with Windows Server Failover Clustering (WSFC): quorum models (Node Majority, Disk Witness, Cloud Witness), cluster network configuration, and the cluster API
Job Responsibility
  • Own end-to-end resolution of critical ICMs escalated from top enterprise customers — analyze memory dumps, ETW traces, Storage Spaces logs, and cluster event logs to root-cause failures in S2D, WSFC, CSV, NTFS, and ReFS that cannot be resolved by field support
  • Investigate and fix security vulnerabilities in the Windows storage stack: privilege escalation through NTFS reparse points and junctions, information disclosure via uninitialized kernel pool in file system drivers, and denial-of-service through crafted on-disk structures in ReFS or NTFS
  • Design and implement reliability and correctness fixes in kernel-mode storage miniport drivers (StorPort, NVMe, iSCSI, SMB Direct/RDMA) and file system filter drivers — owning the full fix lifecycle from root cause through regression test to servicing release
  • Work directly with Storage Spaces Direct (S2D): diagnose and fix rebuild, rebalance, and fault-domain logic errors; investigate cache tier promotion/demotion bugs; resolve pool fragmentation and storage bus layer (SBL) issues in hyper-converged deployments
  • Maintain and harden Windows Server Failover Clustering (WSFC) and Cluster Shared Volumes (CSV): resolve quorum edge cases, CSV ownership transfer failures, cluster validation regressions, and inter-node storage arbitration deadlocks
  • Contribute to the Volume Shadow Copy Service (VSS) and Windows Backup infrastructure: fix provider/requester interaction bugs, VSS writer timeouts in large-scale environments, and shadow copy metadata consistency failures
  • Develop diagnostic tooling and automated regression suites for the storage stack — including kernel debugger extensions (!sdt, !storport analysis), ETW provider instrumentation, and Storage Spaces health model validation
  • Collaborate with MSRC for coordinated disclosure and patch delivery on storage-related CVEs
Employment type: Full-time

Senior Software Engineer, Storage

As a Senior Software Engineer on our storage team, you'll be joining our core en...
Location: United States, San Francisco, Sunnyvale
Salary: 166000.00 - 201000.00 USD / Year
Company: Crusoe (crusoe.ai)
Expiration Date: Until further notice
Requirements
  • Hands-on proficiency in modern software development best practices, and practical experience in languages like Go, Java, C/C++, or Rust
  • Extensive experience developing multi-tenant, cloud-scale distributed storage infrastructure software and systems
  • Experience contributing to one or more of the following storage products: File (e.g., NFS, SMB, Lustre), Object, or Block Storage (e.g., NVMe, iSCSI)
  • A strong background in high-performance filesystem-based products, VFS, and Linux filesystems (e.g., ext4, XFS, ZFS)
  • Proficiency working with Linux and its storage subsystems
  • Knowledge of monitoring tools (Prometheus, Grafana), log analysis, distributed tracing and debugging
Job Responsibility
  • Building Our Multi-Petabyte Cloud Storage Platform:
      • Building core components of our foundational storage products, purpose-built for high-performance AI and ML workloads
      • Contributing to distributed file, block, and object storage products, with a focus on filesystem-based solutions
  • System Design & Architecture:
      • Design and implement high-performance, scalable, and resilient storage architectures that are highly extensible
      • Proposing and prototyping novel strategies to scale performance and system throughput for our most demanding customer workloads
      • Building observability, metrics, and tooling for our services and fleet
  • High Velocity Problem Solving:
      • Troubleshooting and resolving unique and complex distributed systems problems only seen at the scale we operate at
      • Provide ongoing support for production systems and customer workloads, including troubleshooting, performance tuning, and incident response
What we offer
  • Industry competitive pay
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
Employment type: Full-time

Senior+ Software Engineer, Storage

The Cloud Storage team at Crusoe seeks a Staff Software Engineer to lead the dev...
Location: United States, San Francisco; Sunnyvale
Salary: 155000.00 - 250000.00 USD / Year
Company: Crusoe (crusoe.ai)
Expiration Date: Until further notice
Requirements
  • Hands-on experience building and operating large scale, complex distributed cloud computing infrastructure products
  • Preferably, experience building redundant and fault tolerant storage solutions with backups, replication, encryption, and data protection mechanisms
  • Knowledge of professional software engineering practices and best practices for the full software development life cycle
  • Strong experience with at least one application programming language like Java or Go
  • Exposure to Infrastructure as Code tooling with any of Ansible, Chef, Puppet, and/or Terraform
  • Knowledge of Linux Systems Internals and computer architecture
  • Strong communication and collaboration skills
  • Must be able to pass a background check
Job Responsibility
  • Lead engineering efforts on cloud storage features by collaborating with product and engineering to define and execute features on the roadmap
  • Write and review code, generate and review design documentation
  • Participate in qualifications and rollouts of software across the stack, from bare metal to user-facing APIs
  • Guide the engineering team through architecture decisions, design processes, design reviews, code reviews, and implementation tasks
  • Mentor and grow engineers on your team
  • Champion and lead initiatives across the engineering organization such as tech talks, open source development, and book clubs
  • Benchmark, analyze, and improve scale, performance, and resiliency issues
What we offer
  • Restricted Stock Units
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment type: Full-time

Senior Software Engineer - Infrastructure

We are a team of passionate engineers who love solving complex distributed syste...
Salary: 144200.00 - 169400.00 CAD / Year
Company: Confluent (confluent.io)
Expiration Date: Until further notice
Requirements
  • BS, MS, or PhD in computer science or a related field, or equivalent work experience
  • 2+ years of relevant cloud infrastructure/cloud networking experience
  • Strong fundamentals in distributed systems design and development
  • Experience building and operating large-scale systems in the Cloud
  • Solid understanding of basic systems operations (disk, network, operating systems, etc)
  • A self-starter with the ability to work effectively in teams
  • Proficiency in Java, Scala, C/C++, Go or other statically typed languages
Job Responsibility
  • Build the software underpinning the mission-critical Confluent Cloud storage engine
  • Independently drive execution of software projects to deliver complex projects in production with a focus on quality
  • Identify root causes, and get beyond treating symptoms - motivated to dig deep and solve hard problems
  • Troubleshoot issues and improve operations for a complex technical stack that spans all three clouds
  • Have a strong sense of teamwork and be able to make decisions which benefit the team and company
  • Customer focused - making customers more successful by taking on their most challenging problems motivates you
What we offer
  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth
  • Leadership Principles
Employment type: Full-time

Senior Software Engineer, Infrastructure

Serval is building an AI platform to automate complex IT workflows for modern en...
Location: United States, San Francisco
Salary: 200000.00 - 300000.00 USD / Year
Company: Serval (serval.com)
Expiration Date: Until further notice
Requirements
  • 3+ years building and operating large-scale distributed systems in production environments
  • Strong experience writing and maintaining Terraform for infrastructure provisioning and management
  • Deep knowledge of at least one major cloud provider (AWS, GCP, or Azure), including compute, networking, storage, and managed services
  • Experience building, packaging, and supporting self-hosted or on-premises software deployments for enterprise customers
  • Proficiency in Python, Go, or similar languages for building automation, tooling, and infrastructure services
  • Strong understanding of networking, databases, containerization (Docker, Kubernetes), and orchestration systems
  • Experience with monitoring, logging, alerting, and incident management tools (e.g., Datadog, Prometheus, Grafana, PagerDuty)
  • Ability to communicate technical concepts clearly to customers and provide infrastructure support and guidance
  • Ability to debug complex system issues, analyze performance bottlenecks, and implement effective solutions
Job Responsibility
  • Design, implement, and operate large-scale distributed systems that power Serval's AI agents, workflow orchestration, and data pipelines
  • Write and maintain Terraform modules to provision and manage cloud infrastructure across AWS, GCP, or Azure environments
  • Build and maintain deployment packages, installation scripts, and infrastructure templates that enable customers to self-host Serval in their own environments
  • Provide technical guidance and troubleshooting support to enterprise customers deploying and operating self-hosted instances of Serval
  • Ensure high availability, performance, and reliability of production systems through monitoring, alerting, incident response, and capacity planning
  • Build internal tools and platforms that enable product engineers to deploy, test, and operate services efficiently
  • Collaborate with engineering teams to design resilient, scalable architectures that support both cloud-hosted and self-hosted deployment models
  • Profile and optimize system performance, including compute, storage, networking, and database layers
  • Implement security best practices and ensure infrastructure meets enterprise compliance requirements for both managed and self-hosted deployments
What we offer
  • Equity
  • Comprehensive health coverage
  • Flexible PTO
  • Daily lunches and snacks
  • Onsite gym access
  • Regular team events and offsites
Employment type: Full-time

Senior .NET Engineer (Storage Infrastructure)

Our Client's team is looking for a self-motivated software engineer to join deve...
Salary: Not provided
Company: N-iX (n-ix.com)
Expiration Date: Until further notice
Requirements
  • 8+ years of software engineering experience in high scale distributed systems
  • 8+ years of experience building resilient and highly available web services
  • Experience documenting architectural standards and decisions
  • Experience in full stack development
  • B.S., M.S., or PhD in Computer Science or equivalent experience
Job Responsibility
  • Design, develop, and maintain high-performance backend systems and APIs using C# and .NET technologies, hosted in Azure and in data centers at various compliance levels
  • Leverage Azure services like Azure App Services, Azure Kubernetes Service (AKS), Azure Blob Storage, and SQL/NoSQL databases to build scalable, secure, and reliable cloud-native solutions
  • Build and maintain microservices-based architectures using C#, ASP.NET, and others
  • Design and implement RESTful or gRPC APIs, and ensure seamless integration with other systems and products
  • Optimize architecture and solutions for scalability and availability, with cost and maintenance in mind
  • Identify and address performance bottlenecks and scalability challenges proactively
  • Align on designs across teams, and communicate and resolve roadblocks
  • Guide and mentor other engineers through design and code reviews
What we offer
  • Flexible working format - remote, office-based or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

Senior Software Engineer - Cloud Infrastructure & Observability

Location: India, Bengaluru
Salary: Not provided
Company: Roku (roku.com)
Expiration Date: Until further notice
Requirements
  • 15+ years in software engineering with a track record of architecting distributed systems or platforms at scale
  • Strong hands‑on experience in Golang and one scripting language (e.g., Python or Shell)
  • Experience operating observability at PB-scale ingestion and hundreds of millions of series
  • Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
  • Deep experience building systems of scale and operating cloud infrastructure with Kubernetes; strong proficiency with service mesh technologies (Istio/Envoy), infrastructure‑as‑code (Terraform) and experience in multi‑cloud (AWS, GCP)
  • Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
  • Proven experience integrating security as part of infrastructure and platform development
  • Exceptional cross‑functional communication; effective collaboration with both technical and non‑technical stakeholders
Job Responsibility
  • Architect and lead Roku’s observability platform across metrics, logs, and traces; evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
  • Extend and harden open‑source observability systems; overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
  • Implement features such as pre‑aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
  • Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform; augment and automate CI/CD flows and onboarding
  • Integrate security into infrastructure and platform services; ensure robust multi‑tenant, multi‑cluster, and multi‑cloud designs
  • Contribute improvements back to open source and CNCF‑aligned projects
What we offer
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Time off in accordance with local leave policies
Employment type: Full-time

Senior Software Engineer - Cloud Infrastructure & Observability

We are building a next-generation observability and cloud platform that is high-...
Location: United Kingdom, Cambridge
Salary: Not provided
Company: Roku (roku.com)
Expiration Date: Until further notice
Requirements
  • Extensive software engineering experience, with a track record of architecting distributed systems or platforms at scale
  • Strong hands-on experience in Golang and one scripting language (e.g., Python or Shell)
  • Experience operating observability at PB-scale ingestion and hundreds of millions of series
  • Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
  • Deep experience building systems of scale and operating cloud infrastructure with Kubernetes; strong proficiency with service mesh technologies (Istio/Envoy), infrastructure-as-code (Terraform) and experience in multi-cloud (AWS, GCP)
  • Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
  • Proven experience integrating security as part of infrastructure and platform development
  • Exceptional cross-functional communication; effective collaboration with both technical and non-technical stakeholders
Job Responsibility
  • Architect and lead Roku’s observability platform across metrics, logs, and traces; evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
  • Extend and harden open-source observability systems; overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
  • Implement features such as pre-aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
  • Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform; augment and automate CI/CD flows and onboarding
  • Integrate security into infrastructure and platform services; ensure robust multi-tenant, multi-cluster, and multi-cloud designs
  • Contribute improvements back to open source and CNCF-aligned projects
What we offer
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Time off work for vacation and other personal reasons
Employment type: Full-time