Senior Infrastructure Software Engineer, Storage Job at Dropbox

Senior Software Engineer - Storage

The Windows Servicing & Delivery (WSD) team investigates and remediates security...

Location

India, Hyderabad

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 8+ years of software engineering with deep expertise in C and C++ for Windows kernel-mode development, OR equivalent experience
Hands-on experience with Windows storage driver stack: StorPort miniport drivers, storage filter drivers, or file system minifilter drivers — understanding of IRP flow, completion routines, and cancel-safe queue management
Solid grounding in Windows kernel fundamentals
Demonstrated ability to perform crash dump analysis and live kernel debugging using WinDbg
Working knowledge of NTFS on-disk structures: MFT record layout, attribute types, USN journal, and the NTFS log file for crash recovery
Familiarity with ReFS (Resilient File System): B+ tree metadata structure, integrity streams, block cloning, and the differences in crash recovery model versus NTFS
Experience debugging file system corruption scenarios: cross-linked clusters, orphaned MFT records, directory entry inconsistencies, and reparse point cycles
Understanding of Windows file system minifilter architecture: altitude registration, pre/post operation callbacks
Hands-on experience with Windows Server Failover Clustering (WSFC): quorum models (Node Majority, Disk Witness, Cloud Witness), cluster network configuration, and the cluster API

Job Responsibility

Own end-to-end resolution of critical ICMs escalated from top enterprise customers — analyze memory dumps, ETW traces, Storage Spaces logs, and cluster event logs to root-cause failures in S2D, WSFC, CSV, NTFS, and ReFS that cannot be resolved by field support
Investigate and fix security vulnerabilities in the Windows storage stack: privilege escalation through NTFS reparse points and junctions, information disclosure via uninitialized kernel pool in file system drivers, and denial-of-service through crafted on-disk structures in ReFS or NTFS
Design and implement reliability and correctness fixes in kernel-mode storage miniport drivers (StorPort, NVMe, iSCSI, SMB Direct/RDMA) and file system filter drivers — owning the full fix lifecycle from root cause through regression test to servicing release
Work directly with Storage Spaces Direct (S2D): diagnose and fix rebuild, rebalance, and fault-domain logic errors; investigate cache tier promotion/demotion bugs; resolve pool fragmentation and storage bus layer (SBL) issues in hyper-converged deployments
Maintain and harden Windows Server Failover Clustering (WSFC) and Cluster Shared Volumes (CSV): resolve quorum edge cases, CSV ownership transfer failures, cluster validation regressions, and inter-node storage arbitration deadlocks
Contribute to the Volume Shadow Copy Service (VSS) and Windows Backup infrastructure: fix provider/requester interaction bugs, VSS writer timeouts in large-scale environments, and shadow copy metadata consistency failures
Develop diagnostic tooling and automated regression suites for the storage stack — including kernel debugger extensions (!sdt, !storport analysis), ETW provider instrumentation, and Storage Spaces health model validation
Collaborate with MSRC for coordinated disclosure and patch delivery on storage-related CVEs

Fulltime

Senior Software Engineer, Storage

As a Senior Software Engineer on our storage team, you'll be joining our core en...

Location

United States, San Francisco, Sunnyvale

Salary:

166000.00 - 201000.00 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

Hands-on proficiency in modern software development best practices, and practical experience in languages like Go, Java, C/C++, or Rust
Extensive experience developing multi-tenant, cloud-scale distributed storage infrastructure software and systems
Experience contributing to one or more of the following storage products: File (e.g., NFS, SMB, Lustre), Object, or Block Storage (e.g., NVMe, iSCSI)
A strong background in high-performance filesystem-based products, VFS, and Linux filesystems (e.g., ext4, XFS, ZFS)
Proficiency working with Linux and its storage subsystems
Knowledge of monitoring tools (Prometheus, Grafana), log analysis, distributed tracing and debugging

Job Responsibility

Building Our Multi-Petabyte Cloud Storage Platform
Building core components of our foundational storage products, purpose-built for high-performance AI and ML workloads
Contributing to distributed file, block, and object storage products, with a focus on filesystem-based solutions
System Design & Architecture
Design and implement high-performance, scalable, and resilient storage architectures that are highly extensible
Proposing and prototyping novel strategies to scale performance and system throughput for our most demanding customer workloads
Building observability, metrics and tooling for our services and fleet
High Velocity Problem Solving
Troubleshooting and resolving unique and complex distributed systems problems only seen at the scale we operate at
Provide ongoing support for production systems and customer workloads, including troubleshooting, performance tuning, and incident response

What we offer

Industry competitive pay
Restricted Stock Units in a fast growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement

Fulltime

Senior+ Software Engineer, Storage

The Cloud Storage team at Crusoe seeks a Staff Software Engineer to lead the dev...

Location

United States, San Francisco; Sunnyvale

Salary:

155000.00 - 250000.00 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

Hands-on experience building and operating large scale, complex distributed cloud computing infrastructure products
Preferably, experience building redundant and fault tolerant storage solutions with backups, replication, encryption, and data protection mechanisms
Knowledge of professional software engineering practices and best practices for the full software development life cycle
Strong experience with at least one application programming language like Java or Go
Exposure to Infrastructure as Code tooling with any of Ansible, Chef, Puppet, and/or Terraform
Knowledge of Linux Systems Internals and computer architecture
Strong communication and collaboration skills
Must be able to pass a background check

Job Responsibility

Lead engineering efforts on cloud storage features by collaborating with product and engineering to define and execute features on the roadmap
Write and review code, generate and review design documentation
Participate in qualifications and rollouts of software across the stack, from bare metal to user-facing APIs
Guide the engineering team through architecture decisions, design processes, design reviews, code reviews, and implementation tasks
Mentor and grow engineers on your team
Champion and lead initiatives across the engineering organization such as tech talks, open source development, and book clubs
Benchmark, analyze, and improve scale, performance, and resiliency issues

What we offer

Restricted Stock Units
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement

Fulltime

Senior Software Engineer - Infrastructure

We are a team of passionate engineers who love solving complex distributed syste...

Location

Salary:

144200.00 - 169400.00 CAD / Year

Confluent

Expiration Date

Until further notice

Requirements

BS, MS, or PhD in computer science or a related field, or equivalent work experience
2+ years of relevant cloud infrastructure/cloud networking experience
Strong fundamentals in distributed systems design and development
Experience building and operating large-scale systems in the Cloud
Solid understanding of basic systems operations (disk, network, operating systems, etc)
A self starter with the ability to work effectively in teams
Proficiency in Java, Scala, C/C++, Go or other statically typed languages

Job Responsibility

Build the software underpinning the mission-critical Confluent Cloud storage engine
Independently drive execution of software projects to deliver complex projects in production with a focus on quality
Identify root causes, and get beyond treating symptoms - motivated to dig deep and solve hard problems
Troubleshoot issues and improve operations for a complex technical stack that spans all three clouds
Have a strong sense of teamwork and be able to make decisions which benefit the team and company
Customer focused - making customers more successful by taking on their most challenging problems motivates you

What we offer

Remote-First Work
Robust Insurance Benefits
Flexible Time Away
The Best Teammates
Experience Ambassadors
Open and Honest Culture
Well-Being and Growth
Leadership Principles

Fulltime

Senior Software Engineer, Infrastructure

Serval is building an AI platform to automate complex IT workflows for modern en...

Location

United States, San Francisco

Salary:

200000.00 - 300000.00 USD / Year

Serval

Expiration Date

Until further notice

Requirements

3+ years building and operating large-scale distributed systems in production environments
Strong experience writing and maintaining Terraform for infrastructure provisioning and management
Deep knowledge of at least one major cloud provider (AWS, GCP, or Azure), including compute, networking, storage, and managed services
Experience building, packaging, and supporting self-hosted or on-premises software deployments for enterprise customers
Proficiency in Python, Go, or similar languages for building automation, tooling, and infrastructure services
Strong understanding of networking, databases, containerization (Docker, Kubernetes), and orchestration systems
Experience with monitoring, logging, alerting, and incident management tools (e.g., Datadog, Prometheus, Grafana, PagerDuty)
Ability to communicate technical concepts clearly to customers and provide infrastructure support and guidance
Ability to debug complex system issues, analyze performance bottlenecks, and implement effective solutions

Job Responsibility

Design, implement, and operate large-scale distributed systems that power Serval's AI agents, workflow orchestration, and data pipelines
Write and maintain Terraform modules to provision and manage cloud infrastructure across AWS, GCP, or Azure environments
Build and maintain deployment packages, installation scripts, and infrastructure templates that enable customers to self-host Serval in their own environments
Provide technical guidance and troubleshooting support to enterprise customers deploying and operating self-hosted instances of Serval
Ensure high availability, performance, and reliability of production systems through monitoring, alerting, incident response, and capacity planning
Build internal tools and platforms that enable product engineers to deploy, test, and operate services efficiently
Collaborate with engineering teams to design resilient, scalable architectures that support both cloud-hosted and self-hosted deployment models
Profile and optimize system performance, including compute, storage, networking, and database layers
Implement security best practices and ensure infrastructure meets enterprise compliance requirements for both managed and self-hosted deployments

What we offer

Equity
Comprehensive health coverage
Flexible PTO
Daily lunches and snacks
Onsite gym access
Regular team events and offsites

Fulltime

Senior .NET Engineer (Storage Infrastructure)

Our Client's team is looking for a self-motivated software engineer to join deve...

Location

Salary:

Not provided

N-iX

Expiration Date

Until further notice

Requirements

8+ years of software engineering experience in high scale distributed systems
8+ years of experience building resilient and highly available web services
Experience documenting architectural standards and decisions
Experience in full stack development
B.S., M.S., or PhD in Computer Science or equivalent experience

Job Responsibility

Design, develop, and maintain high-performance backend systems and APIs using C# and .NET technologies, hosted in Azure and in data centers at various compliance levels
Leverage Azure services like Azure App Services, Azure Kubernetes Service (AKS), Azure Blob Storage, and SQL/NoSQL databases to build scalable, secure, and reliable cloud-native solutions
Build and maintain microservices-based architectures using C#, ASP.NET, and others
Design and implement RESTful or gRPC APIs, and ensure seamless integration with other systems and products
Optimize architecture and solution for scalability and availability with cost and maintenance in mind
Identify and address performance bottlenecks and scalability challenges proactively
Align across teams on designs, and communicate and resolve roadblocks
Guide and mentor other engineers through design and code reviews

What we offer

Flexible working format - remote, office-based or flexible
A competitive salary and good compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
Active tech communities with regular knowledge sharing
Education reimbursement
Memorable anniversary presents
Corporate events and team buildings
Other location-specific benefits

Senior Software Engineer - Cloud Infrastructure & Observability

Location

India, Bengaluru

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

15+ years in software engineering with a track record of architecting distributed systems or platforms at scale
Strong hands‑on experience in Golang and one scripting language (e.g., Python or Shell)
Experience operating observability at petabyte-scale ingestion and hundreds of millions of series
Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
Deep experience building systems of scale and operating cloud infrastructure with Kubernetes; strong proficiency with service mesh technologies (Istio/Envoy), infrastructure‑as‑code (Terraform) and experience in multi‑cloud (AWS, GCP)
Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
Proven experience integrating security as part of infrastructure and platform development
Exceptional cross‑functional communication; effective collaboration with both technical and non‑technical stakeholders

Job Responsibility

Architect and lead Roku’s observability platform across metrics, logs, and traces; evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
Extend and harden open‑source observability systems; overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
Implement features such as pre‑aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform; augment and automate CI/CD flows and onboarding
Integrate security into infrastructure and platform services; ensure robust multi‑tenant, multi‑cluster, and multi‑cloud designs
Contribute improvements back to open source and CNCF‑aligned projects

What we offer

Global access to mental health and financial wellness support and resources
Healthcare (medical, dental, and vision)
Life, accident, disability, commuter, and retirement options (401(k)/pension)
Time off in accordance with local leave policies

Fulltime

Senior Software Engineer - Cloud Infrastructure & Observability

We are building a next-generation observability and cloud platform that is high-...

Location

United Kingdom, Cambridge

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

Extensive experience with software engineering with a track record of architecting distributed systems or platforms at scale
Strong hands-on experience in Golang and one scripting language (e.g., Python or Shell)
Experience operating observability at petabyte-scale ingestion and hundreds of millions of series
Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
Deep experience building systems of scale and operating cloud infrastructure with Kubernetes; strong proficiency with service mesh technologies (Istio/Envoy), infrastructure-as-code (Terraform) and experience in multi-cloud (AWS, GCP)
Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
Proven experience integrating security as part of infrastructure and platform development
Exceptional cross-functional communication; effective collaboration with both technical and non-technical stakeholders

Job Responsibility

Architect and lead Roku’s observability platform across metrics, logs, and traces; evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
Extend and harden open-source observability systems; overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
Implement features such as pre-aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform; augment and automate CI/CD flows and onboarding
Integrate security into infrastructure and platform services; ensure robust multi-tenant, multi-cluster, and multi-cloud designs
Contribute improvements back to open source and CNCF-aligned projects

What we offer

Global access to mental health and financial wellness support and resources
Healthcare (medical, dental, and vision)
Life, accident, disability, commuter, and retirement options (401(k)/pension)
Time off work for vacation and other personal reasons

Fulltime
