The Windows Servicing & Delivery (WSD) team investigates and remediates security vulnerabilities and high-severity reliability issues across the Windows platform. The Storage & File Systems team within WSD owns NTFS, ReFS, Storage Spaces Direct (S2D), Windows Server Failover Clustering (WSFC), Cluster Shared Volumes (CSV), the Volume Shadow Copy Service (VSS), and the full Windows storage driver stack — from NVMe and iSCSI miniport drivers through to the file system minifilter layer and user-mode storage management APIs. This Senior Software Engineer role sits at the intersection of kernel engineering and enterprise customer reliability. You will resolve the most complex ICMs escalated by top-tier enterprise and cloud customers — issues that have defeated Tier 1 and Tier 2 support and require deep ownership of source code, cluster state machines, and file system on-disk structures. Alongside security vulnerability work, you will own reliability fixes for S2D rebuild storms, CSV failover edge cases, NTFS metadata corruption, and NVMe queue-depth exhaustion scenarios that impact Fortune 500 production environments.
Job Responsibilities:
Own end-to-end resolution of critical ICMs escalated from top enterprise customers — analyze memory dumps, ETW traces, Storage Spaces logs, and cluster event logs to root-cause failures in S2D, WSFC, CSV, NTFS, and ReFS that cannot be resolved by field support
Investigate and fix security vulnerabilities in the Windows storage stack: privilege escalation through NTFS reparse points and junctions, information disclosure via uninitialized kernel pool in file system drivers, and denial-of-service through crafted on-disk structures in ReFS or NTFS
Design and implement reliability and correctness fixes in kernel-mode storage miniport drivers (StorPort, NVMe, iSCSI, SMB Direct/RDMA) and file system filter drivers — owning the full fix lifecycle from root cause through regression test to servicing release
Work directly with Storage Spaces Direct (S2D): diagnose and fix rebuild, rebalance, and fault-domain logic errors; investigate cache tier promotion/demotion bugs; resolve pool fragmentation and storage bus layer (SBL) issues in hyper-converged deployments
Maintain and harden Windows Server Failover Clustering (WSFC) and Cluster Shared Volumes (CSV): resolve quorum edge cases, CSV ownership transfer failures, cluster validation regressions, and inter-node storage arbitration deadlocks
Contribute to the Volume Shadow Copy Service (VSS) and Windows Backup infrastructure: fix provider/requester interaction bugs, VSS writer timeouts in large-scale environments, and shadow copy metadata consistency failures
Develop diagnostic tooling and automated regression suites for the storage stack — including kernel debugger extensions (!sdt, !storport analysis), ETW provider instrumentation, and Storage Spaces health model validation
Collaborate with MSRC for coordinated disclosure and patch delivery on storage-related CVEs; participate in threat modeling and security design reviews for new file system and storage features
Engage directly with enterprise customers and Partner Technical Advisors (PTAs) during active outages to provide expert-level guidance and expedite fix delivery through the servicing pipeline
Mentor engineers; raise the technical bar through code reviews, design reviews, and active participation in WSD hiring loops
Requirements:
Bachelor's Degree in Computer Science or related technical field AND 8+ years of software engineering with deep expertise in C and C++ for Windows kernel-mode development, OR equivalent experience
Hands-on experience with Windows storage driver stack: StorPort miniport drivers, storage filter drivers, or file system minifilter drivers — understanding of IRP flow, completion routines, and cancel-safe queue management
Solid grounding in Windows kernel fundamentals
Demonstrated ability to perform crash dump analysis and live kernel debugging using WinDbg
Working knowledge of NTFS on-disk structures: MFT record layout, attribute types, USN journal, and the NTFS log file for crash recovery
Familiarity with ReFS (Resilient File System): B+ tree metadata structure, integrity streams, block cloning, and the differences in crash recovery model versus NTFS
Experience debugging file system corruption scenarios: cross-linked clusters, orphaned MFT records, directory entry inconsistencies, and reparse point cycles
Understanding of Windows file system minifilter architecture: altitude registration, pre/post operation callbacks
Hands-on experience with Windows Server Failover Clustering (WSFC): quorum models (Node Majority, Disk Witness, Cloud Witness), cluster network configuration, and the cluster API
Deep understanding of Cluster Shared Volumes (CSV): CSV file system (CSVFS) redirected vs. direct I/O modes, CSV ownership arbitration, and coordination with the Storage Bus Layer
Experience with Storage Spaces Direct (S2D): storage pool creation, virtual disk provisioning, cache tier architecture (NVMe + SSD + HDD), fault domain awareness, and rebuild/rebalance behavior under node and drive failure
Familiarity with storage connectivity protocols in clustered environments: SMB Direct (RDMA), iSCSI multipath (MPIO/DSM), NVMe-oF, and Fibre Channel HBA integration with StorPort
Proven ability to work high-urgency customer escalations (ICMs / CritSits): triage under time pressure, communicate root cause to non-technical stakeholders, and deliver targeted fixes through the Windows servicing pipeline
Experience reading and interpreting Storage Spaces diagnostic packages, cluster logs, and ETW traces (StorPort, ReFS, NTFS providers) to reconstruct failure timelines
Familiarity with Microsoft Support tooling: ProcMon and xperf captures, and WPA (Windows Performance Analyzer) for I/O latency profiling
Nice to have:
Experience with Azure Stack HCI: S2D on validated hardware, stretched clustering across sites, Azure Arc integration, and software-defined storage policy management
Knowledge of NVMe specification internals: submission/completion queue mechanics, NVMe error log page analysis, and namespace management — beyond just driver-level consumption
Familiarity with SMB protocol internals (SMBv3): persistent handles, witness service (SWN), transparent failover, and scale-out file server (SOFS) architecture
Experience with deduplication and compression engines (Windows Data Deduplication): chunk store architecture, scrubbing, and garbage collection edge cases
Knowledge of Windows BitLocker full-volume encryption integration with clustered storage and its interaction with CSV and S2D volumes
Published CVE credits, conference presentations, or technical blog posts on file system or storage security topics
MS/BS in Computer Science, Electrical Engineering, or a closely related field