We are looking for an experienced HPC Storage Engineer to design, implement, and optimize the storage and data movement infrastructure that underpins our high-performance computing (HPC) environment. This role focuses on distributed and parallel filesystems, storage systems, and large-scale data movement, ensuring reliable, high-throughput data access for compute-intensive workloads. You will work closely with HPC platform engineers, compute and networking teams, and application users to deliver scalable, performant, and resilient storage solutions that tightly integrate the storage layer with compute nodes.
Job Responsibilities:
Design, deploy, and operate HPC storage systems and parallel/distributed filesystems (e.g., Lustre, GPFS/IBM Spectrum Scale, BeeGFS, Ceph)
Own data movement workflows across environments, including data ingest, replication, tiering, and archiving
Optimize filesystem and storage performance for large-scale parallel workloads
Design and tune load-balancing strategies across storage targets, metadata services, and data movement pipelines to ensure even utilization, high throughput, and predictable performance at scale
Troubleshoot storage, I/O, and data movement issues across HPC compute clusters
Develop and maintain automation for storage provisioning, monitoring, and lifecycle management
Partner with compute and networking teams to ensure end-to-end performance and reliability
Advise users and application teams on best practices for I/O patterns, data layout, and performance tuning
Evaluate and integrate new storage technologies and architectures as requirements evolve
Requirements:
Hands-on experience with parallel or distributed filesystems in production environments
Strong understanding of Linux systems administration
Experience with high-performance I/O, data locality, and throughput optimization
Proficiency in large-scale distributed systems development, preferably in C++
Proven ability to troubleshoot complex performance and reliability issues across storage and compute stacks
Experience with data transfer and movement tools
Nice to have:
Familiarity with object storage and hierarchical storage management (HSM)
Experience integrating storage with HPC schedulers (e.g., Slurm) and compute workflows
Background supporting scientific, ML/AI, or other data-intensive workloads