System Administration & Maintenance: Install, configure, and maintain HPC clusters (hardware, software, operating systems), perform regular updates/patching, manage user accounts and permissions, and troubleshoot/resolve hardware or software issues
Performance & Optimization: Monitor and analyse system and application performance, identify bottlenecks, implement tuning solutions, and profile workloads to improve efficiency
Cluster & Resource Management: Manage and optimize job scheduling, resource allocation, and cluster operations using tools such as Slurm, LSF, Bright Cluster Manager / Base Command Manager, OpenHPC, and Warewulf
Networking & Interconnects: Configure, manage, and tune Linux networking (TCP/IP, DNS, routing) and high-speed HPC interconnects (InfiniBand, Ethernet) to ensure low-latency, high-bandwidth communication
Storage & Data Management: Implement and maintain large-scale storage and parallel file systems (Lustre, Ceph, GPFS), ensure data integrity, manage backups, and support disaster recovery
Security & Authentication: Implement security controls, ensure compliance with policies, and manage authentication and directory services such as LDAP and Active Directory
DevOps & Automation: Use configuration management and DevOps practices (Ansible, Terraform, Jenkins, Git) to automate deployments, application packaging (RPM/DEB), and system configurations
User Support & Collaboration: Provide technical support, documentation, and training to researchers, and collaborate with scientists, HPC architects, and engineers to align infrastructure with research needs
Planning & Innovation: Contribute to the design and planning of HPC infrastructure upgrades, evaluate and recommend hardware/software solutions, and explore cloud-based HPC solutions where applicable
Requirements:
Bachelor’s degree in Computer Science, Engineering, or a related field (equivalent experience may substitute for a degree)
Minimum of 10 years of systems experience, including at least 5 years working specifically with HPC
Strong knowledge of Linux operating systems (e.g., Rocky Linux, Ubuntu) with a fundamental understanding of Linux internals, system administration, and performance tuning
Experience building and managing RPM and DEB packages
Experience with cluster management tools such as Bright Cluster Manager, OpenHPC stack, or Warewulf
Proficiency with job schedulers and resource managers such as Slurm and LSF
Strong understanding of Linux networking (e.g., TCP/IP, DNS, routing) and HPC interconnects (e.g., InfiniBand, Ethernet), including performance tuning
Knowledge of parallel file systems such as Lustre, Ceph, or GPFS
Working knowledge of Linux authentication and directory services such as LDAP and Active Directory
Strong experience with DevOps and configuration management tools, including Ansible, Terraform, Jenkins, and Git
Strong knowledge of Linux security, compliance standards, and data protection best practices
Excellent communication, interpersonal, and problem-solving skills
Nice to have:
Proficiency in scripting languages (e.g., Python, Bash, R) and familiarity with MPI libraries for parallel and distributed computing
Knowledge of HPC in cloud environments (e.g., AWS, Azure, GCP HPC offerings)