HPC Operations Lead Job at Linux Recruit (London)

Job Description

Lead the systems that power discovery. Behind every breakthrough in modern science sits the computational infrastructure that makes it possible: the platforms, clusters and storage environments that turn bold ideas into real progress. This is an opportunity to lead that foundation, working at the intersection of technology and discovery.

You will join a world-leading research institute where scientists and engineers work side by side to tackle some of the most complex challenges in science and technology. The culture is open, collaborative and deeply curious, designed to remove barriers and enable innovation at scale.

As HPC Operations Lead, you will play a central role in shaping how research computing services are delivered and evolved. Reporting to the Head of Research Computing Platforms, you will take ownership of the operational performance of a large-scale HPC and storage environment, ensuring systems are robust, responsive and continuously improving.

This is a leadership role with real breadth. You will guide a specialist team, oversee service delivery and act as a key point of connection between technical teams and scientific users. From managing incidents and service performance to influencing long-term technology direction and strategy, your work will directly support research outcomes across the organisation.

A key part of the role is ensuring that complex infrastructure remains accessible and usable. You will engage closely with researchers to understand their needs, translate technical concepts into clear language and help shape platforms that genuinely enable scientific progress. Alongside this, you will lead the design and operation of high-performance storage services, supporting both internal workloads and external collaboration.

The environment includes large-scale HPC clusters, Linux-based systems, workload schedulers such as Slurm, InfiniBand networking and parallel file systems such as GPFS. Experience with high-performance storage at petabyte scale is particularly relevant, alongside a broader understanding of automation, data centre environments or networking.

You will bring proven leadership experience, strong operational awareness and the ability to manage complex services with limited resources and competing priorities. Just as important is your ability to work collaboratively across teams, balancing technical depth with a clear focus on outcomes.

This is a role for someone who wants their work to matter. Every system you improve and every service you shape will contribute to research that has the potential to change lives.

Job Responsibility

Play a central role in shaping how research computing services are delivered and evolved
Take ownership of the operational performance of a large-scale HPC and storage environment
Ensure systems are robust, responsive and continuously improving
Guide a specialist team
Oversee service delivery
Act as a key point of connection between technical teams and scientific users
Manage incidents and service performance
Influence long-term technology direction and strategy
Ensure complex infrastructure remains accessible and usable
Engage closely with researchers to understand their needs
Translate technical concepts into clear language
Help shape platforms that genuinely enable scientific progress
Lead the design and operation of high-performance storage services, supporting both internal workloads and external collaboration

Requirements

Proven leadership experience
Strong operational awareness
Ability to manage complex services with limited resources and competing priorities
Ability to work collaboratively across teams
Experience with large-scale HPC clusters
Linux-based systems
Workload schedulers such as Slurm
InfiniBand networking
Parallel file systems such as GPFS
Experience with high-performance storage at petabyte scale
Broader understanding of automation, data centre environments or networking
