HPC Lead

United Kingdom, London · 75000.00 - 82000.00 GBP / Year · Job Posted January 16, 2026

Job Description

The most significant breakthroughs in human history rarely happen in a vacuum. Behind every cured disease and every climate solution lies a hidden engine: a massive and complex world of high-performance computing that turns raw data into life-saving reality. This company is tackling the greatest challenges facing the planet today, but its scientists can only go as far as its infrastructure allows. This is where your story begins.
As a Lead HPC Engineer, you are not just managing hardware; you are the master architect of a computational frontier. You understand that in the world of high-stakes research, a millisecond of latency or a bottleneck in a storage cluster is not just a technical glitch; it is a delay in finding a potential cure. You will step into a role that demands a rare blend of visionary strategy and hands-on technical mastery, taking full ownership of an HPC roadmap that must evolve as fast as the science it supports.
Your day-to-day life will be a balance of high-level orchestration and deep system craft. One hour you might be mentoring a talented team of engineers and fostering a culture where "good enough" is never the standard. The next, you are deep in the architecture of Slurm workloads or optimising parallel storage systems like GPFS to handle unprecedented scales of data. You are the vital translator, the person who sits with world-class researchers to turn their most ambitious scientific dreams into reliable and high-performing technical realities.
We are looking for someone who has lived and breathed Linux environments and high-speed networking, but who also possesses the leadership spark to inspire a team. You are the type of person who stays restless, always curious about the next emerging technology and eager to push our infrastructure ahead of the curve. You do not just want to maintain the status quo; you want to build the most resilient and innovative environment possible because you know exactly what is at stake.
There is no need to polish a CV just yet. Let us start with a conversation about what we can build together.

Job Responsibility

  • Managing hardware as the master architect of a computational frontier
  • Taking full ownership of the HPC roadmap
  • Mentoring a team of engineers
  • Deep system craft with Slurm workloads
  • Optimising parallel storage systems like GPFS
  • Translating researcher ambitions into technical realities

Requirements

  • Lived and breathed Linux environments
  • Experience with high speed networking
  • Leadership skills to inspire a team
  • Restless and curious about emerging technology
  • Ability to build resilient and innovative environments

Similar Jobs for HPC Lead

8 matching positions

HPC Operations Lead

Lead the systems that power discovery. Behind every breakthrough in modern scien...
Location: United Kingdom, London
Salary: 73000.00 - 82000.00 GBP / Year
Linux Recruit (linuxrecruit.co.uk)
Expiration Date: Until further notice
Requirements
  • Proven leadership experience
  • Strong operational awareness
  • Ability to manage complex services with limited resources and competing priorities
  • Ability to work collaboratively across teams
  • Experience with large-scale HPC clusters
  • Linux-based systems
  • Workload schedulers such as Slurm
  • Networking with InfiniBand
  • Parallel file systems such as GPFS
  • Experience with high-performance storage at petabyte scale
Job Responsibility
  • Play a central role in shaping how research computing services are delivered and evolved
  • Take ownership of the operational performance of a large-scale HPC and storage environment
  • Ensure systems are robust, responsive and continuously improving
  • Guide a specialist team
  • Oversee service delivery
  • Act as a key point of connection between technical teams and scientific users
  • Manage incidents and service performance
  • Influence long-term technology direction and strategy
  • Ensure complex infrastructure remains accessible and usable
  • Engage closely with researchers to understand their needs
Employment type: Full-time

HPC Operations Lead

One of Europe’s most exciting research organisations is on the hunt for a Lead E...
Location: United Kingdom, London
Salary: 70000.00 - 80000.00 GBP / Year
Linux Recruit (linuxrecruit.co.uk)
Expiration Date: Until further notice
Requirements
  • Knowledge of HPC environments and large-scale storage
  • Experience leading people and platforms
  • Ability to communicate with clarity and warmth
  • Comfortable juggling priorities and working with different stakeholders
  • Ability to find practical solutions in a fast-moving research setting
  • Experience in science or biomedical research is beneficial
  • Curiosity and a collaborative mindset
Job Responsibility
  • Take ownership of high-performance compute and large-scale storage platforms
  • Ensure platforms are reliable, responsive, and ready
  • Work closely with researchers and technology teams
  • Oversee the HPC service desk
  • Guide incident response
  • Help shape the future direction of the platforms
  • Design and deliver training
  • Support users
  • Step into a wider leadership role when required
What we offer
  • Excellent benefits
  • Culture that encourages ideas, learning and teamwork
Employment type: Full-time

HPC Supercomputer Onsite Administrator Team Lead

HPC Supercomputer Onsite Administrator Team Lead role at Hewlett Packard Enterpr...
Location: United States, Spring
Salary: 105500.00 - 243000.00 USD / Year
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • 8+ years of professional experience
  • Bachelor of Arts/Science or equivalent degree in computer science or related area of study
  • Without a degree, 11+ years of relevant professional experience
  • Previous experience managing projects and leading small teams (of 3-10)
  • Installing, troubleshooting and supporting enterprise-level servers, storage, and networking equipment
  • HPC (High Performance Computing) or other large-scale systems/datacenter experience
  • Extensive Linux based hardware troubleshooting and diagnostics experience
  • US Citizenship required
  • Self-starter who can work independently without supervision
  • Understanding of architectural dependencies of technologies
Job Responsibility
  • Report daily to and physically work at the Customer's Site
  • Accountable for meeting and maintaining customer's SLA (Service Level Agreement)
  • Engage in technical problem solving across multiple technologies
  • Own and drive service tickets to ensure timely resolution of system or customer issues
  • Lead in technical assessment and delivery of specific technical solutions to the customer
  • Perform and direct team for daily hardware diagnostics and repairs
  • Verify and implement detailed technical solutions to problems
  • Maintain good relationships with team members and customers
  • Collect data to determine customer needs and requirements
  • Respond to requests for technical information from customers
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion policy
  • Comprehensive benefits suite supporting physical, financial and emotional wellbeing
Employment type: Full-time

Senior Software Engineer - ML Network Stack

We are seeking an experienced engineer to join our team that owns the network st...
Location: Israel, Tel Aviv
Salary: Not provided
Amazon
Expiration Date: Until further notice
Requirements
  • 5+ years of non-internship professional software development experience
  • 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
  • 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • 3+ years as a mentor, tech lead or leading engineering teams
  • 3+ years experience in SW/HW Co-Design
Job Responsibility
  • Be a senior engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale
  • Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers, saving developer time
  • Write Python code that effortlessly spools up large clusters and runs benchmarks and applications for ML and HPC workloads
  • Use AWS Managed Grafana and Athena to digest the massive amount of performance data generated by these workloads and create dashboards for developers and stakeholders
  • Invent automatic mechanisms to alert developers to functional and performance regressions so they never reach customers
  • Manage the complexity of infrastructure that covers many instance types, software stacks, Linux operating systems, cutting-edge releases and make it easy to evolve

Construction Delivery Lead

The Construction Delivery Lead (CDL) forms part of the Construction Delivery Tea...
Location: United Kingdom, Bristol
Salary: Not provided
Amentum (amentum.com)
Expiration Date: Until further notice
Requirements
  • Management of strategic planning of site set up, construction sequencing, recovery plans and resource allocation
  • Good working knowledge of commercial principles affecting construction matters
  • Ability to produce informative, concise reports
  • Motivational approach and the ability to energise team members by building a climate of trust and understanding
  • Considerable working knowledge in the delivery of large complex projects
  • Relevant Degree (or equivalent) in either Civil Construction
  • Working understanding of the post holder's obligations under CDM Regulations
  • NEBOSH, SMSTS or IOSH qualification holder
Job Responsibility
  • Manage and develop their Construction Delivery Managers, to ensure they have a good understanding of site activities and can carry out their role accordingly. Offer support and guidance, along with ensuring that all appropriate training for a CDM is undertaken
  • Fulfils Line Management responsibilities to Construction Delivery Managers
  • Oversee all aspects of Health, Safety and Environment during construction activities in their allocated area(s), in accordance with the NNB HPC Construction Phase Plan
  • Undertake and record safety and/or assurance inspections that support the project KPIs. The tool at HPC for this is INSIGHT
  • Construction Delivery Lead shares the Construction Delivery Manager's responsibilities depending on work area and scope complexity. When nominated by the SCDM, undertakes role of Construction Delivery Manager for specific areas as and when required
What we offer
  • free single medical cover and digital GP service
  • family-friendly benefits such as enhanced parental leave pay
  • free membership of employee assistance and parental programmes
  • reimbursement towards relevant professional development and memberships
  • matched-funding
  • paid volunteering time
  • charitable donations
Employment type: Full-time

Principal Product Manager - Virtualization Architect

Designs, plans, develops, and manages a product or portfolio of virtualization p...
Location: United States, All
Salary: Not provided
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's degree or equivalent in computer science, engineering or related field of study
  • 10+ years of experience in product management, engineering, or a related technical role, with significant exposure to virtualization platforms and hypervisor technologies
  • Demonstrated hands-on or architectural familiarity with KVM, QEMU, libvirt, and the broader Linux virtualization stack
  • Deep technical knowledge of KVM hypervisor architecture, virtual machine lifecycle, vCPU scheduling, memory management (huge pages, NUMA), virtio device emulation, and hardware-assisted virtualization (Intel VT-x/AMD-V, IOMMU)
  • Strong understanding of virtualized networking (OVS, macvtap, SR-IOV, DPDK) and storage virtualization (virtio, iSCSI, NVMe-oF, Ceph/RBD) as they apply to KVM guest workloads
  • Familiarity with virtualization management and orchestration ecosystems - including libvirt APIs, oVirt, OpenStack Nova, and KubeVirt - and the ability to define product integration requirements across these layers
  • Extensive cross-functional leadership skills: ability to drive alignment across engineering, field, and partner organizations on complex, technically ambiguous virtualization platform initiatives
  • Strong financial and business acumen, including experience building business cases, defining performance metrics, and analyzing competitive positioning for infrastructure software products
  • Ability to provide product-specific technical training and enablement to sales, partners, and customer-facing teams on KVM and virtualization platform capabilities
  • Experience engaging with open-source ecosystems and upstream communities (Linux kernel, QEMU, libvirt, oVirt, OpenStack) as a product stakeholder
Job Responsibility
  • Independently leads end-to-end strategy and operational roadmap for one or more KVM-based virtualization products or a broader virtualization platform portfolio, spanning hypervisor core, management APIs, and guest ecosystem
  • Defines and drives the virtualization platform value proposition - including performance benchmarks, TCO advantages, and feature differentiation versus VMware, Hyper-V, and other competing hypervisor stacks - to support go-to-market and sales enablement
  • Synthesizes market and customer requirements (MRDs) by maintaining deep knowledge of enterprise virtualization use cases: VDI, server consolidation, cloud-native workloads, telco NFV/edge, and HPC virtualization
  • Translates KVM/QEMU/libvirt engineering capabilities into customer-facing requirements and product specifications, ensuring technical feasibility and roadmap alignment with upstream open-source communities (e.g., Linux kernel KVM subsystem, QEMU project)
  • Guides key stakeholders through all lifecycle phases - from hypervisor feature planning and kernel integration to product launch, sustaining engineering, and platform end-of-life planning
  • Collaborates across engineering, supply chain, and marketing to optimize product configuration, SKU design, pricing, and go-to-market strategies for virtualization platform offerings
  • Acts as a subject matter authority on KVM virtualization architecture, providing technical direction to internal teams, enabling sales and partner technical communities, and representing the product externally with customers and at industry forums
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment type: Full-time

Senior Cybersecurity Engineer

Senior Cybersecurity Engineer LOCATION: Eglin AFB, FL JOB STATUS: Full-time C...
Location: United States, Eglin Air Force Base
Salary: Not provided
Astrion (astrion.us)
Expiration Date: Until further notice
Requirements
  • Master’s Degree (in Computer Science, Cybersecurity or a related field). Relevant experience may be substituted for the degree
  • 10 Years’ total experience, at least 8 of which is in cybersecurity engineering, architecture or R&D infrastructure
  • Top Secret Clearance with SCI. Eligible for Special Access Program (SAP) access. US Citizenship is required
  • DoD 8570/8140 IAT Level III (CISSP, CISM, or equivalent). Certifications: Security+, CEH, or other relevant security certifications
  • Expert-level knowledge of cybersecurity principles, risk management, and secure computing architectures
  • Hands-on experience with security tools and technologies, such as SIEM, intrusion detection/prevention systems, vulnerability scanners, and endpoint protection solutions. Experience with Host-Based Security System (HBSS), Assured Compliance Assessment Solution (ACAS), Nessus, Tenable.sc, Tenable.io, NNM, LCE, Nessus Manager, Agents, and Scanner
  • Experience with scripting (Python, PowerShell) and automation tools (Ansible, Chef)
  • Familiarity with Risk Management Framework (RMF), Authority to Operate (ATO) documentation, and enclave compliance management
  • Physically able to lift up to 50 lbs
  • Adaptable to fieldwork and hands-on installations
Job Responsibility
  • Collaborate with network engineers to architect secure network topologies for current and future connected and isolated environments, ensuring security is embedded in the design phase
  • Design and deploy security solutions for S&T environments that support continuous research, development, and DevSecOps, working closely with network engineers to implement and maintain these solutions
  • Advise on security planning for long-term initiatives, including SDREN integration and the Weapons Technology Integration Center (WTIC) and other facility projects, in conjunction with network planning efforts
  • Develop security innovation roadmaps aligned with mission goals and emerging technologies, coordinating with network engineers to ensure alignment with network modernization efforts
  • Coordinate with facilities, engineering, and network teams to ensure robust infrastructure supports secure research operations, focusing on the security aspects of network hardware/power/cooling needs and structured cabling
  • Lead security aspects of containerization, virtualization, and orchestration of systems to support laboratory computing, HPC, and edge devices, working with network engineers to implement secure configurations
  • Engineer multiple S&T networks security architecture in compliance with NIST 800-series, DoD RMF, DISA Security Technical Implementation Guides (STIGs), and cybersecurity best practices, collaborating with network engineers to ensure seamless integration. Review engineering, architecture, and designs to ensure DoD security policies are met
  • Implement DevSecOps pipelines to automate security scans and CI/CD deployments, working with network engineers to integrate security into existing pipelines
  • Manage ATO package development and collaborate with ISSMs, network engineers, and cybersecurity stakeholders to ensure compliance. Review and develop RMF Assessment and Authorization (A&A) documentation, e.g. System Security Plans (SSPs), Security Assessment Reports (SARs), and Plans of Action and Milestones (POA&Ms)
  • Integrate identity management and single sign-on solutions across enclaves and hybrid environments, coordinating with network engineers to implement and maintain these solutions. Analyze and tune HBSS policies for assets during integration test events. Perform verification and troubleshooting across all HBSS modules. Install updates to HBSS software as released and in compliance with STIG requirements. Monitor HBSS software to ensure that the clients/servers are operational and reporting properly
What we offer
  • Competitive salaries
  • Continuing education assistance
  • Professional development
  • Multiple healthcare benefits package options
  • 401K with employer matching
  • Competitive time off policy along with a federally recognized holiday schedule
Employment type: Full-time

Lead Engineer, ML Network Stack - Annapurna Labs

We are seeking an experienced engineer and technical leader to join our team tha...
Location: United States, Seattle; Cupertino
Salary: 168100.00 - 261500.00 USD / Year
Amazon
Expiration Date: Until further notice
Requirements
  • 5+ years of non-internship professional software development experience
  • 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
  • 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • 3+ years as a mentor, tech lead or leading engineering teams
  • 3+ years experience in SW/HW Co-Design
Job Responsibility
  • Be the lead engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale
  • Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers, saving developer time
  • Write Python code that effortlessly spools up large clusters and runs benchmarks and applications for ML and HPC workloads
  • Use AWS Managed Grafana and Athena to digest the massive amount of performance data generated by these workloads and create dashboards for developers and stakeholders
  • Invent automatic mechanisms to alert developers to functional and performance regressions so they never reach customers
  • Manage the complexity of infrastructure that covers many instance types, software stacks, Linux operating systems, cutting-edge releases and make it easy to evolve
What we offer
  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave
  • sign-on payments
  • restricted stock units (RSUs)
Employment type: Full-time