HPC Operations Lead

Linux Recruit

Location:
United Kingdom, London

Contract Type:
Not provided

Salary:

70000.00 - 80000.00 GBP / Year

Job Description:

One of Europe’s most exciting research organisations is on the hunt for a Lead Engineer as part of their HPC team. This is a place where big ideas, brilliant people and cutting-edge technology come together to tackle some of the most important questions. It’s collaborative, ambitious and refreshingly down to earth. In this role, you’ll be the person who keeps the engines running. It’s varied, hands-on and influential in all the right ways. In return, you’ll join an organisation that genuinely values its people. It’s open, supportive and driven by purpose. You’ll be helping to power research that can change lives, alongside people who care deeply about what they do.

Job Responsibility:

  • Take ownership of high-performance compute and large-scale storage platforms
  • Ensure platforms are reliable, responsive, and ready
  • Work closely with researchers and technology teams
  • Oversee the HPC service desk
  • Guide incident response
  • Help shape the future direction of the platforms
  • Design and deliver training
  • Support users
  • Step into a wider leadership role when required

Requirements:

  • Knowledge of HPC environments and large-scale storage
  • Experience leading people and platforms
  • Ability to communicate with clarity and warmth
  • Comfortable juggling priorities and working with different stakeholders
  • Ability to find practical solutions in a fast-moving research setting
  • Experience in science or biomedical research is beneficial
  • Curiosity and a collaborative mindset

Nice to have:

  • Experience in science or biomedical research

What we offer:
  • Excellent benefits
  • Culture that encourages ideas, learning and teamwork

Additional Information:

Job Posted:
February 14, 2026

Employment Type:
Full-time

Similar Jobs for HPC Operations Lead

Lead Solution Architect

Lead Solution Architect role at Hewlett Packard Enterprise focusing on designing...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 8+ years of professional experience
  • Bachelor of Arts/Science or equivalent degree in computer science or related area of study; without a degree, 11+ years of relevant professional experience
  • Proficiency in container orchestration platforms (Red Hat OpenShift, SUSE Rancher, CNCF Kubernetes)
  • Experience with GPU-accelerated workloads and tools like NVIDIA GPU Operator and DCGM
  • Ability to integrate Kubernetes with AI/ML workloads and GPU infrastructure in hybrid or private cloud environments
  • Experience architecting HPC clusters including GPU/compute nodes and HPC storage technologies (Lustre, WEKA, Parallel Filesystems)
  • Understanding of high-speed networking (InfiniBand, Mellanox, RoCE)
  • Experience with HPC cluster management tools (HPE Cluster Management, NVIDIA Base Command Manager)
  • Familiarity with HPC workload schedulers (Slurm, Altair PBS Pro)
Job Responsibility
  • Design and scope multiple deliverables across multiple technologies
  • Lead team in delivery of multiple deliverables
  • Develop solutions that enhance availability, performance, maintainability and agility of customer's enterprise
  • Contribute to design and application of new tools
  • Re-use existing experience to develop new solutions
  • Understand architectural dependencies of technologies in customer's IT environment
  • Advise, integrate, and accelerate customers' outcomes from digital transformation
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Career development programs
  • Flexible work arrangements
  • Inclusive work environment
Employment Type: Full-time

HPC System Software Analyst

Provide technology consulting to external customers and internal project teams. ...
Location:
United States, Los Alamos
Salary:
101900.00 - 234500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Active Department of Energy (DOE) Q Clearance, or one held within the past 3 years
  • If the clearance is a previous one, no foreseeable problem with having it reinstated
  • US citizenship (required by the duties of the role)
  • Bachelor’s degree in Computer Science, Engineering, or related area of study
  • 4+ years of HPC-related experience, ideally with large-scale HPC and parallel file system administration and support
  • Without a degree, three additional years of relevant professional experience (7+ years in total)
  • Understanding of an HPC data center IT operations environment
  • Expertise in HPC application consulting and support
  • Strong system administration skills, particularly in HPC environments
  • Extensive knowledge of and experience with Linux operating systems (RHEL or SLES)
Job Responsibility
  • Provide on-site system administration and HPC application consulting services
  • Address and resolve the current top issues in the HPC environment
  • Maintain the HPC systems' availability to the customer
  • Monitor system performance and provide recommendations for improvements
  • Collaborate with team members and stakeholders to deliver high-quality support and solutions
  • Create and document site procedures, system diagrams, and other configuration or support documents
  • Maintain system software and firmware revisions, including patches, updates, and OS upgrades
  • Solve system hardware, software, and third-party software issues, and provide detailed and thoughtful analysis of each problem and its solution
  • Gather data, perform analysis, and escalate problems to higher-level product support groups and appropriate management when necessary to ensure timely resolution of system or customer issues
  • Provide solutions and implement repairs or workarounds when possible, fully documenting steps taken when required
What we offer
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Specific programs to help you reach your career goals
  • An unconditionally inclusive way of working that celebrates individual uniqueness
  • Diverse backgrounds are valued and succeed here
Employment Type: Full-time

HPC AI-BU District Service Manager

Responsible for the overall management of a service segment of significant scope...
Location:
United States, Memphis
Salary:
101900.00 - 234500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor's degree preferred or equivalent experience
  • Five to ten years of related experience in customer support in a technical environment with proven managerial abilities
  • 3-10 years of experience leading small to medium teams (3-25 people) as HPC Field Team Lead or similar
  • Experience in a dynamic environment; adaptable to change and receptive to constructive feedback
  • 360-degree relationship-building and communication with peers, junior staff, leaders, and customers
Job Responsibility
  • Work closely with the Site Team Leads to plan, direct, and monitor operational/tactical activities of technical on-site team
  • Manage / coordinate customer escalations, and escalations of technical, process, or materials issues encountered by field team
  • Provide guidance on process improvements and recommend changes in alignment with business tactics and strategy for area of responsibility
  • Responsible for the full understanding of the service contract and associated terms and conditions
  • Proactively identify, report on, and close risks to Service Level Agreement (SLA) or customer satisfaction
  • Meet business and operational targets by managing core site and business metrics - Key Performance Indicators (KPIs)
  • Provide routine status updates to the Services Geo Lead (Director)
  • Establish and manage relationships with customers
  • Establish and maintain close collaborative relationship with the sales account team and stakeholders
  • Regularly visit field teams and customers on site (approximately 25% travel)
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type: Full-time

Principal Software Automation Engineer

Microsoft Silicon Cloud Hardware Infrastructure Engineering (SCHIE) is the team ...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility
  • CoE Leadership & Technical Authority: Own the end-to-end automation strategy for HPC, operational platforms, and Azure integrations. Define reference architectures, standards, and coding methodologies. Serve as the highest-level technical escalation point for automation, reliability, and integration challenges across the org
  • Roadmaps & Standards: Create and maintain multi-year automation roadmaps aligned to business priorities. Establish coding standards, testing strategies, code quality, security baselines, and operational readiness criteria adopted across teams
  • Team Leadership: Build, mentor, and technically lead a software automation team over time. Set hiring bar, role definitions, and career paths; coach senior engineers; lead by example through hands-on contributions
  • Hands-on Engineering (Principal IC): Architect, design, implement, and operate production-grade automation platforms for HPC infrastructure and cloud services
  • Operational Automation at Scale: Eliminate manual and error-prone work by codifying provisioning, imaging, patching, validation, break/fix, incident response, and self-healing remediation workflows
  • Platform & Service Integrations: Design robust API-first, event-driven, and asynchronous integrations across internal platforms for HPC services, and Azure-native services
  • ETL & Data Engineering: Build and evolve data pipelines that ingest, transform, and validate telemetry, logs, metrics, and operational signals. Enable reliability analysis, capacity forecasting, cost optimization, and executive reporting
  • Azure Automation & Governance: Lead infrastructure-as-code, CI/CD pipelines, identity and access automation (RBAC), policy enforcement, secrets management, and monitoring with security-by-default and compliance-aware practices
Employment Type: Full-time

HPC SME

HPE Operations is our innovative IT services organization. It provides the exper...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 8-12 years of experience with different flavours of Linux like SLES, RHEL and Ubuntu/Debian
  • 5-8 years of experience managing HPC/Linux clusters, with a good understanding of their architecture
  • Skilled in installation and configuration of various applications on Linux
  • Install, administer, and maintain hardware, system software, networking, accounts, and security measures on VMware configurations
  • Diagnose and resolve system and performance issues
  • Experience in drafting technical SOPs, action plans and knowledge documents
  • Good understanding of different cloud platforms
  • Reinstate integrity of system as quickly as possible following an outage
  • Triage and solve user-submitted tickets
  • Track resource usage using monitoring and queuing software
Job Responsibility
  • Review and validate HPC solutions and environments through POCs and benchmarking
  • Architect and design HPC solutions tailored to the customer's needs
  • Oversee solution implementation, integration and testing
  • Diagnose and correct solution issues during implementation
  • Provide training, documentation and ongoing support
  • Maintain life-cycle management of the HPC environment
  • Oversee team operations and deliverables
  • Lead the team with technical expertise and ensure regular technical sessions and case reviews
  • Demonstrate a high level of technical and communication skills in critical situations
  • Take responsibility for end-to-end problem ownership and resolution
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
Employment Type: Full-time

HPC SME

HPE Operations is our innovative IT services organization. It provides the exper...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 8-12 years of experience with different flavours of Linux such as SLES, RHEL and Ubuntu/Debian
  • 5-8 years of experience managing HPC/Linux clusters, with a good understanding of their architecture
  • Skilled in installation and configuration of various applications on Linux
  • Install, administer, and maintain hardware, system software, networking, accounts, and security measures on VMware configurations
  • Diagnose and resolve system and performance issues
  • Experience in drafting technical SOPs, action plans and knowledge documents
  • Good understanding of different cloud platforms
  • Reinstate system integrity as quickly as possible following an outage to minimize downtime
  • Triage and solve user-submitted tickets, especially those relating to the infrastructure
  • Track resource usage using monitoring and queuing software
Job Responsibility
  • Review and validate HPC solutions and environments through POCs and benchmarking
  • Architect and design HPC solutions tailored to the customer’s needs
  • Oversee solution implementation, integration and testing
  • Diagnose and correct solution issues during implementation
  • Provide training, documentation and ongoing support
  • Maintain life-cycle management of the HPC environment
  • Oversee team operations and deliverables
  • Lead the team with technical expertise and ensure regular technical sessions and case reviews
  • Demonstrate a high level of technical and communication skills in critical situations
  • Take responsibility for end-to-end problem ownership and resolution
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion

HPC Technical Consultant

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way...
Location:
United States
Salary:
78700.00 - 181200.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Lead an on-site team of HPE service engineers
  • Set daily priorities based on the customer case load
  • Interact with and keep the customer up to date on case progress
  • Interpret log files and assist site engineers on case resolution with targeted repairs
  • Work with direct manager on any reports that may be required for site management
  • Perform repair and maintenance activities on HPC compute, network, and storage hardware
  • Review tickets for hardware actions needed and claim for action
  • Interact with ticket system to document actions taken and pass ticket to next step
  • Complete training on specialized compute hardware, network, and storage components
  • Read system documentation and diagrams to identify specified components within system
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type: Full-time

GCP DevOps HPC Engineer

Location:
Spain
Salary:
70000.00 - 80000.00 EUR / Year
Signify Technology
Expiration Date
Until further notice
Requirements
  • 5+ years’ experience in HPC environments (SLURM, MPI, parallel workloads)
  • Strong Linux systems expertise in performance-critical environments
  • Hands-on experience running or migrating HPC workloads in the cloud (GCP preferred)
  • Solid experience with Terraform and Ansible
  • Strong scripting skills (Python, Bash)
  • Deep understanding of GCP services (GCE, VPC, Cloud Storage)
Job Responsibility
  • Lead end-to-end migrations of SLURM-based HPC clusters from on-prem to GCP
  • Design, build, and operate secure, scalable HPC architectures in the cloud
  • Optimise SLURM scheduling, workload performance, and resource utilisation
  • Automate cluster deployment and operations using Terraform, Ansible, Python, and Bash
  • Manage HPC software stacks using Spack
  • Deploy and support parallel workloads using MPI, OpenMP, and related frameworks
  • Troubleshoot performance issues and drive continuous optimisation
  • Collaborate with engineering teams and stakeholders in a fully remote environment
Employment Type: Full-time