HPC Systems Engineer

United States, San Francisco · Job Posted February 21, 2026

Job Description

The Consumer Products Infrastructure team builds and operates the high-performance computing platforms that support product design, simulation, and validation across OpenAI’s consumer-facing hardware and software efforts. We partner closely with product engineers, designers, and applied scientists to ensure fast iteration cycles, predictable performance, and highly available systems that directly impact shipped products.

We are seeking an experienced HPC Systems Engineer to design, scale, and operate large-scale HPC environments that power simulation-driven product development. These clusters support 1,000+ compute nodes per environment and are used heavily for engineering simulations such as FEA, CFD, multiphysics, and optimization that inform real-world product decisions.

In this role, you will work directly with consumer product and simulation engineers to optimize scheduling, reduce iteration latency, ensure license availability, and maintain highly reliable on-prem and hybrid-cloud HPC systems that accelerate product timelines.

Job Responsibility

  • Architect, deploy, and operate large-scale HPC clusters (1,000+ nodes) supporting simulation workloads critical to consumer product development
  • Optimize workload management using NC, IBM/Platform LSF, and Slurm, with a focus on throughput, fairness, and minimizing queue wait times for product teams
  • Design and implement strategies for workload balancing, cluster federation, and multi-scheduler environments that support diverse product workflows
  • Partner closely with product design, mechanical, electrical, and simulation engineers to debug jobs, improve parallel scaling, and accelerate design-to-validation cycles
  • Administer and harden Linux-based HPC systems (RHEL, Rocky Linux, AlmaLinux), including patching, kernel tuning, and performance optimization
  • Operate and optimize software licensing infrastructure (FlexLM, DSLS, LUM, RLM) to maximize utilization and prevent license-related development bottlenecks
  • Deploy and manage Azure CycleCloud and/or TotalCAE to enable elastic capacity, cloud bursting, and hybrid HPC workflows during peak product development cycles
  • Configure and tune high-speed interconnects, including InfiniBand (HDR/EDR/FDR), to support low-latency, tightly coupled simulation workloads
  • Design and maintain high-performance storage systems (NFS, DFS, Lustre, GPFS / Spectrum Scale, BeeGFS, Azure NetApp) optimized for simulation I/O patterns
  • Build automation and internal tooling using Python and Bash to streamline provisioning, monitoring, diagnostics, and job submission workflows
  • Implement monitoring, alerting, and capacity planning systems (Grafana/Prometheus, Slurm accounting, LSF monitoring) to ensure predictable performance for product teams
  • Ensure high availability and resiliency across globally distributed HPC clusters and hybrid cloud environments
  • Manage authentication, networking, and secure access (LDAP/AD integration, VPNs)
  • Produce clear documentation on cluster architecture, policies, and best practices tailored to consumer product engineering workflows

Requirements

  • 7+ years of experience designing and operating large-scale HPC clusters (1,000+ nodes)
  • Deep expertise with NC, IBM/Platform LSF, and Slurm workload managers
  • Strong Linux system administration experience (RHEL-family preferred)
  • Hands-on experience with MPI, parallel scaling, and performance tuning for simulation workloads
  • Experience using Azure CycleCloud to provision and manage HPC clusters in hybrid cloud environments
  • Proven experience operating InfiniBand or other high-speed interconnects
  • Strong Python and Bash skills for automation, tooling, and workflow optimization
  • Experience with distributed filesystems (NFS, DFS, Lustre, GPFS, BeeGFS)
  • Deep familiarity with HPC licensing systems (FlexLM, DSLS, RLM, LUM)
  • Experience supporting product-oriented engineering or simulation teams
  • Strong networking fundamentals (DNS, DHCP, VLANs, routing, security)

Nice to have

  • Experience supporting simulation-driven consumer product development (hardware, devices, or integrated systems)
  • Familiarity with containerized HPC workflows (Apptainer/Singularity)
  • Exposure to infrastructure-as-code tools such as Terraform or Ansible
  • Experience with HPC profiling and performance analysis tools (VTune, mpiP, TAU, Nsight)
  • Background in consumer electronics, product design, automotive, aerospace, or similar environments
  • Strong cross-functional communication skills and a collaborative working style
  • High-ownership mindset with comfort operating in fast-moving, product-critical environments
  • Strong debugging instincts and attention to operational detail
  • Clear documentation habits and a user-centric approach to infrastructure design

What we offer

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends

Similar Jobs for HPC Systems Engineer

8 matching positions

HPC & AI Systems Engineer for Integrated Systems Test

HPC & AI Systems Engineer for Integrated Systems Test role at Hewlett Packard En...
Location
Puerto Rico, Aguadilla
Salary
Not provided
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date
Until further notice
Requirements
  • Bachelor's or master's degree in Computer Engineering, Computer Science, Electrical Engineering, Information Systems, or equivalent
  • Minimum 4 years of experience
  • Experience with certification and submission to OS vendors for Linux (Red Hat, SLES, Ubuntu, etc.), Windows Server operating systems, Windows Client operating systems, and VMware (ESXi)
  • Experience installing and working with Linux, Windows, and VMware operating systems
  • Experience with programming or scripting languages and databases (Python, PowerShell, Perl, Linux shell, Java, MySQL, MS SQL Server)
  • Understanding of Redfish commands, RESTful API, and JSON format
  • Knowledge of creating and using Docker containers and VMs
  • Experience configuring storage (internal/external storage, file systems, and RAID/non-RAID settings) and networking devices (iSCSI, FCoE, IPs, VLANs, bonding, jumbo frames, LAGs)
  • Knowledge of networking concepts such as NIC teaming, VLANs, IPv4, IPv6
  • Excellent written and verbal communication skills in English
Job Responsibility
  • Work with Program & Product Management, technical leads, and product development teams to obtain product feature requirements
  • Design and implement new test features in existing and new test cases
  • Analyze, debug and provide feedback/resolution on issues uncovered by test team prior to submission of results to OS vendors for approval
  • Implement software solutions for multiple test programs/projects with internal and outsourced development partners
  • Review and evaluate the implementation and use of test automation and test tools
  • Plan, develop, and implement software tools for the testing and evaluation of current and next-generation HPE HPC products
  • Debug and analyze issues to a successful resolution
  • Perform testing in local and remote labs
  • Drive appropriate automated test execution for test engineers at various global locations
  • Provide training and guidance to test teams both onshore and offshore
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Fulltime

HPC & AI System Test Engineering Manager

HPC & AI System Test Engineering Manager role at Hewlett Packard Enterprise. Man...
Location
United States, Aguadilla
Salary
Not provided
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date
Until further notice
Requirements
  • First level university degree or equivalent experience required
  • May have advanced university degree
  • Typically 10 or more years of related work experience
  • People management experience
  • Strong leadership skills including coaching, team building, and conflict resolution
  • Advanced project management skills including time and risk management, resource prioritization, and project structuring
  • Ability to manage human capital across geographies
  • Strong analytical and problem-solving skills
  • Excellent understanding of testing methodologies
  • Great understanding of hardware and software interactions
Job Responsibility
  • Provides direct and ongoing leadership for a team of individual contributors designing and developing new products, enhancements and updates
  • Manages headcount, deliverables, schedules, and costs for multiple ongoing projects
  • Communicates project status and escalates issues to direct managers, program managers, and development partners
  • Manages relationships with outsourced partners and suppliers
  • Proactively identifies opportunities for process improvement and cost reductions
  • Provides people-care management for assigned team members including hiring, performance plans, coaching, and career development
  • Writes and executes complete testing plans, protocols, and documentation
  • Works with systems engineers and development partners to develop reliable, cost effective and high-quality solutions
  • Collaborates and communicates with management regarding systems design status, project progress, and issue resolution
  • Represents the systems engineering team for all phases of larger development projects
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Fulltime

HPC & AI System Test Engineering Manager

Manages a team of systems engineers for high-performance computing (HPC) server ...
Location
United States, Chippewa Falls
Salary
137000.00 - 315000.00 USD / Year
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date
Until further notice
Requirements
  • First level university degree or equivalent experience required
  • May have advanced university degree
  • Typically 5 or more years of related work experience, including 0-2 years of people management experience
  • Strong leadership skills, including coaching, team building, and conflict resolution
  • Advanced project management skills including time and risk management, resource prioritization, and project structuring
  • Strong analytical and problem-solving skills
  • Ability to manage human capital across geographies to drive workforce development and achieve desired results
  • Strong verbal and written communication skills, including negotiation, presentation, and influence skills
  • Advanced business acumen, technical knowledge, and extensive knowledge in applications and technologies
  • Strong multi-tasking and prioritization skills
Job Responsibility
  • Provides direct and ongoing leadership for a team of individual contributors testing and validating new products, enhancements and updates
  • Manages headcount, deliverables, schedules, and costs for multiple ongoing projects
  • Communicates project status and escalates issues to direct managers, program managers, and internal and external development partners
  • Manages relationships with outsourced partners and suppliers
  • Proactively identifies opportunities for process improvement and cost reduction
  • Provides people-care management for assigned team members, including hiring, setting and monitoring of annual performance plans, coaching, and career development
  • Coordinates with third-party product vendors and engineering managers to track development issues and implement solutions
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Fulltime

CPE and USS Systems Integration and Validation Engineer

Hewlett Packard Enterprise is seeking a Systems Integration and Validation Engin...
Location
India, Bangalore
Salary
Not provided
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date
Until further notice
Requirements
  • B.S. or M.S. degree in a related software engineering field is required
  • 5+ years prior experience in system integration, validation, or systems engineering
  • Deep knowledge of Linux, HPC environments, and distributed system architectures is required
  • Experience with AI is a plus
Job Responsibility
  • Perform system-level testing and validation
  • Create, evolve, and maintain system-level integration and validation plans
  • Drive and own the evolution of system-level testing and validation approaches within an HPC and AI environment
  • Review engineering requirements, test plans, and test cases to create and maintain integration test and validation methodologies
  • Provide detailed integration test and validation documentation to stakeholders
  • Ensure overall system-level quality, performance, and stability
What we offer
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Specific programs catered to helping you reach career goals
  • Inclusive work environment
  • Fulltime

HPC Systems/Software Engineer

HPC Systems/Software Engineer needs to understand cluster concepts and required ...
Location
India, Bangalore
Salary
Not provided
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Information Systems, or equivalent
  • Typically 6+ years of experience
  • Expertise in multiple software systems design tools and languages
  • Strong analytical and problem-solving skills
  • Experience designing software systems running on multiple platform types
  • Very good systems knowledge, including hardware, firmware, and operating systems
  • Linux systems knowledge with Python and other languages
  • Good understanding of network boot technologies (PXE, gPXE/Etherboot, etc.)
  • Storage-specific knowledge: LVM, RAID, iSCSI, disk partitioning (GPT, MBR)
  • Exposure to the open-source community and software
Job Responsibility
  • Designs enhancements, updates, and programming changes for portions and subsystems of systems software
  • Analyzes design and determines coding, programming, and integration activities required
  • Writes and executes complete testing plans, protocols, and documentation
  • Leads a project team of other software systems engineers
  • Collaborates and communicates with management and development partners
  • Represents the software systems engineering team for all phases of development projects
  • Provides guidance and mentoring to less-experienced staff members
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Fulltime

Senior Linux System Administrator - Support Engineer

Senior Linux System Administrator/System Support Engineer with expertise support...
Location
Australia, Canberra
Salary
Not provided
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Information Technology, or related field, or equivalent work experience
  • At least 5 years of hands-on experience managing Linux systems in production environments, including HPC systems
  • Expertise in Linux/Unix operating systems, parallel file systems (Lustre, GPFS), and networking technologies
  • Proficiency in scripting/programming languages (Bash, Python, Perl, C++)
  • Experience with automation/configuration management tools (Ansible, Puppet, Chef, Terraform)
  • Strong understanding of networking concepts (TCP/IP, DNS, DHCP, firewalls, VPNs)
  • Familiarity with monitoring/logging tools (Nagios, Grafana, ELK Stack)
  • Experience with containerization technologies (Docker, Kubernetes)
  • Excellent problem-solving, analytical, and communication skills
  • Demonstrated ability to work independently in multi-technology environments and collaborate across teams
Job Responsibility
  • Deploy, configure, maintain, and troubleshoot Linux servers (Red Hat, CentOS, Ubuntu, or others) across physical, virtual, and cloud environments
  • Support, maintain, and optimize HPC systems, including installation, servicing, and advanced technical troubleshooting of hardware/software and parallel file systems
  • Monitor system performance, availability, and security using industry-standard tools and practices
  • Plan and execute upgrades, patches, enhancements, and migrations to ensure systems are current, secure, and optimized
  • Automate system administration tasks using scripting languages and configuration management tools
  • Implement and maintain backup/recovery strategies, disaster recovery plans, and system documentation
  • Collaborate with development, network, and security teams to support application deployments and troubleshoot issues
  • Provide technical consulting, mentoring, and guidance to junior team members
  • Ensure compliance with strict security protocols in sensitive environments
  • Participate in on-call rotation and respond to system incidents and outages
What we offer
  • Competitive salary and performance-based bonuses
  • Comprehensive health, dental, and vision insurance
  • Retirement plan options
  • Paid time off and holidays
  • Professional development opportunities
  • Flexible work arrangements
  • Fulltime

HPC Engineer

We are currently looking for an experienced HPC Specialist to join a small but g...
Location
United Kingdom, London
Salary
50000.00 - 55000.00 GBP / Year
Linux Recruit (linuxrecruit.co.uk)
Expiration Date
Until further notice
Requirements
  • Strong hands-on experience with HPC environments, particularly GPU clusters
  • Familiarity with hybrid infrastructures (on-prem and cloud – AWS preferred)
  • Knowledge of DevOps tools, automation, and Infrastructure-as-Code
  • Ability to work collaboratively with researchers and technical colleagues
  • A proactive, problem-solving mindset with a desire to innovate
Job Responsibility
  • Manage and optimise GPU clusters (including the latest H200 hardware)
  • Build and maintain hybrid HPC environments (on-prem + AWS)
  • Implement Infrastructure-as-Code and DevOps tooling to drive scalability
  • Support researchers with access to high-performance compute resources for AI, machine learning, and large-scale data projects
  • Help shape the development of a new advanced HPC system from the ground up
  • Fulltime

Federal HPC Linux System Administrator

HPE is seeking a passionate and skilled Linux Systems Administrator to provide s...
Location
United States, Utah, near Salt Lake City
Salary
101900.00 - 234500.00 USD / Year
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date
Until further notice
Requirements
  • TS/SCI with Poly clearance REQUIRED
  • US citizenship is required
  • Bachelor's degree in Computer Science, Engineering, or related area of study OR equivalent work experience
  • 3+ years' HPC-related experience, ideally with large-scale HPC and parallel file system administration and support
  • Knowledge of Linux operating systems (RHEL or SLES), workload management systems, parallel file systems, networking, and security
  • Technical skills to investigate and resolve complex problems
  • Ability to maintain system software, utilizing debugging tools for problem isolation
  • Possess the organizational and analytical skills needed to effectively isolate both hardware and software problems and drive solutions through to conclusion
  • Able to clearly document processes and procedures with a focus toward mentoring and knowledge sharing
Job Responsibility
  • Work as an active member of the HPE account team
  • Answer customer inquiries concerning system software versions, product lifecycles, new releases, and third-party applications
  • Maintain Linux system availability for the customer
  • Create and document site procedures, system diagrams, and other configuration or support documents
  • Maintain system software and firmware revisions, including patches, updates, and OS upgrades
  • Solve system hardware, software, and third-party software issues
  • Gather data, perform analysis, and escalate problems to higher-level product support groups
  • Implement solutions, repairs and workarounds
  • Document and share troubleshooting techniques, new ideas, and utilities
  • Manage software issues for both the system and user applications
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Fulltime