HPC Systems Engineer

OpenAI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:
Not provided

Job Description:

The Consumer Products Infrastructure team builds and operates the high-performance computing platforms that support product design, simulation, and validation across OpenAI’s consumer-facing hardware and software efforts. We partner closely with product engineers, designers, and applied scientists to ensure fast iteration cycles, predictable performance, and highly available systems that directly impact shipped products.

We are seeking an experienced HPC Systems Engineer to design, scale, and operate large-scale HPC environments that power simulation-driven product development. These clusters support 1,000+ compute nodes per environment and are used heavily for engineering simulations such as FEA, CFD, multiphysics, and optimization that inform real-world product decisions.

In this role, you will work directly with consumer product and simulation engineers to optimize scheduling, reduce iteration latency, ensure license availability, and maintain highly reliable on-prem and hybrid-cloud HPC systems that accelerate product timelines.

Job Responsibility:

  • Architect, deploy, and operate large-scale HPC clusters (1,000+ nodes) supporting simulation workloads critical to consumer product development
  • Optimize workload management using NC, IBM/Platform LSF, and Slurm, with a focus on throughput, fairness, and minimizing queue wait times for product teams
  • Design and implement strategies for workload balancing, cluster federation, and multi-scheduler environments that support diverse product workflows
  • Partner closely with product design, mechanical, electrical, and simulation engineers to debug jobs, improve parallel scaling, and accelerate design-to-validation cycles
  • Administer and harden Linux-based HPC systems (RHEL, Rocky Linux, AlmaLinux), including patching, kernel tuning, and performance optimization
  • Operate and optimize software licensing infrastructure (FlexLM, DSLS, LUM, RLM) to maximize utilization and prevent license-related development bottlenecks
  • Deploy and manage Azure CycleCloud and/or TotalCAE to enable elastic capacity, cloud bursting, and hybrid HPC workflows during peak product development cycles
  • Configure and tune high-speed interconnects, including InfiniBand (HDR/EDR/FDR), to support low-latency, tightly coupled simulation workloads
  • Design and maintain high-performance storage systems (NFS, DFS, Lustre, GPFS / Spectrum Scale, BeeGFS, Azure NetApp) optimized for simulation I/O patterns
  • Build automation and internal tooling using Python and Bash to streamline provisioning, monitoring, diagnostics, and job submission workflows
  • Implement monitoring, alerting, and capacity planning systems (Grafana/Prometheus, Slurm accounting, LSF monitoring) to ensure predictable performance for product teams
  • Ensure high availability and resiliency across globally distributed HPC clusters and hybrid cloud environments
  • Manage authentication, networking, and secure access (LDAP/AD integration, VPNs)
  • Produce clear documentation on cluster architecture, policies, and best practices tailored to consumer product engineering workflows

Requirements:

  • 7+ years of experience designing and operating large-scale HPC clusters (1,000+ nodes)
  • Deep expertise with NC, IBM/Platform LSF, and Slurm workload managers
  • Strong Linux system administration experience (RHEL-family preferred)
  • Hands-on experience with MPI, parallel scaling, and performance tuning for simulation workloads
  • Experience using Azure CycleCloud to provision and manage HPC clusters in hybrid cloud environments
  • Proven experience operating InfiniBand or other high-speed interconnects
  • Strong Python and Bash skills for automation, tooling, and workflow optimization
  • Experience with distributed filesystems (NFS, DFS, Lustre, GPFS, BeeGFS)
  • Deep familiarity with HPC licensing systems (FlexLM, DSLS, RLM, LUM)
  • Experience supporting product-oriented engineering or simulation teams
  • Strong networking fundamentals (DNS, DHCP, VLANs, routing, security)

Nice to have:

  • Experience supporting simulation-driven consumer product development (hardware, devices, or integrated systems)
  • Familiarity with containerized HPC workflows (Apptainer/Singularity)
  • Exposure to infrastructure-as-code tools such as Terraform or Ansible
  • Experience with HPC profiling and performance analysis tools (VTune, mpiP, TAU, Nsight)
  • Background in consumer electronics, product design, automotive, aerospace, or similar environments
  • Strong cross-functional communication skills and a collaborative working style
  • High-ownership mindset with comfort operating in fast-moving, product-critical environments
  • Strong debugging instincts and attention to operational detail
  • Clear documentation habits and a user-centric approach to infrastructure design

What we offer:

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for HPC Systems Engineer

HPC & AI System Test Engineering Manager

The HPC Integrated Systems Test (IST) team is seeking a Systems Engineering Mana...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • First level university degree or equivalent experience required
  • May have advanced university degree
  • Typically 5 or more years of related work experience, including 0-2 years of people management experience
  • Strong leadership skills, including coaching, team building, and conflict resolution
  • Advanced project management skills including time and risk management, resource prioritization, and project structuring
  • Strong analytical and problem-solving skills
  • Ability to manage human capital across geographies to drive workforce development and achieve desired results
  • Strong verbal and written communication skills, including negotiation, presentation, and influence skills
  • Advanced business acumen, technical knowledge, and extensive knowledge in applications and technologies
  • Strong multi-tasking and prioritization skills
Job Responsibility:
  • Provides direct and ongoing leadership for a team of individual contributors testing and validating new products, enhancements and updates
  • Coordinates projects for systems software, including operating systems, networking, utilities, and Internet-related tools
  • Manages headcount, deliverables, schedules, and costs for multiple ongoing projects
  • Communicates project status and escalates issues to direct managers, program managers, and internal and external development partners
  • Manages relationships with outsourced partners and suppliers
  • Proactively identifies opportunities for process improvement and cost reduction
  • Provides people-care management for assigned team members, including hiring, setting and monitoring of annual performance plans, coaching, and career development
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Full-time

HPC & AI Systems Engineer for Integrated Systems Test

HPC & AI Systems Engineer for Integrated Systems Test role at Hewlett Packard En...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or master's degree in Computer Engineering, Computer Science, Electrical Engineering, Information Systems, or equivalent
  • Minimum 4 years of experience
  • Experience with certification and submission to OS vendors for Linux (Red Hat, SLES, Ubuntu, etc.), Windows Server operating systems, Windows Client operating systems, and VMware (ESXi)
  • Experience installing and working with Linux, Windows, and VMware operating systems
  • Experience in programming or scripting languages, Python, PowerShell, Perl, Linux Shell, Java, MySQL, MS SQL Server
  • Understanding of Redfish commands, RESTful API, and JSON format
  • Knowledge of creating and using Docker containers and VMs
  • Experience in configuring Storage (internal/external storage, file systems, and raid/non-raid settings) and Networking devices (iSCSI, FCoE, IPs, VLANs, Bonding, Jumbo Frames, LAGs)
  • Knowledge of networking concepts such as NIC teaming, VLANs, IPv4, IPv6
  • Excellent written and verbal communication skills in English
Job Responsibility:
  • Work with Program & Product Management, technical leads, and product development teams to obtain product feature requirements
  • Design and implement new test features in existing and new test cases
  • Analyze, debug and provide feedback/resolution on issues uncovered by test team prior to submission of results to OS vendors for approval
  • Implement software solutions for multiple test programs/projects with internal and outsourced development partners
  • Review and evaluate the implementation and use of test automation and test tools
  • Planning, development, and implementation of software tools for the testing and evaluation of current and next-generation HPE HPC products
  • Debug and analyze issues to a successful resolution
  • Perform testing in local and remote labs
  • Drive appropriate automated test execution to test engineers at various global locations
  • Provide training and guidance to test teams both onshore and offshore
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
Employment Type:
Full-time

HPC & AI System Test Engineering Manager

HPC & AI System Test Engineering Manager role at Hewlett Packard Enterprise. Man...
Location:
United States, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • First level university degree or equivalent experience required
  • May have advanced university degree
  • Typically 10 or more years of related work experience
  • People management experience
  • Strong leadership skills including coaching, team building, and conflict resolution
  • Advanced project management skills including time and risk management, resource prioritization, and project structuring
  • Ability to manage human capital across geographies
  • Strong analytical and problem-solving skills
  • Excellent understanding of testing methodologies
  • Great understanding of hardware and software interactions
Job Responsibility:
  • Provides direct and ongoing leadership for a team of individual contributors designing and developing new products, enhancements and updates
  • Manages headcount, deliverables, schedules, and costs for multiple ongoing projects
  • Communicates project status and escalates issues to direct managers, program managers, and development partners
  • Manages relationships with outsourced partners and suppliers
  • Proactively identifies opportunities for process improvement and cost reductions
  • Provides people-care management for assigned team members including hiring, performance plans, coaching, and career development
  • Writes and executes complete testing plans, protocols, and documentation
  • Works with systems engineers and development partners to develop reliable, cost effective and high-quality solutions
  • Collaborates and communicates with management regarding systems design status, project progress, and issue resolution
  • Represents the systems engineering team for all phases of larger development projects
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
Employment Type:
Full-time

HPC & AI System Test Engineering Manager

Manages a team of systems engineers for high-performance computing (HPC) server ...
Location:
United States, Chippewa Falls
Salary:
137000.00 - 315000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • First level university degree or equivalent experience required
  • May have advanced university degree
  • Typically 5 or more years of related work experience, including 0-2 years of people management experience
  • Strong leadership skills, including coaching, team building, and conflict resolution
  • Advanced project management skills including time and risk management, resource prioritization, and project structuring
  • Strong analytical and problem-solving skills
  • Ability to manage human capital across geographies to drive workforce development and achieve desired results
  • Strong verbal and written communication skills, including negotiation, presentation, and influence skills
  • Advanced business acumen, technical knowledge, and extensive knowledge in applications and technologies
  • Strong multi-tasking and prioritization skills
Job Responsibility:
  • Provides direct and ongoing leadership for a team of individual contributors testing and validating new products, enhancements and updates
  • Manages headcount, deliverables, schedules, and costs for multiple ongoing projects
  • Communicates project status and escalates issues to direct managers, program managers, and internal and external development partners
  • Manages relationships with outsourced partners and suppliers
  • Proactively identifies opportunities for process improvement and cost reduction
  • Provides people-care management for assigned team members, including hiring, setting and monitoring of annual performance plans, coaching, and career development
  • Coordinates with third-party product vendors and engineering managers to track development issues and implement solutions
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
Employment Type:
Full-time

HPC & AI System Test Engineer

Our organization includes high-performance computing (HPC) server platforms, net...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Systems Engineering, or equivalent
  • Typically 4-6 years experience
  • Possess experience with XD, Apollo, Industry Standard Server, Storage, and Networking products
  • Have experience with Linux Operating Systems (OS) such as Ubuntu, RHEL and SUSE
  • Excellent understanding of testing methodologies
  • Excellent understanding of hardware and software interactions
  • Excellent analytical and problem-solving skills
  • Experience in the overall architecture of software and hardware for products and solutions
  • Strong analytical and problem solving skills
  • Knowledge of a programming or scripting language (Python, Perl, Linux Shell)
Job Responsibility:
  • Work with Program & Product Management teams to understand test requirements
  • Debug and troubleshoot issues with various teams
  • Work with cross-functional teams to deliver quality HPC systems
  • Work with 3rd party product vendors and engineering teams to track development issues and solutions
  • Demonstrate the ability to effectively manage diverse test tasks and priorities in a fast-paced fluid environment
  • Effectively respond to changing program requirements, changes to product test plans and compressed schedules while meeting program development requirements
  • Work with product development teams to understand new product features required for test programs/projects, work with technical leads and testers to design and develop appropriate test plans
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Full-time

HPC & AI System Test Engineer

The HPC Integrated Systems Test (IST) team is seeking early career and new gradu...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Information Systems, or equivalent
  • 0-4 years experience
  • Experience with Industry Standard Server, Storage, and Networking products
  • Experience with Linux Operating Systems (OS) such as Ubuntu, RHEL and SUSE
  • Understanding of testing methodologies
  • Understanding of hardware and software interactions
  • Analytical and problem-solving skills
  • Ability to perform testing in local and remote labs
  • Experience in the overall architecture of software and hardware for products and solutions
  • Knowledge of a programming or scripting language (Python, Perl, Linux Shell)
Job Responsibility:
  • Work with IST technical leads and program managers to understand test requirements, design and develop appropriate test plans, execute test plans, debug and troubleshoot issues
  • Work with various cross-functional teams and the product development teams to understand new product features required for test programs/projects to deliver quality HPC systems
  • Work with 3rd party product vendors and engineering teams to track development issues and solutions
  • Demonstrate the ability to effectively manage diverse test tasks and priorities in a fast-paced fluid environment
  • Effectively respond to changing program requirements, changes to product test plans and compressed schedules while meeting program development requirements
  • Analyze, debug and provide feedback/resolution on issues uncovered by test team prior to submission of results to OS vendors for approval
  • Review and evaluate the implementation and use of test automation and test tools
  • Ensure development issues are resolved in a cost-effective, efficient, and timely manner.
What we offer:
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Specific programs for career development
  • Unconditional inclusiveness aligned with individual uniqueness
Employment Type:
Full-time

HPC & AI System Test Engineer

Hewlett Packard Enterprise is seeking early career professionals with a backgrou...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Information Systems, or equivalent
  • Typically 0-4 years experience
  • Possess experience with Industry Standard Server, Storage, and Networking products
  • Have experience with Linux Operating Systems (OS) such as Ubuntu, RHEL and SUSE
  • Have an understanding of testing methodologies
  • Have an understanding of hardware and software interactions
  • Have analytical and problem-solving skills
  • Perform testing in local and remote labs
  • Experience in the overall architecture of software and hardware for products and solutions
  • Strong analytical and problem-solving skills
Job Responsibility:
  • Work with IST technical leads and program managers to understand test requirements, design and develop appropriate test plans, execute test plans, debug and troubleshoot issues
  • Work with various cross-functional teams and the product development teams to understand new product features required for test programs/projects to deliver quality HPC systems
  • Work with 3rd party product vendors and engineering teams to track development issues and solutions
  • Demonstrate the ability to effectively manage diverse test tasks and priorities in a fast-paced fluid environment
  • Effectively respond to changing program requirements, changes to product test plans and compressed schedules while meeting program development requirements
  • Analyze, debug and provide feedback/resolution on issues uncovered by test team prior to submission of results to OS vendors for approval
  • Review and evaluate the implementation and use of test automation and test tools
  • Ensure development issues are resolved in a cost-effective, efficient, and timely manner
  • Debug and analyze issues to a successful resolution
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Full-time

CPE and USS Systems Integration and Validation Engineer

Hewlett Packard Enterprise is seeking a Systems Integration and Validation Engin...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • B.S. or M.S. degree in a related software engineering field is required
  • 5+ years prior experience in system integration, validation, or systems engineering
  • Deep knowledge of Linux, HPC environments, and distributed system architectures is required
  • Experience with AI is a plus
Job Responsibility:
  • Perform system-level testing and validation
  • Create, evolve, and maintain system-level integration and validation plans
  • Drive and own the evolution of system-level testing and validation approaches within an HPC and AI environment
  • Review engineering requirements, test plans, and test cases to create and maintain integration test and validation methodologies
  • Provide detailed integration test and validation documentation to stakeholders
  • Ensure overall system-level quality, performance, and stability
What we offer:
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Specific programs catered to helping you reach career goals
  • Inclusive work environment
Employment Type:
Full-time