HPC & AI Software - Engineering Resolution Engineer

United States · 71700.00 - 165800.00 USD / Year · Job Posted March 01, 2026

Job Description

Provides customer support for Workload Manager and Job Scheduler software products, including Slurm and PBS Pro, and collaborates with software vendors on issues. Responsibilities may include leading customer troubleshooting meetings, system testing, and monitoring vendor bug tickets. The candidate is expected to analyze software issues and provide final resolutions or workarounds when possible. This may also include managing escalations to the next level of engineering and ensuring that issue documentation is complete and technically sound.

Job Responsibility

  • Provide technical support for customer-reported issues escalated to level 3 support
  • Analyze, reproduce, isolate and resolve issues
  • Escalate unresolved issues internally to HPE R&D or to vendors depending upon isolation
  • Document/communicate throughout the whole process of working an issue until closure
  • Create and review selected customer documentation and notices
  • Perform selected product pre-release testing

Requirements

  • Bachelor's or Master's degree in Computer Science/Engineering, Information Systems, or equivalent
  • Typically 0-2 years experience
  • Knowledge and experience of Linux operating systems and networking
  • Knowledge and experience with programming environments, e.g. C, Fortran, Python, MPI
  • Experience with debugging and performance analysis tools (CPE tools and libraries)
  • Experience with batch scripting
  • Ability to gather data, perform analysis, reproduce and resolve issues or escalate to HPE R&D or external partners, working with them until closure
  • Ability to multitask and prioritize, switching between several active issues
  • Ability to work effectively in a team environment to investigate and resolve complex problems
  • Good communication skills, internally within HPE and externally with customers and suppliers, both verbal and written
  • Willing and able to obtain security clearance
  • Must be able to provide after-hours support when necessary

What we offer

  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion

Similar Jobs for HPC & AI Software - Engineering Resolution Engineer

8 matching positions

HPC & AI System Test Engineering Manager

HPC & AI System Test Engineering Manager role at Hewlett Packard Enterprise. Man...
Location: United States, Aguadilla
Salary: Not provided
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • First level university degree or equivalent experience required
  • May have advanced university degree
  • Typically 10 or more years of related work experience
  • People management experience
  • Strong leadership skills including coaching, team building, and conflict resolution
  • Advanced project management skills including time and risk management, resource prioritization, and project structuring
  • Ability to manage human capital across geographies
  • Strong analytical and problem-solving skills
  • Excellent understanding of testing methodologies
  • Great understanding of hardware and software interactions
Job Responsibility
  • Provides direct and ongoing leadership for a team of individual contributors designing and developing new products, enhancements and updates
  • Manages headcount, deliverables, schedules, and costs for multiple ongoing projects
  • Communicates project status and escalates issues to direct managers, program managers, and development partners
  • Manages relationships with outsourced partners and suppliers
  • Proactively identifies opportunities for process improvement and cost reductions
  • Provides people-care management for assigned team members including hiring, performance plans, coaching, and career development
  • Writes and executes complete testing plans, protocols, and documentation
  • Works with systems engineers and development partners to develop reliable, cost effective and high-quality solutions
  • Collaborates and communicates with management regarding systems design status, project progress, and issue resolution
  • Represents the systems engineering team for all phases of larger development projects
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
Employment type: Full-time

HPC & AI Systems Engineer for Integrated Systems Test

HPC & AI Systems Engineer for Integrated Systems Test role at Hewlett Packard En...
Location: Puerto Rico, Aguadilla
Salary: Not provided
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's or master's degree in Computer Engineering, Computer Science, Electrical Engineering, Information Systems, or equivalent
  • Minimum 4 years of experience
  • Experience with certification and submission to OS vendors for Linux (Red Hat, SLES, Ubuntu, etc.), Windows Server operating systems, Windows Client operating systems, and VMware (ESXi)
  • Experience installing and working with Linux, Windows and VMware OSes
  • Experience with programming or scripting languages and databases, such as Python, PowerShell, Perl, Linux Shell, Java, MySQL, MS SQL Server
  • Understanding of Redfish commands, RESTful API, and JSON format
  • Knowledge of creating and using Docker containers and VMs
  • Experience in configuring Storage (internal/external storage, file systems, and raid/non-raid settings) and Networking devices (iSCSI, FCoE, IPs, VLANs, Bonding, Jumbo Frames, LAGs)
  • Knowledge of networking concepts such as NIC teaming, VLANs, IPv4, IPv6
  • Excellent written and verbal communication skills in English
Job Responsibility
  • Work with Program & Product Management, technical leads, and product development teams to obtain product feature requirements
  • Design and implement new test features in existing and new test cases
  • Analyze, debug and provide feedback/resolution on issues uncovered by test team prior to submission of results to OS vendors for approval
  • Implement software solutions for multiple test programs/projects with internal and outsourced development partners
  • Review and evaluate the implementation and use of test automation and test tools
  • Planning, development, and implementation of software tools for the testing and evaluation of current and next-generation HPE HPC products
  • Debug and analyze issues to a successful resolution
  • Perform testing in local and remote labs
  • Drive appropriate automated test execution to test engineers at various global locations
  • Provide training and guidance to test teams both onshore and offshore
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
Employment type: Full-time

QA, Automation, and Software Engineering Manager

Hewlett Packard Enterprise (HPE) is hiring a QA, Automation, and Software Engine...
Location: India, Bangalore
Salary: Not provided
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • B.S. or M.S. degree in a related software engineering field
  • Prior experience in HPC, AI, or related technical software development
  • Prior experience using agile methodologies
  • Prior experience managing or developing software in a production software environment
  • 2 to 5 years of prior experience managing a technical team in a software-related field
  • 2 to 5 years of prior experience managing managers
  • Prior experience developing and managing software written in C, C++, or Fortran within a Linux environment is highly desirable
  • A technical background in software development, HPC, AI, or related work is highly desirable
  • Strong leadership skills, including coaching, team-building, and conflict resolution
  • Advanced project management skills including time and risk management, resource prioritization, and project structuring
Job Responsibility
  • Provides direct and ongoing leadership for a team of QA, Automation, and software engineers
  • Mentors, coaches, and develops the talent in the team
  • Manages headcount, deliverables, schedules, and costs for multiple ongoing projects, ensuring that resources are appropriately allocated and that goals, objectives, timelines, and budgets are met in accordance with program and organizational roadmaps
  • Communicates project status effectively to stakeholders
  • Manages relationships with customers, partners and internal stakeholders
  • Sets expectations for deliverables, product quality, schedules, and costs
  • Ensures that team members are effectively communicating and collaborating across the organization
  • Proactively identifies opportunities for improvements in products and leads innovation efforts
  • Provides people-care management for assigned team members, including hiring, setting and monitoring of annual performance plans, coaching, and career development
  • Ensures that proper knowledge and career development tools are in place to support ongoing team member and process development
What we offer
  • Health & wellbeing benefits for team members and their loved ones
  • Personal & professional development programs
  • Unconditional inclusion
Employment type: Full-time

Sovereign AI Field Application Engineer

We are seeking a Senior Field Application Engineer (FAE) to join the Centre of E...
Location: United Kingdom
Salary: Not provided
AMD (amd.com)
Expiration Date: Until further notice
Requirements
  • Demonstrable hands-on expertise working with popular AI frameworks and models on GPUs
  • Experience leading large technical programs or opportunities
  • Strong systems background. Understands and can quantify the impact of system architecture on performance
  • Strong, positive can-do attitude; willing to do what is necessary and to lead others in the wider FAE team by example. Available to help colleagues
  • Skilled in independently prioritizing opportunities to deliver results on time
  • Excellent verbal and written communication skills
  • Based in Europe, ideally in the EU zone
  • Open to travel both domestic and international, approximately 10-20% over a year. Anticipate a ramp period with increased travel at the start
  • Bachelor's degree in a technical field (Computer Science, Electrical Engineering, Physics, Mathematics) preferred
Job Responsibility
  • Support winning new AI business in national AI and HPC centres. Enabling customers to execute their AI workloads on AMD Instinct GPUs, EPYC CPUs, and AI NICs. Supporting partners in RFP responses by testing requested workloads
  • Owning technical qualification of the customer, partnering with Sales and Business Unit orgs
  • Demonstrate and advise customers and partners through Proof of Concepts, presentations, and training
  • Engineering: execute popular and customer-driven AI inference and training workloads, generate results and create a characteristic understanding of AI performance on AMD hardware. Understand how system and software choices affect performance. Compare performance to our competition
  • Run training and inference performance investigations using common frameworks (PyTorch, TensorFlow, JAX) and using MLPerf, Hugging Face, etc.
  • Build a body of documentation for internal and external dissemination: AMD-internal guides, whitepapers, tuning guides, training collateral
  • Provide onsite training
  • Proactive engagement across AMD teams: GPU Business Unit, Engineering, Architecture, Platform, Software, and Product Development teams providing feedback and leadership from the field on requirements. Gathering missing functionality and working with Engineering to resolve and test
  • Assist in creating Total Cost of Ownership models to aid pricing with bid desk
  • Technically owning and resolving customer and partner issues. Submitting JIRA tickets and driving resolution
Employment type: Full-time

HPC AI Electrical Engineer

Designs, analyzes, develops, modifies and evaluates electrical/electronic parts,...
Location: United States, Spring
Salary: 92600.00 - 213500.00 USD / Year
Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Electrical Engineering
  • Typically 4-6 years experience
  • Experience with lab equipment - oscilloscopes, TDR, power supplies, grounding schemes, logic analyzer, data recorders
  • Linux - how networking/software/firmware/hardware interact
  • Report writing and data analysis and research - comparing measured data to specifications
  • Reading schematics and PCB layout files
  • Understanding of signal integrity and how measurement equipment can affect signal integrity
  • Using electrical design tools and software packages
  • Strong analytical and problem solving skills
  • Designing electronic components, integrated circuitry, and algorithms
Job Responsibility
  • Designs engineering solutions for electrical and electronic parts, subsystems, integrated circuitry, and algorithms based on established engineering principles
  • Develops and implements parameters and test plans for new and existing designs, including validation of tolerances, form/fit/function, shock and vibration, electromagnetic interference, safety, reliability, thermal generation, and system power measurements
  • Collaborates and communicates with management, internal, and outsourced development partners regarding design status, project progress, and issue resolution
  • Leads a project team of other electrical hardware engineers and internal and outsourced development partners to develop reliable, cost effective and high quality solutions for moderately-complex products
  • Represents the electrical hardware team for all phases of larger and more-complex development projects
  • Provides guidance and mentoring to less-experienced staff members
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment type: Full-time

Principal Software Engineer

Microsoft Azure High Performance Computing & AI Engineering (HPC & AI Eng) team ...
Location: United States, Multiple Locations
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python - OR equivalent experience
  • 5+ years of hands-on experience designing and developing high-volume, low-latency pipelines using products such as AzPubSub, Event Hubs, Azure Stream Analytics, Kafka, Grafana, Prometheus or equivalent products
  • 3+ years of experience with one of AI/HPC system management OR High-Speed Networks OR HPC Storage OR managing Cloud Infrastructure
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility
  • Architect, design and develop high-volume, low-latency, end-to-end event pipelines that can provide first-to-know insights on events causing job interrupts and affecting job reliability
  • Conduct analysis of existing event pipelines to evaluate fidelity, granularity and latency of critical events
  • Contribute to improving key metrics such as Job Mean Time to Interrupt, Nodes in Service, and Mean Time to Resolve on flagship supercomputers by enabling data scientists and domain experts to use the telemetry to identify events & issues at the intersection of datacenter and hardware, develop hypotheses, conduct A/B tests and synthesize results
  • Partner with cross-organizational teams to evaluate available telemetry and latency, and drive architecture, design, development and deployment of end-to-end solutions to manage core infrastructure including current & next generation datacenter, IT hardware, power & cooling technologies
  • Drive engineering and operational excellence based on issues and learnings from strategic customers on their usage scenarios to improve product features and capabilities
  • Partner with teams on continuous learning and continuous improvement programs by leading the resolution of complex incidents, driving root cause analyses and championing initiatives to minimize future customer impact
Employment type: Full-time

Lead Software Engineer - Cloud

The area of Scientific Computing in Airbus provides engineering with state of th...
Location: India, Bangalore
Salary: Not provided
Airbus (airbus.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Information Technology, or a related field
  • 7-10 years of IT experience with at least 2 years of hands-on experience on the AWS Cloud platform
  • Mandatory proficiency in at least one of the following: Python, shell scripting, TypeScript, or Boto3
  • Deep expertise in core AWS services and building data platforms, including OpenSearch, Glue Catalog, EMR, and Redshift
  • Mandatory experience writing Infrastructure as Code (IaC) using CloudFormation or CDK
  • Proficiency with high-performance storage services like Amazon FSx for Lustre and Amazon EFS for shared file systems
  • Hands-on experience designing, developing, and maintaining data pipelines and workflows using an orchestration tool like Apache Airflow, Prefect, or Dagster
  • Solid understanding of security best practices (e.g., IAM Roles, KMS)
  • Previous exposure to designing and maintaining large-scale, cloud-native systems
  • Working knowledge of Agile Scrum, SAFe, or Kanban methodologies
Job Responsibility
  • Design, deploy, and maintain scalable and resilient cloud solutions for high-performance computing (HPC) and scientific simulation workloads
  • Optimize compute resources by selecting and managing appropriate Compute-optimized EC2 instances
  • Design and manage high-performance storage solutions using Amazon S3, Amazon EFS, and Amazon FSx
  • Automate cloud environment provisioning by developing and implementing Infrastructure as Code (IaC)
  • Monitor system performance and resource utilization with AWS CloudWatch
  • Monitor and troubleshoot workflow execution and pipeline failures
  • Collaborate with engineering teams to understand their computational needs and provide technical guidance
  • Troubleshoot and resolve complex technical issues related to compute, storage, and networking for scientific simulation jobs
  • Enforce and maintain security best practices, ensuring compliance with data governance and security policies
  • Contribute to the full development lifecycle within a self-sufficient, multi-disciplinary team
Employment type: Full-time

HPC Computational Scientist / Engineer

AMD’s Software and Solutions Team is seeking an HPC and AI Computational Scientis...
Location: France, Paris
Salary: Not provided
AMD (amd.com)
Expiration Date: Until further notice
Requirements
  • PhD in Computer Science, Computational Physics, Engineering or related subjects, or equivalent experience
  • At least 3-5 years of relevant HPC and/or AI experience in research or industry
  • Strong experience in scientific computing disciplines, distributed-memory parallel programming, HPC profiling tools, GPU acceleration technologies, and HPC application modernisation
  • Must be proficient in French and/or English communication
Job Responsibility
  • Develop, port, and optimize high-performance computing software and applications for use on AMD hardware
  • Work with other members of the COE team to collaboratively solve issues
  • Represent AMD to the customer and other third parties, and act as the customer advocate when presenting to AMD audiences
  • Engage with AMD product groups to drive resolution of customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences
Employment type: Full-time