HPC Engineer

India, Chennai · Job Posted April 27, 2026

Job Responsibility

  • Design, implement, and support high-performance compute clusters
  • Solid knowledge of HPC systems, including CPU/GPU architecture, scalable and robust storage, high-bandwidth interconnects, and cloud-based computing architectures
  • Apply attention to detail to generate hardware BOMs for the HPC clusters, manage vendors, and oversee hardware release activities
  • Use strong Linux skills to configure appropriate operating systems for the HPC system
  • Understand and assemble the project specifications and performance requirements at the subsystem and system levels
  • Adhere to and drive project timelines to ensure program achievements are completed on time
  • Support the design and release of new products to manufacturing and ultimately the customer, providing quality golden images, procedures, scripts, and documentation to the manufacturing and customer support teams
  • Proven in-depth, distribution-agnostic knowledge of Linux systems (SUSE, Red Hat, Rocky, Ubuntu)
  • Experience designing and maintaining robust storage
  • Strong HPC hardware knowledge, especially in the server, GPU, networking, storage, BIOS, and BMC areas
  • Experience with systemd, network boot/PXE, and Linux HA
  • Strong understanding of TCP/IP fundamentals and knowledge of protocols such as DNS, DHCP, HTTP, LDAP, and SMTP
  • Ability to write and maintain Shell and Python scripts
  • Experience with one or more configuration management tools (Salt, Chef, Puppet, etc.)

Requirements

  • Experience in designing, implementing, and supporting high-performance computing (HPC) clusters with strong knowledge of CPU/GPU architecture, scalable storage, interconnects, and cloud-based systems
  • 8-10 years of experience

Similar Jobs for HPC Engineer

8 matching positions

HPC Engineer

Join an internationally renowned institute as it establishes a new High Performa...
Location: United Kingdom, London
Salary: 45000.00 - 52500.00 GBP / Year
Linux Recruit
Expiration Date: Until further notice
Requirements
  • Solid foundation in HPC
  • Two to three years of production experience
  • Confidence to work with Linux systems, clusters, and automation technology
  • Exposure to parallel computing frameworks
  • Exposure to GPU acceleration using CUDA or OpenCL
  • Experience with workload schedulers such as Slurm, PBS, or HTCondor
  • Familiarity with InfiniBand and hybrid cloud integration, particularly AWS
Job Responsibility
  • Designing, scaling, and optimising a hybrid HPC environment that blends on-premise GPU systems with the power of AWS cloud
  • Driving performance, resilience, and innovation across both technology and the wider organisation
  • Establishing the engineering that will power the next generation of the organisation
What we offer
  • Training and access to senior engineers
  • Fulltime

HPC Engineer

We are currently looking for an experienced HPC Specialist to join a small but g...
Location: United Kingdom, London
Salary: 50000.00 - 55000.00 GBP / Year
Linux Recruit
Expiration Date: Until further notice
Requirements
  • Strong hands-on experience with HPC environments, particularly GPU clusters
  • Familiarity with hybrid infrastructures (on-prem and cloud – AWS preferred)
  • Knowledge of DevOps tools, automation, and Infrastructure-as-Code
  • Ability to work collaboratively with researchers and technical colleagues
  • A proactive, problem-solving mindset with a desire to innovate
Job Responsibility
  • Manage and optimise GPU clusters (including the latest H200 hardware)
  • Build and maintain hybrid HPC environments (on-prem + AWS)
  • Implement Infrastructure-as-Code and DevOps tooling to drive scalability
  • Support researchers with access to high-performance compute resources for AI, machine learning, and large-scale data projects
  • Help shape the development of a new advanced HPC system from the ground up
  • Fulltime

Senior Distributed Systems Engineer (HPC Platform)

We are looking for a Senior Distributed Systems Engineer to design and build cor...
Location: European Union
Salary: Not provided
Itransition
Expiration Date: Until further notice
Requirements
  • Strong experience in backend development with Rust
  • Solid understanding of distributed systems architecture
  • Hands-on experience with message queues (e.g., Apache Pulsar, RabbitMQ)
  • Experience designing and building gRPC-based APIs / service-oriented architectures
  • Experience with AWS or similar cloud platforms
  • Strong problem-solving skills and ability to work with complex systems
Job Responsibility
  • Design and build core backend services for a high-performance distributed computing platform
  • Develop resilient, high-throughput infrastructure that orchestrates workloads across CPU and GPU nodes
What we offer
  • Projects for such clients as PayPal, Wargaming, Xerox, Philips, Adidas and Toyota
  • Competitive compensation that depends on your qualification and skills
  • Career development system with clear skill qualifications
  • Flexible working hours aligned to your schedule
  • Options to work remotely
  • Corporate medical insurance covering services of private and public medical centers
  • English courses online
  • Corporate parties and events for employees and their children
  • Internal conferences, workshops and meetups for learning and experience sharing
  • Gym membership compensation

Senior Field Application Engineer - HPC

We are seeking a Senior Field Application Engineer to join our HPC and AI Centre...
Location: United Kingdom
Salary: Not provided
AMD
Expiration Date: Until further notice
Requirements
  • Clear track record of technical execution on HPC opportunities
  • Expertise in HPC application performance testing on CPU (and ideally GPU)
  • Strong technical program management skills
  • Ability to independently prioritize opportunities to deliver results on time
  • Excellent verbal and written communication skills
  • Good level of business English
  • Strong positive can-do attitude
  • Open to travel domestic and international, approximately 10-25%
  • Bachelor's degree in a technical field (Computer Science, Electrical Engineering, Physics, Mathematics) preferred
Job Responsibility
  • Lead HPC technical engagement in key customer accounts
  • Perform proofs of concept and performance testing of applications
  • Debug, test, characterise, and compare hardware and toolchain performance
  • Providing CPU recommendations based on generated performance data
  • Liaising with partners on their HPC testing
  • Pitching and demonstrating EPYC and Instinct for HPC
  • Assist in creating TCO models to support pricing with the bid desk
  • Characterising applications on EPYC and Instinct, documenting results
  • Perform testing on early CPU samples
  • Create necessary training material
  • Fulltime

Staff Flight Sciences Software and HPC Engineer

Archer is an aerospace company based in San Jose, California building an all-ele...
Location: United States, San Jose
Salary: 162800.00 - 217600.00 USD / Year
Archer Aviation
Expiration Date: Until further notice
Requirements
  • Master's or Ph.D. in Aerospace Engineering, Mechanical Engineering, Computational Engineering, or a related field
  • 5+ years of experience as a user and developer of scientific/engineering software for flight sciences or similar disciplines (such as aerodynamics, acoustics, control, loads, thermal analysis, mass properties, vehicle simulation, etc.) in a fast-moving environment
  • Demonstrated experience in developing computing software and infrastructure, with proficiency in the scientific Python ecosystem (NumPy, SciPy, Pandas, Scikit-learn, TensorFlow/PyTorch, VTK)
  • Demonstrated experience in standard best practices in software development, including version control, CI/CD, software testing, environment management
  • Demonstrated experience with the design and administration of HPC systems, either on-premise or cloud (AWS preferred). Knowledge of Linux administration, high speed network interconnects, parallel file systems, and MPI required
  • Experience with HPC management software (Slurm/PBS/Torque, OpenHPC/Bright, Warewulf/xCAT, Spack/EasyBuild, Lmod)
  • Good understanding of enterprise IT and common network security practices
  • Excellent problem-solving skills and ability to work collaboratively in a team environment
Job Responsibility
  • Design, implement, and maintain internal software libraries and applications as well as computing infrastructure to enable engineers to solve problems faster and more efficiently. Promote the use of shared computational infrastructure, tools, and practices across engineering teams within the Flight Sciences department
  • Develop processes and software tools to improve the reproducibility and traceability of computations. Drive the implementation of such tools
  • Promote a culture of software excellence across the engineering organization
  • Understand the needs of various engineering teams to efficiently utilize High-Performance Computing (HPC) resources, and make informed decisions on infrastructure solutions to ensure optimal resource utilization and cost savings
  • Maintain and administer on-premises HPC resources
  • Advocate for engineering and computing needs with the company-wide IT department
  • Fulltime

Member of Technical Staff, Site Reliability Engineer (HPC)

As Microsoft continues to push the boundaries of AI, we are on the lookout for p...
Location: United States, Mountain View
Salary: 139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • OR equivalent experience
  • Strong proficiency in Kubernetes, Docker, and container orchestration
  • Knowledge of CI/CD pipelines for Inference and ML model deployment
  • Hands-on experience with public cloud platforms like Azure/AWS/GCP and infrastructure-as-code
  • Expertise in monitoring & observability tools (Grafana, Datadog, OpenTelemetry, etc.)
  • Strong programming/scripting skills in Python, Go, or Bash
  • Solid knowledge of distributed systems, networking, and storage
  • Experience running large-scale GPU clusters for ML/AI workloads (preferred)
Job Responsibility
  • Reliability & Availability: Ensure uptime, resiliency, and fault tolerance of HPC clusters powering MAI model training and inference
  • Observability: Design and maintain monitoring, alerting, and logging systems to provide real-time visibility into all aspects of HPC systems including GPU, clusters, storage and networking
  • Automation & Tooling: Build automation for deployments, incident response, scaling, and failover in CPU+GPU environments
  • Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless postmortems, and drive continuous improvements
  • Security & Compliance: Ensure data privacy, compliance, and secure operations across model training and serving environments
  • Collaboration: Partner with ML engineers and platform teams to improve developer experience and accelerate research-to-production workflows
What we offer
  • Competitive compensation, equity options, and comprehensive benefits
  • Fulltime

Machine Learning Engineer – HPC

At Meshy, we believe 3D creation should be boundless and accessible. Our mission...
Location: China, Shanghai
Salary: Not provided
Meshy LLC
Expiration Date: Until further notice
Requirements
  • Hands-on experience with CUDA and GPU programming
  • Strong programming skills in C++ and Python
  • Solid understanding of parallel programming, performance tuning, and numerical computation
Job Responsibility
  • Design, implement, and optimize GPU computing kernels to accelerate model training and inference for next-generation 3D GenAI models
  • Develop and maintain domain-specific libraries and performance-critical components for 3D generation workloads
  • Work closely with researchers and infra engineers to identify bottlenecks, benchmark performance, and deliver high-efficiency, production-ready GPU modules
  • Fulltime

Product Development Engineer - HPC

This position provides direct exposure to new products and technologies, with op...
Location: United States, Andover
Salary: 105000.00 - 130000.00 USD / Year
Vicor Corp.
Expiration Date: Until further notice
Requirements
  • Bachelor of Science in Electrical Engineering or equivalent (MSEE a plus)
  • 5+ years experience in Power Electronics Design, Product Development, or Applications Engineering
  • Excellent understanding of analog design and an analytical approach to engineering required
  • Troubleshooting skills with basic power circuits and systems required
  • Familiarity with feedback and other control systems
Job Responsibility
  • Working independently and with other members of the team on various projects designing VR systems
  • Architecting complete systems based on customer requirements, creating reference designs and working with the customer on design-in, bring-up and validation of their system
  • Write complete specifications based on customer requirements and work closely with the Design Engineering organization
  • Designing, simulating, prototyping, testing, and optimizing custom circuitry with Advanced products
  • Conducting bench testing of new products at a system level
  • Development of reference design and evaluation printed circuit boards
  • Supporting corporate New Product Introduction (NPI) efforts
  • Troubleshooting and optimizing customer systems which use Vicor Advanced products
  • Demonstrating system solutions and prototypes in front of the customer