
HPC SME

India, Bangalore · Job Posted January 16, 2026

Job Description

HPE Operations is our innovative IT services organization. It provides the expertise to advise, integrate, and accelerate our customers’ outcomes from their digital transformation. Our teams collaborate to transform insight into innovation. In today’s fast-paced, hybrid IT world, being at business speed means overcoming IT complexity to match the speed of actions to the speed of opportunities, and deploying the right technology to respond quickly to market possibilities. Join us and redefine what’s next for you.

Job Responsibility

  • Review and validate HPC solutions and environments through POCs and benchmarking
  • Architect and design HPC solutions tailored to the customer’s needs
  • Oversee solution implementation, integration and testing
  • Diagnose and correct solution issues during implementation
  • Provide training, documentation and ongoing support
  • Manage the life cycle of the HPC environment
  • Oversee team operations and deliverables
  • Lead the team with technical expertise and ensure regular technical sessions and case reviews
  • Demonstrate a high level of technical and communication skills in critical situations
  • Take responsibility for end-to-end problem ownership and resolution
  • Be a good team player

Requirements

  • 8-12 years of experience with different flavours of Linux, such as SLES, RHEL and Ubuntu/Debian
  • 5-8 years of experience managing HPC/Linux clusters, with a good understanding of their architecture
  • Skilled in installation and configuration of various applications on Linux
  • Install, administer, and maintain hardware, system software, networking, accounts, and security measures on VMware configurations
  • Diagnose and resolve system issues and performance issues
  • Should have experience in drafting technical SOPs, action plans and knowledge documents
  • Should have good understanding of different cloud platforms
  • Reinstate integrity of system as quickly as possible following an outage in order to minimize downtime
  • Triage and solve user-submitted tickets, especially when they relate to the infrastructure
  • Track resource usage using monitoring and queuing software
  • Willingness to assist peers is an added advantage
  • Demonstrated expertise with Linux system administration, including OS, networking, storage, Docker and security
  • Experience with high-speed networking such as InfiniBand and 10/40 Gigabit Ethernet
  • Familiarity with large storage systems (Scality, Weka, Lustre, GPFS, others)
  • Experience with HPC cluster managers (HPCM, Bright Cluster Manager)
  • Experience in server hardware patching and troubleshooting
  • Experience managing HPC clusters and GPUs
  • Experience using and supporting job schedulers such as SLURM, PBS or other schedulers
  • Familiar with shell/Python scripting and Ansible
  • Familiar with monitoring tools like Grafana/Nagios/OpsRamp
  • Familiar with virtualization technologies like KVM, VMware, vCenter
  • Infrastructure Monitoring: Nagios, OpsRamp, HPE PCM, NVIDIA BCM, SolarWinds
  • Virtualization: Containers, Kubernetes, VMware and OpenShift

What we offer

  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion


Similar Jobs for HPC SME

8 matching positions

HPC SME

HPE Operations is our innovative IT services organization. It provides the exper...
Location: India, Bangalore
Salary: Not provided
Company: Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • 8-12 years of experience with different flavours of Linux like SLES, RHEL and Ubuntu/Debian
  • 5-8 years experience in managing HPC/Linux clusters with good understanding of its architecture
  • Skilled in installation and configuration of various applications on Linux
  • Install, administer, and maintain hardware, system software, networking, accounts, and security measures on VMware configurations
  • Diagnose and resolve system issues and performance issues
  • Experience in drafting technical SOPs, action plans and knowledge documents
  • Good understanding of different cloud platforms
  • Reinstate integrity of system as quickly as possible following an outage
  • Triage and solve user-submitted tickets
  • Track resource usage using monitoring and queuing software
Job Responsibility
  • Review and Validate HPC solutions and Environment through POCs and Benchmarking
  • Architecting and designing HPC solutions tailored to the customer's needs
  • Overseeing solution implementation, integration and testing
  • Diagnose and correct solution issues during the implementation
  • Providing training, documentation and ongoing support
  • Maintain the Life-cycle management of the HPC environment
  • Oversee the team operations and deliverables
  • Lead the team with technical expertise and ensure regular technical sessions and case reviews
  • Demonstrate a high level of technical and communication skills in critical situations
  • Take responsibility for end-to-end problem ownership and resolution
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
Employment type: Full-time

Senior Manager IT Storage Engineering

Lead the strategy, architecture, and delivery of enterprise storage and data pro...
Location: United States, San Jose
Salary: 180400.00 - 270600.00 USD / Year
Company: AMD (amd.com)
Expiration Date: Until further notice
Requirements
  • Leadership and people management skills
  • Ability to mentor and grow high-performing teams
  • Communication skills, ability to translate complex technical concepts into business outcomes
  • Collaborative mindset with strong stakeholder management across Engineering, IT, Security, and Compliance
  • Strategic thinking with a balance of innovation and pragmatic execution
  • Problem-solving orientation with a focus on continuous improvement and operational excellence
  • Customer-focused approach with an emphasis on user experience and service reliability
  • Experience in enterprise storage and data protection (file, block, object)
  • Engineering team management experience
  • Proven expertise in high-performance storage solutions for EDA, HPC, AI/ML workloads
Job Responsibility
  • Own and evolve the storage platform roadmap, balancing EDA performance requirements with enterprise resilience, compliance, and cost goals
  • Define and govern reference architectures across file, block, object storage, and data protection (on-prem and cloud)
  • Lead storage design reviews for EDA workflows, including metadata-intensive and high IOPS/low latency workloads (build/test/simulation/regression flows)
  • Establish standard patterns for tiering, archiving, retention, immutability (WORM), and disaster recovery (RPO/RTO, replication, failover)
  • Design solutions supporting AI workloads (including >5TB/s training throughput)
  • Ensure storage segmentation and isolation aligned with performance and security requirements
  • Align all architectures with security, encryption, RBAC, audit logging, and data governance standards
  • Own end-to-end storage and backup service delivery (availability, performance, capacity, recoverability, user experience)
  • Lead major incident management, root cause analysis, and corrective/preventive actions
  • Define and track SLAs/SLOs (latency, throughput, backup success, restore times, replication health)
What we offer
  • Benefits offered are described in "AMD benefits at a glance"
Employment type: Full-time

OpenShift Architect

We are currently seeking an OpenShift Architect to join our team in Bangalore, Ka...
Location: India, Bangalore
Salary: Not provided
Company: NTT DATA (nttdata.com)
Expiration Date: Until further notice
Requirements
  • Must be a graduate (B.Tech/B.E./MCA or equivalent)
  • Post-graduate degree in Computer Science or related field is highly preferred
  • 10 to 15 years of experience in Infrastructure Engineering, Unix/Linux Systems Architecture, and Cloud-Native platforms
  • 5+ years of experience as a primary Architect leading enterprise-scale Red Hat OpenShift (OCP 4.x) environments
  • Red Hat Certified Architect (RHCA) – Level II or higher (Cloud/Datacenter)
  • Red Hat Certified Specialist in MultiCluster Management (EX432) or Automation (EX380)
  • Solutions Architect Professional (AWS SAP-C02, Azure AZ-305, or GCP Professional Architect)
  • Willingness to work in rotational shifts/on-call as a technical lead in a 24x7 support window
Job Responsibility
  • Serve as the global SME for RHEL/RHCOS, architecting kernel-level optimizations, advanced system tuning, and high-performance computing (HPC) configurations
  • Define the strategy for transitioning legacy UNIX (AIX/Solaris/HP-UX) and monolithic Linux workloads into containerized or virtualized environments on OpenShift
  • Lead architectural decisions for Bare Metal, VMware, and KVM integration
  • Design global, highly available OpenShift architectures across hybrid and multi-cloud environments (IPI/UPI)
  • Direct architectural oversight for ROSA (AWS) and ARO (Azure)
  • Drive the roadmap for OpenShift Virtualization (KubeVirt) to unify VM and container management
  • Architect software-defined networking (SDN/OVN) and enterprise storage strategies using OpenShift Data Foundation (ODF)
  • Architect global automation frameworks using Ansible Automation Platform and Terraform
  • Establish organizational standards for OpenShift GitOps (ArgoCD)
  • Expert-level implementation of Red Hat Advanced Cluster Management (RHACM) for global governance
Employment type: Full-time

Senior Director, CTIO Engineering Technologists

From applied research to advanced engineering, the Engineering Technologist team...
Location: United States, Austin; Santa Clara
Salary: 277000.00 - 358000.00 USD / Year
Company: Dell (dell.com)
Expiration Date: Until further notice
Requirements
  • 18 years of overall experience, including 5 years leading and directing the strategic and operational objectives of an organization related to HPC (high-performance computing) clusters, AI compute, AI data centers, AI storage, etc.
  • Demonstrated experience delivering AI Solutions
  • Experience developing long-term technology strategies based on the technical and business information
  • Drives for internal and external alignment of the strategy
  • Identifies and develops differentiation opportunities, provides technical information, and makes recommendations to marketing, procurement, engineering, customers, and business executives
  • Participates as required in strategic initiatives for improvements in process, quality, and cost
Job Responsibility
  • Lead a team of highly skilled SMEs in the development of next-generation, large-scale AI systems, including accelerated compute, AI fabrics, AI-optimized storage and the AI software stack
  • Responsibilities include the assimilation and understanding of the industry and competitive environment for a given technology or product line, and the derivation of a technology/product strategy from this information.
  • Leads technology investigations, performs a strategic analysis of the industry capabilities, and develops recommendations, which influence the technical product strategy and/or definition of products for a given product line, including evaluation of potential acquisitions and vendor partner opportunities.
  • Engages design teams, systems engineering, marketing teams, suppliers, and business unit leaders and executives to ensure the strategy or product architecture meets Dell’s requirement of product leadership for the given technology area or product line.
What we offer
  • Comprehensive Healthcare Programs
  • Award Winning Financial Wellness Tools and Resources
  • Generous Leave of Absence for New Parents and Caregivers
  • Industry Leading Wellness Platform
  • Employee Assistance Program
Employment type: Full-time

Fire Safety Specialist

The DSEAR Specialist provides expert advice, assessment, and assurance on the sa...
Location: United Kingdom, Bridgwater
Salary: Not provided
Company: Rullion (rullion.co.uk)
Expiration Date: Until further notice
Requirements
  • Expert advice, assessment, and assurance on the safe management of dangerous substances and explosive atmospheres
  • Ensuring organisational compliance with the Dangerous Substances and Explosive Atmospheres Regulations (DSEAR) 2002
  • Supporting safe operational practices and reducing fire and explosion risk
  • Conducting, reviewing, and approving DSEAR assessments
  • Implementing and overseeing control measures to mitigate risks
  • Managing legal registers and ensuring compliance with statutory, regulatory, and non-regulatory provisions
  • Maintaining up-to-date knowledge of HSE guidance, British Standards, and best practices
  • Supporting the development of DSEAR arrangements and procedures
  • Supporting investigation into DSEAR related incidents
Job Responsibility
  • Act as the Principal Contractor's (PC's) SME for DSEAR, providing specialist advice and guidance
  • Develop, maintain, and review the implementation of the Principal Contractor's DSEAR policy and DSEAR management arrangements
  • Provide the regulators with confidence in the adequacy of DSEAR arrangements on site
  • Support the review of DSEAR risk assessments carried out by other organisations
  • Anticipate future DSEAR needs on the construction site
  • Support construction to continue in line with the project leads whilst ensuring legislative compliance
  • Delivering monthly and annual objectives, goals, and KPIs to support the project
  • Maintain an up-to-date knowledge and understanding of matters relevant to the post
  • Support the HPC Fire and Rescue/Emergency Preparedness/Health and Safety Teams
What we offer
  • Be Part of History: Work on the UK's first new nuclear power station in a generation
  • Scale & Impact: Over 22,000 workers contributing to a £36 billion project that supports 70,000+ UK jobs
  • Net Zero Future: Contribute to a project essential to Britain's low-carbon energy transformation
  • Career Development: Work in a multi-disciplinary environment with exposure to high-level planning and world-class logistics operations
Employment type: Full-time

Fire Safety - DSEAR - Specialist

The DSEAR Specialist provides expert advice, assessment, and assurance on the sa...
Location: United Kingdom, Bridgwater
Salary: Not provided
Company: Morson Talent (morson.com)
Expiration Date: Until further notice
Requirements
  • In-depth knowledge of DSEAR 2002, ATEX Directives, and relevant ACoPs
  • Strong understanding of explosion science, hazardous properties, CLP, and ignition source control
  • Experience conducting DSEAR assessments and hazardous area classification
  • Ability to interpret engineering drawings
  • Strong analytical and problem-solving skills with a pragmatic approach
  • Excellent written and verbal communication skills, able to simplify technical content
  • Degree in Engineering, Fire Safety, Process Safety, Chemistry, or related discipline
  • DSEAR-specific training or competency accreditation (e.g., CompEx Ex12 and Ex14, NEBOSH Fire/Process Safety)
  • Experience in COMAH, process industries, manufacturing, labs, or fuel environments
  • Experience auditing Ex-rated equipment and installations
Job Responsibility
  • Act as the Principal Contractor’s SME for DSEAR providing specialist advice and guidance to the project and contractors
  • Conducting, reviewing, and approving DSEAR assessments to identify and evaluate risks associated with dangerous substances
  • Implementing and overseeing control measures to mitigate risks, such as fire prevention and explosion protection
  • Managing legal registers and ensuring compliance with statutory, regulatory, and non-regulatory provisions
  • Providing consultancy, advice, guidance, and regulatory support to ensure that all aspects of DSEAR compliance are met
  • Maintaining up-to-date knowledge of HSE guidance, British Standards, and best practices in the field
  • Support the development of DSEAR arrangements (both general and process related) and procedures that meet legislative requirements and follow best practice
  • Develop, maintain, and review the implementation of the Principal Contractor’s DSEAR policy, DSEAR management arrangements and assessments, in conjunction with the other contractors, to ensure legal compliance with statutory and regulatory requirements
  • Provide the regulators with confidence in the adequacy of DSEAR arrangements on site
  • Support the review of DSEAR risk assessments carried out by other organisations concerned with the HPC project

Senior Software Engineer

The High-Performance Computing (HPC) Software Engineer shall be responsible for ...
Location: United States, Annapolis Junction
Salary: Not provided
Company: 2HB (2hb.com)
Expiration Date: Until further notice
Requirements
  • Master's degree in Math, Computer Engineering, Computer Science, or related discipline from an accredited college or university, plus five (5) years of experience as an HSE, in programs and contracts of similar scope, type, and complexity
  • OR Bachelor's degree in Math, Computer Engineering, Computer Science, or related discipline from an accredited college or university, plus seven (7) years of experience as an HSE, in programs and contracts of similar scope, type, and complexity
  • OR Nine (9) years of experience as an HSE, in programs and contracts of similar scope, type, and complexity
  • Experience using the Linux CLI and Linux tools
  • Experience developing Bash scripts to automate manual processes
  • Recent software development experience using C/C++ and Python
  • Strong experience with parallel programming models such as MPI, OpenMP, CUDA
  • Deep understanding of multi-threading and concurrency, memory hierarchy and cache optimization, NUMA architectures, vectorization and SIMD
  • Experience implementing and maintaining parallel and distributed algorithms optimized for scalability and performance across HPC components including CPU, GPU, memory, storage, and network layers
  • SME for parallel computing strategies and statistical modeling
Job Responsibility
  • Design, develop, optimize, and maintain scalable parallel and distributed systems that operate in high-performance computing environments
  • Serve as a technical leader, driving innovation, architectural decisions and mentoring engineers in advanced HPC methodologies
Employment type: Full-time

IT Training Lead

The IT Training Lead will drive technology learning and user adoption across the...
Location: United States, Delray Beach
Salary: Not provided
Company: Robert Half (https://www.roberthalf.com)
Expiration Date: Until further notice
Requirements
  • Experience in IT training, instructional design, technical enablement, or learning and development
  • Strong knowledge of Microsoft 365
  • Excellent communication, facilitation, and content development skills
  • Ability to translate technical concepts into practical, user-friendly training
Job Responsibility
  • Design, develop, and deliver IT training programs in instructor-led, virtual, and self-paced formats
  • Take lead in the Microsoft Copilot and AI training strategy, including onboarding, advanced use cases, responsible AI usage, and ongoing enablement
  • Partner with IT leadership to support new technology rollouts, system upgrades, and digital transformation initiatives
  • Create and maintain training content, including videos, guides, tutorials, and job aids
  • Identify skill gaps and develop targeted learning solutions to improve adoption and productivity
  • Gather feedback and measure training effectiveness to continuously improve programs