Senior Systems Engineer HPC

Rackspace

Location:
India, Gurgaon

Contract Type:
Not provided

Salary:

Not provided

Job Responsibility:

  • System Administration & Maintenance: Install, configure, and maintain HPC clusters (hardware, software, operating systems), perform regular updates/patching, manage user accounts and permissions, and troubleshoot/resolve hardware or software issues
  • Performance & Optimization: Monitor and analyse system and application performance, identify bottlenecks, implement tuning solutions, and profile workloads to improve efficiency
  • Cluster & Resource Management: Manage and optimize job scheduling, resource allocation, and cluster operations using tools such as Slurm, LSF, Bright Cluster Manager / Base Command Manager, OpenHPC, and Warewulf
  • Networking & Interconnects: Configure, manage, and tune Linux networking (TCP/IP, DNS, routing) and high-speed HPC interconnects (InfiniBand, Ethernet) to ensure low-latency, high-bandwidth communication
  • Storage & Data Management: Implement and maintain large-scale storage and parallel file systems (Lustre, Ceph, GPFS), ensure data integrity, manage backups, and support disaster recovery
  • Security & Authentication: Implement security controls, ensure compliance with policies, and manage authentication and directory services such as LDAP and Active Directory
  • DevOps & Automation: Use configuration management and DevOps practices (Ansible, Terraform, Jenkins, Git) to automate deployments, application packaging (RPM/DEB), and system configurations
  • User Support & Collaboration: Provide technical support, documentation, and training to researchers; collaborate with scientists, HPC architects, and engineers to align infrastructure with research needs
  • Planning & Innovation: Contribute to the design and planning of HPC infrastructure upgrades, evaluate and recommend hardware/software solutions, and explore cloud-based HPC solutions where applicable

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, or a related field (equivalent experience may substitute for degree)
  • Minimum of 10 years of systems experience, including at least 5 years working specifically with HPC
  • Strong knowledge of Linux operating systems (e.g., Rocky Linux, Ubuntu) with a fundamental understanding of Linux internals, system administration, and performance tuning
  • Experience building and managing RPM and DEB packages
  • Experience with cluster management tools such as Bright Cluster Manager, OpenHPC stack, or Warewulf
  • Proficiency with job schedulers and resource managers such as Slurm and LSF
  • Strong understanding of Linux networking (e.g., TCP/IP, DNS, routing) and HPC interconnects (e.g., InfiniBand, Ethernet) including performance tuning
  • Knowledge of parallel file systems such as Lustre, Ceph, or GPFS
  • Working knowledge of Linux authentication and directory services such as LDAP and Active Directory
  • Strong experience with DevOps and configuration management tools, including Ansible, Terraform, Jenkins, and Git
  • Strong knowledge of Linux security, compliance standards, and data protection best practices
  • Excellent communication, interpersonal, and problem-solving skills

Nice to have:

  • Proficiency in scripting languages (e.g., Python, Bash, R) and familiarity with MPI libraries for parallel and distributed computing
  • Knowledge of HPC in cloud environments (e.g., AWS, Azure, GCP HPC offerings)

Additional Information:

Job Posted:
January 05, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Senior Systems Engineer HPC

Senior Linux System Administrator - Support Engineer

Senior Linux System Administrator/System Support Engineer with expertise support...
Location:
Australia, Canberra
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Information Technology, or related field, or equivalent work experience
  • At least 5 years of hands-on experience managing Linux systems in production environments, including HPC systems
  • Expertise in Linux/Unix operating systems, parallel file systems (Lustre, GPFS), and networking technologies
  • Proficiency in scripting/programming languages (Bash, Python, Perl, C++)
  • Experience with automation/configuration management tools (Ansible, Puppet, Chef, Terraform)
  • Strong understanding of networking concepts (TCP/IP, DNS, DHCP, firewalls, VPNs)
  • Familiarity with monitoring/logging tools (Nagios, Grafana, ELK Stack)
  • Experience with containerization technologies (Docker, Kubernetes)
  • Excellent problem-solving, analytical, and communication skills
  • Demonstrated ability to work independently in multi-technology environments and collaborate across teams
Job Responsibility:
  • Deploy, configure, maintain, and troubleshoot Linux servers (Red Hat, CentOS, Ubuntu, or others) across physical, virtual, and cloud environments
  • Support, maintain, and optimize HPC systems, including installation, servicing, and advanced technical troubleshooting of hardware/software and parallel file systems
  • Monitor system performance, availability, and security using industry-standard tools and practices
  • Plan and execute upgrades, patches, enhancements, and migrations to ensure systems are current, secure, and optimized
  • Automate system administration tasks using scripting languages and configuration management tools
  • Implement and maintain backup/recovery strategies, disaster recovery plans, and system documentation
  • Collaborate with development, network, and security teams to support application deployments and troubleshoot issues
  • Provide technical consulting, mentoring, and guidance to junior team members
  • Ensure compliance with strict security protocols in sensitive environments
  • Participate in on-call rotation and respond to system incidents and outages
What we offer:
  • Competitive salary and performance-based bonuses
  • Comprehensive health, dental, and vision insurance
  • Retirement plan options
  • Paid time off and holidays
  • Professional development opportunities
  • Flexible work arrangements
  • Fulltime

Senior HPC Deployment Engineer

As a High Performance Computer (HPC) Solution Installation and Deployment Engine...
Location:
Australia, Melbourne
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Proven experience in installing, configuring, and deploying HPC systems
  • Strong knowledge of HPC architectures, parallel computing, and cluster management
  • Proficiency in Linux/Unix operating systems
  • Experience with HPC software tools and libraries (e.g., MPI, OpenMP, SLURM, Torque)
  • Familiarity with high-speed networking technologies (e.g., InfiniBand, Ethernet)
  • Excellent problem-solving skills and attention to detail
  • Strong communication and interpersonal skills
  • Ability to work independently and as part of a team
  • Certifications in relevant technologies (e.g., Red Hat Certified Engineer, Certified HPC Professional)
  • Experience with cloud-based HPC solutions
Job Responsibility:
  • Install and configure HPC hardware and software components, including servers, storage, and networking equipment
  • Set up and manage high-speed interconnects (e.g., InfiniBand, Ethernet)
  • Deploy operating systems, cluster management software, and parallel file systems
  • Coordinate with clients and project managers to understand deployment requirements and timelines
  • Implement and document HPC deployment processes and best practices
  • Perform system testing and validation to ensure optimal performance and reliability
  • Provide technical support to clients during the installation and deployment phases
  • Conduct training sessions for clients on HPC system usage and maintenance
  • Develop and maintain user documentation and guides
  • Monitor and analyze system performance to identify and resolve bottlenecks
What we offer:
  • Comprehensive suite of benefits supporting physical, financial, and emotional wellbeing
  • Specific programs for personal and professional development
  • Inclusion and flexibility to manage work and personal needs
  • Fulltime

Senior System Support Engineer – High Performance Computing

The HPC Senior System Support Engineer provides highly visible on-site technical...
Location:
Australia, Canberra
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • A minimum TSPV Government Security clearance is mandatory for the role
  • Expertise in Linux/Unix operating systems, parallel file systems (e.g., Lustre, GPFS) and networking technologies is essential
  • Proficient in programming and scripting languages such as Python and C++
  • Ability to develop solutions that enhance the availability, performance, maintainability and agility of HPC solutions
  • Has contributed to the design and application of new tools
  • Possesses an understanding, at a detailed level, of architectural dependencies of technologies in use in the customer's IT environment
  • Frequently uses product and application knowledge along with internals or architectural knowledge to develop solutions
  • Able to communicate confidently with internal and external senior management and demonstrate professionalism
  • Ability to work in a multi-technology environment and diagnose complex technical problems to their root cause
  • In addition to troubleshooting and consulting skills, has the ability to summarise prognosis and impact at practice lead level
Job Responsibility:
  • Responsible for verifying and implementing the detailed technical design solution to the problem as identified by the Project/Technical Manager
  • Provides detailed technical design, analyses and develops enterprise solutions
  • Regularly leads technical assessment and delivery solutions to the customer
  • Coordinates implementation of new installations, designs, and migrations for HPC solutions
  • Provides advanced technical consulting and advice to others on proposal efforts, solution design, system management, tuning and modification of solutions
  • Provides input to the company strategy moving forward
  • Collects and determines data from appropriate sources to assist in determining customer needs and requirements
  • Responds to requests for technical information from customers
  • Engages in technical problem solving across multiple technologies; often needs to develop new methods to apply to the situation
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Senior Research Engineer

The HPE HPC & AI EMEA Research Lab (ERL) is characterized by a unique blend of i...
Location:
Germany, Munich, Berlin
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Development experience in compiled languages such as C, C++ or Fortran and experience with interpreted environments such as Python
  • At least a B.Sc. equivalent in a Science, Technology, Engineering or Mathematical discipline
  • Parallel programming experience, with programming models such as OpenMP, MPI, CUDA, OpenACC, HIP, PGAS languages, etc.
  • An understanding of AI/ML frameworks, experience with frameworks such as TensorFlow or PyTorch is highly desirable
  • An interest in system and data center monitoring and operational data analysis
  • Professional language skills in English and German
Job Responsibility:
  • Perform world-class research while also shaping products of the future
  • Work with the most esteemed research partners across Europe
  • Enable high performance research software on pre-Exascale and Exascale supercomputers
  • Provide new environments/abstractions to support application developers to build, deploy, and run applications taking advantage of leading-edge hardware at scale
  • Make and operate HPC/AI systems and datacenters in a sustainable way
  • Manage modern data-intensive workloads in high performance environments
What we offer:
  • Competitive salary and extensive benefits package (pension scheme, insurances, bike and car leasing, and other fringe benefits)
  • Work-life balance (flexible working time and hybrid workplace model, 30 vacation days, four HPE Wellness-Fridays, up to six months paid parental leave)
  • Support for education, training, and career development
  • Diverse and dynamic work environment

Senior Linux Engineer

Are you an experienced Linux Engineer looking to take the next step in your care...
Location:
United Kingdom, Stevenage
Salary:
55000.00 - 60000.00 GBP / Year
Morson Talent
Expiration Date:
Until further notice
Requirements:
  • Strong Linux administration experience across multiple distributions
  • RHCE certification or equivalent real-world expertise
  • Excellent communication and stakeholder engagement skills
  • Experience with automation and configuration management (e.g., Ansible)
  • Scripting capability (Bash, Python, or similar)
  • Experience in clustered, HPC, or performance computing environments
  • Knowledge of parallel file systems (e.g., Lustre) and HPC toolsets (e.g., Bright)
  • Understanding of networking (InfiniBand/Ethernet) and enterprise storage platforms (DDN, NetApp, IBM, Dell EMC)
  • Experience using batch schedulers (PBS Pro, Slurm, SGE/UGE, Microsoft Scheduler)
  • Candidates must be eligible for DV clearance
Job Responsibility:
  • Supporting, developing, and enhancing secure Linux infrastructures across multiple customer contracts
  • Acting as a subject matter expert and providing mentorship to colleagues
  • Troubleshooting and resolving escalated or complex Linux issues
  • Working closely with customers and internal teams, building strong and trusted relationships
  • Contributing to solution design and future architecture when required
  • Fulltime

Senior Software Engineer

Senior Software Engineer responsible for delivering integrated product solutions...
Location:
United States, St. Louis
Salary:
Not provided
Sovereign Technologies
Expiration Date:
Until further notice
Requirements:
  • Advanced ability to translate business needs and problems into systems design and technical solutions
  • Proven experience with structured and object-oriented programming, design patterns, relational database design, operating systems, networking concepts, and systems integration
  • Demonstrated ability to evaluate project objectives and scope for feasibility, understanding, and scheduling, and to ensure projects meet budget and plan criteria
  • Complex analytical and problem-solving skills
  • Ability to multi-task and work well within a team environment
  • Advanced interpersonal skills, demonstrating an ability to apply leadership when required
  • Advanced oral and written communication skills
  • Agile
  • Master’s degree in Computer Science
  • Certification in Microsoft C#.NET software development
Job Responsibility:
  • Provide IT solution design, delivery, and support expertise in Prophet, C#, Web, JavaScript, Oracle, and SQL Server technologies
  • Apply leadership and ownership through full solution development lifecycle while providing estimates, deliverables, and results
  • Meet regularly with Project Management and Technical Leads to manage status, milestones, risks, and issues in an Agile SDLC
  • Engage in customer planning sessions and demonstrate ability to drive out requirements
  • Analyze requirements, develop technical specifications, and perform solution gap analysis via Agile/Scrum methodology
  • Provide technical and/or business application consultation to customers and team members regarding functionality, architecture, operating systems, and databases for complex product systems
  • Prepare and present application and programming design solutions to fulfill business requirements
  • Engage technical analysts and business users to provide input on test cases, test scenarios, and test plans
  • Engage teams outside of immediate group as required (e.g. product integration points, infrastructure, help desk, security, and vendors)
  • Evaluate and balance application change risk with business need for timely product enhancements

Senior Solution Architect AI & HPC

AI is a high-growth market for HPE, and we believe we are uniquely suited to bri...
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Engineering, Computer Science, or similar quantitative focus preferred
  • Ability to quickly prototype functionality into scripts for demos, integrations, troubleshooting, etc.
  • Expertise in cloud architectures, specifically with public cloud platforms such as AWS, Azure, or Google Cloud
  • Strong understanding of AI technologies, including machine learning, deep learning, and neural networks
  • Experience participating in solution configurations and the creation of PoCs to meet customer requirements
  • Solid knowledge of infrastructure components, including servers, storage, networking, and virtualization
  • Experience with high-performance computing (HPC) and GPU-accelerated systems is advantageous
  • Demonstrates expert technical skills in assigned area of specialization
  • Expert knowledge of the company offerings, strategic initiatives, current trends, competitor products and strategies within area of responsibility
  • Expert level written and verbal communication skills and mastery over English and local language
Job Responsibility:
  • Collaborate with sales teams to understand customer requirements and develop tailored solutions for their AI infrastructure needs
  • Engage in pre-sales activities, including technical presentations, demonstrations, and proof-of-concepts
  • Act as a trusted advisor to customers, addressing their questions, concerns, and technical challenges effectively
  • Stay up-to-date with the latest advancements in AI technologies, cloud architectures, and infrastructure trends
  • Lead Proof-of-Concepts (PoC) for HPE customers expanding into Deep Learning or Machine Learning use cases
  • Architect reusable end-to-end AI solutions for HPE customers and prospects
  • Lead technical discussions with customers and partners to propose HPE and partner Integrated solutions
  • Identify solutions, define action plans, and help coordinate and deliver optimal solutions and enhancements
  • Recommend configurations and settings for different types of hardware and interconnect fabrics
  • Assist in any product or technical issue towards an initial sale or renewal of a customer
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
  • Fulltime

Customer Support Engineer

As a Customer Support Engineer at a pioneering AI company, you'll be the first l...
Location:
India
Salary:
Not provided
Together AI
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in a customer-facing technical role with at least 1 year in a support function in AI
  • Strong technical background, with knowledge of AI, ML, GPU technologies and their integration into high-performance computing (HPC) environments
  • Familiarity with infrastructure services (e.g., Kubernetes, SLURM), infrastructure as code solutions (e.g., Ansible), high-performance network fabrics, NFS-based storage management, container infrastructure, and scripting and programming languages
  • Familiarity with operating storage systems in HPC environments such as Vast and Weka
  • Familiarity with inspecting and resolving network-related errors
  • Strong knowledge of Python, TypeScript, and/or JavaScript with testing/debugging experience using curl and Postman-like tools
  • Foundational understanding in the installation, configuration, administration, troubleshooting, and securing of compute clusters
  • Complex technical problem solving and troubleshooting, with a proactive approach to issue resolution
  • Ability to work cross-functionally with teams such as Sales, Engineering, Support, Product and Research to drive customer success
  • Strong sense of ownership and willingness to learn new skills to ensure both team and customer success
Job Responsibility:
  • Engage directly with customers to tackle and resolve complex technical challenges involving our cutting-edge GPU clusters and our inference and fine-tuning services; ensure swift and effective solutions every time
  • Become a product expert in all of our Gen AI solutions, serving as the last line of technical defense before issues are escalated to Engineering and Product teams
  • Collaborate seamlessly across Engineering, Research, and Product teams to address customer concerns; collaborate with senior leaders both internally and externally to ensure the highest levels of customer satisfaction
  • Transform customer insights into action by identifying patterns in support cases and working with Engineering and Go-To-Market teams to drive Together’s roadmap (e.g., future models to support)
  • Maintain detailed documentation of system configurations, procedures, troubleshooting guides, and FAQs to facilitate knowledge sharing with team and customers
  • Be flexible in providing support coverage during holidays, nights and weekends as required by business needs to ensure consistent and reliable service for our customers
What we offer:
  • Competitive compensation
  • Startup equity
  • Health insurance
  • Flexibility in terms of remote work for the respective hiring region