Senior Compute Cluster Administrator

United States, Austin 109760.00 - 164640.00 USD / Year · Job Posted March 25, 2026

Job Description

We are looking for a Senior Compute Cluster Administrator responsible for operating and supporting compute clusters used in upcoming datacenter buildouts leveraging AMD Instinct products. This role owns Day Two and beyond operations, encompassing both proactive maintenance and reactive support across complex, highly technical environments. This is an operational role supporting a demanding user base of AI server hardware, software, and firmware developers. You will manage a mix of R&D lab and production lab environments, each with distinct release cycles, stability requirements, and operational expectations. The role requires close collaboration with IT, Infosec, infrastructure automation teams, and deeply technical end users to ensure service quality, delivery commitments, and governance standards are consistently met.

Job Responsibility

  • Work directly with tenants and stakeholders to maximize service quality, utilization, and availability of managed compute clusters
  • Collaborate with highly technical users working deep within AMD’s Instinct platform (e.g., ROCm) to troubleshoot misconfigurations impacting HPC performance
  • Lead the resolution of complex issues during new deployments and ongoing operations
  • Partner with hardware vendors on technical escalations involving third‑party OEM platforms and coordinate maintenance cycles aligned with upstream releases
  • Support multiple Linux distributions across Red Hat and Ubuntu/Debian families
  • Act as a subject matter expert in one or more cluster scheduling technologies such as Slurm, LSF, Sun Grid Engine, OpenLava, or Kubernetes
  • Compare configurations and behaviors across heterogeneous clusters within AMD’s compute estate
  • Engage with emerging technologies where formal documentation may be limited, including white‑box platforms and pre‑beta hardware
  • Maintain and evolve compute images using automated CI/CD pipelines, or deploy software manually where automation is not available
  • Monitor cluster health, performance, and availability using standard tooling such as Grafana, Prometheus, and Zabbix
  • Work collaboratively with team members to reproduce and resolve difficult or intermittent issues
  • Train and enable on‑site L1 support teams
  • Participate in on‑call incident response as L2 support when required

Requirements

  • Hands‑on experience administering or supporting HPC clusters in production, research, or academic environments
  • Practical experience working as an HPC user combined with Linux system administration in enterprise or lab environments
  • Background in software development combined with deep Linux systems exposure in server or infrastructure contexts
  • Demonstrated intermediate to advanced Linux expertise
  • Strong understanding of networking fundamentals, including the OSI model, multi‑homed systems, firewall troubleshooting, and high‑speed interconnects
  • Willingness to experiment with open‑source and emerging technologies
  • Experience supporting infrastructure services such as DNS, DHCP, BOOTP, PXE, TFTP, NTP, and PAM
  • Understanding of interprocess communication and familiarity with MPI implementations such as OpenMPI or MPICH
  • Proficiency with Linux troubleshooting tools such as nmap, gdb, lsof, sar, and server management interfaces including IPMI, iDRAC, and iLO
  • Working knowledge of virtualization, VLANs, and directory services
  • Strong written communication skills with the ability to produce clear technical documentation
  • Experience developing automation using Python and/or Ansible
  • Familiarity with version control systems such as Git
  • Self‑directed, analytical, dependable, and comfortable working both independently and in a team‑based environment
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical discipline

Nice to have

  • Experience with RDMA
  • Familiarity with PCIe, I2C, compiler optimization, or other low‑level system components

Similar Jobs for Senior Compute Cluster Administrator

Senior Cluster IT Specialist

Troubleshoot and resolve technical problems or issues related to computer softwa...
Location: India, New Delhi
Salary: Not provided
Company: Marriott Bonvoy (https://www.marriott.com)
Expiration Date: Until further notice
Requirements
  • Technical, Trade, or Vocational School Degree
  • At least 2 years of related work experience
Job Responsibility
  • Troubleshoot and resolve technical problems or issues related to computer software and systems
  • Provide technical guidance and recommendations to resolve business problems
  • Analyze, recommend, and implement process improvements
  • Enter commands and activate controls on computer and peripheral equipment to integrate and operate equipment
  • Troubleshoot, modify, support, manage, and maintain applications programs and user accounts
  • Maintain records of daily data communication transactions, problems and remedial actions taken, or installation activities
  • Train or instruct users in the proper use of hardware or software
  • Manage and coordinate planning, design, operations, maintenance, and resource allocation of telecommunications activities, including client/server support and strategic and tactical planning
  • Consult with and advise others on administrative policies and procedures, technical problems, priorities, and methods related to telecommunications
  • Assist management in hiring, training, scheduling, evaluating, disciplining, and motivating and coaching employees
Employment type: Full-time

Senior Database Administrator

As a Senior Data Administrator, you will be responsible for the database adminis...
Location: United States
Salary: Not provided
Company: Tier4 Group (tier4group.com)
Expiration Date: Until further notice
Requirements
  • Advanced degree in computer science or related technical degree
  • 7+ years of experience as a Database Administrator, with a strong focus on Microsoft SQL Server
  • Deep expertise in SQL Server architecture, performance tuning, indexing, and query optimization
  • Hands-on experience with Redgate tools or similar monitoring and control tools
  • Strong knowledge of high availability and disaster recovery solutions (e.g., Always On Availability Groups, clustering, replication)
  • Experience with backup/recovery strategies and performance monitoring tools
  • Solid understanding of database security best practices
  • Highly skilled in T-SQL scripting and automation including best practices and techniques to avoid duplication of knowledge in queries
  • Experience working in a mid-size or enterprise environment
Job Responsibility
  • Design, implement, and maintain highly available and scalable SQL Server database systems
  • Monitor database performance, identify bottlenecks, and implement tuning and optimization strategies
  • Manage database security, including access controls, encryption, and compliance with organizational policies. This includes table-level, column/field-level and row-level security controls
  • Ensure adherence to ACID principles for transactional integrity, consistency, and resilience
  • Perform backup, recovery, and disaster recovery planning and execution
  • Automate routine database administration tasks and deployments
  • Use and enhance monitoring of Redgate tools (e.g., SQL Toolbelt, SQL Compare, SQL Monitor) for performance, version control, and release management
  • Collaborate with development teams to design efficient database schemas, queries, and indexing strategies
  • Establish and enforce database standards, governance, and best practices
  • Troubleshoot and resolve database-related issues in a timely manner
Employment type: Full-time

Senior HPC Administrator Technology Consultant

Provide technology consulting to external customers and internal project teams. ...
Location: Saudi Arabia, Riyadh
Salary: Not provided
Company: Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • 8+ years of professional experience
  • Bachelor of Arts/Science or equivalent degree in computer science or related area of study
  • Without a degree, 11+ years of relevant professional experience
  • Technical background and knowledge of industry trends
  • Experience in HPC services including hardware and software for massively parallel (MPP) supercomputer systems, clusters, and storage systems
  • Ability to work on a 24x7 basis and be on standby when needed
  • Willingness to learn new technologies in HPC, including Cray EX liquid-cooled systems and Shasta
  • Ability to manage customer relationships and communication
  • Ability to analyze, qualify, troubleshoot, and resolve incidents
  • Ability to collaborate with team members, other internal organizations, customers, and third parties
Job Responsibility
  • Verify and implement the detailed technical design solution
  • Provide a detailed technical design for enterprise solutions
  • Analyze and develop enterprise technology solutions
  • Lead in the technical assessment and delivery of specific technical solutions to the customer
  • Provide a team structure conducive to high performance, and manage the team lifecycle stages
  • Coordinate implementation of new installations, designs, and migrations for technology solutions
  • Provide advanced technical consulting and advice to others on proposal efforts, solution design, system management, tuning and modification of solutions
  • Provide input to the company strategy moving forward
  • Collect and determine data from appropriate sources to assist in determining customer needs and requirements
  • Respond to requests for technical information from customers
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment type: Full-time

Senior Database Administrator

Senior Database Administrator role focusing on PostgreSQL and ClickHouse databas...
Location: India, Bangalore
Salary: Not provided
Company: Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field
  • 5-8 years of experience in database administration with strong PostgreSQL and ClickHouse expertise
  • Deep understanding of PostgreSQL architecture, configuration, replication, and tuning
  • Experience with ClickHouse configuration and optimization for large-scale analytics
  • Strong SQL and query optimization capabilities
  • Familiarity with backup/recovery tools such as pgBackRest, Barman, and ClickHouse utilities
  • Proficiency in Linux environments and shell scripting
  • Exposure to cloud-hosted database services like AWS RDS/Aurora, Azure Database, or GCP
  • Strong analytical thinking and problem-solving ability
  • Clear communication and effective cross-team collaboration
Job Responsibility
  • Install, configure, and maintain PostgreSQL and ClickHouse across development, staging, and production environments
  • Monitor database health, performance, and resource usage
  • Manage schemas, indexes, roles, and permissions for performance and security
  • Analyze and tune queries, indexing strategies, and configurations for low-latency, high-throughput workloads
  • Optimize ClickHouse for analytical and OLAP workloads on large datasets
  • Implement backup strategies and disaster recovery solutions
  • Configure and manage replication, clustering, and failover setups
  • Apply database security best practices including encryption, access controls, and audit logging
  • Ensure compliance with industry standards and regulations
  • Investigate and resolve database-related incidents and performance issues
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Career growth opportunities
  • Comprehensive benefits suite supporting physical, financial and emotional wellbeing
Employment type: Full-time

Senior Systems Administrator

The role of the System Administrator includes supporting the implementation, tro...
Location: United States, Laurel
Salary: Not provided
Company: Wrench Technology (wrench.io)
Expiration Date: Until further notice
Requirements
  • Fourteen (10) years of professional experience as an SA
  • Bachelor’s degree in Computer Science or related discipline from an accredited college or university is required
  • Five (5) years of additional SA experience may be substituted for a bachelor’s degree
  • Provide expertise in troubleshooting IT systems
  • Provide thorough analysis and feedback to management and internal customers regarding escalated tickets
  • Extend support for dispatch system and hardware issues, remaining actively engaged in the resolution process
  • Handle configuration and management of UNIX and Windows (or other relevant) operating systems, including installation/loading of software, troubleshooting, maintaining integrity, configuring network components, and implementing enhancements to improve reliability and performance
  • NetApp experience required
  • Able to write scripts in Python, Ruby, and Perl
Job Responsibility
  • Supporting the implementation, troubleshooting, and upkeep of Information Technology (IT) systems
  • Overseeing the IT system infrastructure and associated processes
  • Providing assistance for day-to-day operations, monitoring, and resolving issues related to client/server/storage/network devices, as well as mobile devices
  • Diagnosing and resolving problems
  • Configuring and managing UNIX and Windows operating systems
  • Installing and maintaining operating system software
  • Ensuring integrity and configuring network components
  • Implementing enhancements to operating systems to improve reliability and performance
  • Assisting with the installation, configuration, optimization, and administration of extensive Hadoop (Apache Accumulo) clusters dedicated to data-intensive computing tasks

Senior Database Administrator

The Senior Database Administrator role requires over 8 years of experience in da...
Location: India, Hyderabad
Salary: Not provided
Company: NTT DATA (nttdata.com)
Expiration Date: Until further notice
Requirements
  • 8+ years of overall database administration experience
  • 3+ years in a lead or senior DBA role handling enterprise customers
  • Act as Lead DBA for Oracle, MS SQL Server, and Open-Source databases in production and non-production environments
  • Manage customer SLA compliance and service stability
  • Manage customer satisfaction and trust
  • Strong customer communication and stakeholder management skills
  • Ability to lead teams and manage shifts in a 24x7 environment
  • Excellent documentation, reporting, and presentation skills
  • Provide L3/L4 support for critical database issues and escalations
  • Strong hands-on experience in: Oracle Database (11g/12c/18c/19c or higher)
Job Responsibility
  • Lead a team
  • Ensure customer satisfaction
  • Manage database performance and security
  • Manage customer SLA compliance and service stability
  • Provide L3/L4 support for critical database issues and escalations
  • Monitor and tune database performance
  • Proactively manage capacity planning, growth forecasting, and optimization
  • Analyze trends and recommend improvements to ensure stability and scalability
  • Implement and manage database security and encryption
  • Plan and execute database upgrades and migrations across platforms

Senior Database Administrator

We are seeking a Senior Database Administrator to manage, maintain, and optimize...
Location: South Africa, Pretoria
Salary: Not provided
Company: Overture Rede (overturerede.in)
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, IT, or a related field
  • 3–5 years of hands-on experience as a Database Administrator
  • Relevant certification(s), such as Microsoft Certified: Database Administrator Associate, Microsoft Certified: Azure Database Administrator Associate, or similar recognized database certifications
  • Strong experience with Microsoft SQL Server administration
  • Knowledge of performance tuning, indexing, and query optimization
  • Experience with backup, disaster recovery, and data security practices
Job Responsibility
  • Install, configure, administer, and support database systems, primarily Microsoft SQL Server
  • Monitor database performance, capacity, and availability; perform tuning and optimization
  • Implement backup, recovery, and high-availability solutions (Always On, clustering, replication)
  • Ensure database security, access controls, and compliance with organizational policies
  • Troubleshoot and resolve complex database-related issues
  • Support database upgrades, patching, and migrations
  • Work closely with application and infrastructure teams to support database-driven systems
  • Maintain documentation, standards, and operational procedures
What we offer
  • Strong job stability
  • Career progression
  • On-call & critical support allowances
Employment type: Full-time