Senior Infrastructure Engineer

United States, San Francisco · 183000.00 - 210000.00 USD / Year · Job Posted February 21, 2026

Job Description

We are seeking a highly skilled and motivated GPU Fleet Operations Engineer to join Crusoe’s Fleet Operations team. This role is focused on the advanced diagnosis, maintenance, and repair of high-performance GPU compute clusters, ensuring maximum uptime, reliability, and performance across our fleet. The ideal candidate will be hands-on with GPU rack-level troubleshooting and work closely with data center operations, engineering, and vendors to support cutting-edge infrastructure featuring the latest NVIDIA and AMD GPUs. This position plays a critical role in maintaining the health and scalability of Crusoe’s rapidly growing GPU fleet.

Job Responsibility

  • Perform deep-level diagnosis and troubleshooting of hardware faults within GPU racks and high-density compute systems
  • Troubleshoot and support GPU platforms including NVIDIA A100, H200, GB200, B200 and AMD MI350X / MI355X
  • Execute component-level diagnosis and remediation for failed or degraded hardware
  • Partner with data center operations to manage and perform field-replaceable unit (FRU) repairs for GPUs, power supplies, cooling systems, interconnects, and networking hardware
  • Conduct post-repair validation, burn-in testing, torch testing, and NVIDIA NCCL testing to ensure system stability and performance
  • Implement and execute preventative maintenance procedures to improve fleet reliability and extend hardware lifespan
  • Perform firmware and BIOS upgrades across the GPU fleet
  • Maintain detailed documentation of maintenance activities, failures, and resolutions in ticketing and asset management systems
  • Develop and update standard operating procedures (SOPs) for troubleshooting, repair, and validation workflows
  • Collaborate with engineering, software, and data center operations teams to identify root causes of systemic failures and implement preventative solutions

Requirements

  • Proven experience diagnosing and repairing high-density, rack-mounted compute hardware in production environments
  • Deep understanding of GPU architectures and hands-on experience with GPU-based systems
  • Experience supporting NVIDIA A100, H200, GB200, B200 and AMD MI350X / MI355X series platforms
  • Familiarity with high-speed interconnects such as InfiniBand, NVLink, and RDMA over Converged Ethernet (RoCE)
  • Strong Linux experience (Ubuntu, Rocky Linux, CentOS) using the command line for diagnostics and testing
  • Proficiency with GPU and system diagnostic tools such as NVIDIA DCGM and NVIDIA field diagnostic utilities
  • Experience working with enterprise server hardware, power delivery, and cooling systems
  • Strong analytical and problem-solving skills
  • Excellent communication and collaboration skills
  • Ability to work independently in a fast-paced data center or operations environment

Nice to have

  • Technical certification or Associate’s/Bachelor’s degree in Electrical Engineering, Computer Science, or a related field, or equivalent demonstrated experience
  • Experience working directly with hardware vendors and escalations
  • Background in large-scale GPU fleet operations or hyperscale data center environments

What we offer

  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit: $300 per pay period

Similar Jobs for Senior Infrastructure Engineer

8 matching positions

Senior Infrastructure Engineer

About this role: Wells Fargo is seeking a Senior Infrastructure Engineer. In ...
Location: India, Bengaluru
Salary: Not provided
Company: Wells Fargo (https://www.wellsfargo.com/)
Expiration Date: June 30, 2026
Requirements
  • 4+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • Programming & Automation
  • Strong proficiency in Python for automation, scripting, and problem-solving
  • Working knowledge of Ansible for automation, including playbook and role development
  • API & Integration
  • Experience in developing and consuming REST APIs
  • Ability to integrate automation solutions with enterprise systems and services
  • CI/CD & DevOps Practices
  • Hands-on experience with CI/CD tools (GitHub Actions, Jenkins, or similar)
  • Exposure to pipelines-as-code and SDLC automation practices
Job Responsibility
  • Lead or participate in high level technical concepts spanning technology and business
  • Develop specifications for complex infrastructure systems, design and test solutions
  • Contribute to the testing of business, application and technical infrastructure requirements
  • Drive solutions to reduce recovery
  • Review and analyze solutions for cloud security, secrets management and key rotations
  • Design, code, test, debug and document programs using Agile development practices
  • Design complex system upgrades
  • Resolve troublesome trends as they develop
  • Develop a long range plan designed to resolve problems and prevent them from recurring
  • Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success
Employment Type: Full-time

Senior Infrastructure Engineer

Location: United States, Santa Barbara
Salary: 100000.00 - 150000.00 USD / Year
Company: Robert Half (https://www.roberthalf.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in IT consulting, infrastructure engineering, or enterprise IT support
  • Strong technical knowledge across cloud platforms (Azure/M365), networking, security, and virtualization
  • Experience with firewalls, backups, identity management, and endpoint security tools
  • Proven ability to manage multiple client environments and technical projects simultaneously
  • Excellent communication skills with the ability to translate technical concepts to non-technical audiences
  • Strong customer service mindset and ability to build trusted client relationships
  • Experience troubleshooting complex systems across Windows, macOS, and network environments
  • Valid driver’s license and ability to travel to client sites
Job Responsibility
  • Own and manage client IT environments end-to-end, including design, implementation, maintenance, and escalation support
  • Lead technical projects from planning through execution, ensuring timely and successful delivery
  • Design and implement cloud, infrastructure, and security solutions aligned to client needs
  • Serve as a strategic advisor to clients, recommending improvements and technology roadmaps
  • Install, configure, and support networks, servers, cloud platforms, and endpoint systems
  • Monitor system health, manage backups, apply updates, and ensure security best practices
  • Troubleshoot and resolve complex technical issues across infrastructure, applications, and networks
  • Provide clear communication to both technical and non-technical stakeholders, including executives
  • Mentor junior team members and collaborate across technical teams
  • Maintain documentation, track work progress, and ensure high service quality standards
What we offer
  • Performance-based bonuses and incentive opportunities
  • Medical, dental, vision, and life insurance
  • Retirement plan with employer contribution
  • Paid time off and company holidays
  • Professional development support, including training and certifications
  • Wellness benefits, team events, and employee recognition programs
Employment Type: Full-time

Senior Infrastructure Engineer

Location: India, Hyderabad
Salary: Not provided
Company: Alter Domus (alterdomus.com)
Expiration Date: Until further notice
Requirements
  • At least 7-8 years of experience in Infrastructure administration
  • Strong hands-on experience with Linux systems (RHEL, Ubuntu, SUSE)
  • Proven expertise in Linux system administration, performance tuning, and troubleshooting
  • Experience with monitoring and alerting tools such as Prometheus, Grafana, Zabbix, Nagios, ELK Stack, or Splunk
  • Strong understanding of observability, logging, and system health monitoring
  • Proficiency in scripting and automation tools (Bash, Python, Ansible)
  • Good understanding of networking fundamentals (TCP/IP, DNS, HTTP)
  • Familiarity with virtualization and cloud platforms
  • Experience in capacity planning and performance optimization
  • Strong troubleshooting and analytical skills
Job Responsibility
  • Manage Linux infrastructure (RHEL, CentOS, Ubuntu, SUSE), including installation, patching, upgrades, and troubleshooting
  • Perform OS-level monitoring and tuning (CPU, memory, disk, processes, kernel)
  • Maintain file systems and storage configurations (LVM, NFS, ext4, xfs)
  • Design and manage monitoring and alerting frameworks
  • Configure and tune alerts for system, application, and network performance
  • Ensure high availability and continuous system monitoring
  • Administer monitoring tools such as Prometheus, Grafana, Zabbix, Nagios, ELK, and Splunk
  • Build dashboards and track system health, KPIs, and performance trends
  • Perform proactive health checks and capacity planning
  • Troubleshoot complex issues and perform root cause analysis (RCA)
What we offer
  • Support for professional accreditations such as ACCA and study leave
  • Flexible arrangements, generous holidays, plus an additional day off for your birthday
  • Continuous mentoring along your career progression
  • Active sports, events and social committees across our offices
  • 24/7 support available from our Employee Assistance Program
  • The opportunity to invest in our growth and success through our Employee Share Plan
Employment Type: Full-time

Senior Infrastructure Engineer

Our client is a world-leading reinsurance and risk management company, deliverin...
Location: Romania, Cluj
Salary: Not provided
Company: NTT DATA (nttdata.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Informatics or a similar field of study, or equivalent working experience, is required
  • Minimum 5 years demonstrable experience in a DevOps Engineer role
  • Strong experience with Terraform, YAML, and PowerShell scripting
  • Proficient with Azure infrastructure components (IaaS and PaaS)
  • Familiarity with distributed systems and administration of Windows Servers
  • Understanding of Azure Policies and redundancy scaling concepts
  • Experience with Azure networking components and API Management
  • Knowledge of OAuth2, OpenID Connect protocols, and Azure Entra Identity Provider
  • Basic proficiency in SQL
  • Excellent problem-solving and communication skills
Job Responsibility
  • Utilize Terraform and ARM templates to automate the deployment and management of cloud infrastructure
  • Develop scripts in PowerShell and YAML for continuous integration and continuous deployment (CI/CD) pipelines and other automation tasks
  • Implement and manage Azure IaaS and PaaS services, ensuring optimal performance and cost management
  • Configure and manage Azure networking components, including Application Gateways and Azure Firewalls, to ensure secure and efficient traffic flow
  • Visualize and monitor infrastructure performance and efficiency, leveraging native Azure tools and third-party solutions
  • Implement OAuth2 and OpenID Connect for secure authentication, utilizing Azure Entra ID as the identity provider
  • Enforce and manage Azure Policies to ensure compliance and governance of cloud resources
  • Design and implement scalable solutions with redundancy features, such as geo-redundancy, to ensure high availability
  • Apply basic SQL knowledge for managing and querying databases as part of application deployments and operational support
  • Work with development and operations teams to foster a culture of collaboration and continuous improvement in DevOps practices
What we offer
  • New beginnings can be a challenge. We promise a smooth integration and a supportive mentor
  • Pick your working style: choose from Remote, Hybrid or Office work opportunities
  • Early bird or night owl? Our projects have different working hours to suit your needs
  • Nobody is born an expert. Sharpen your tech skills with our sponsored certifications, trainings and top e-learning platforms
  • We want you to stay healthy! Enjoy our Private Health Insurance – it’s custom-made for you
  • A clear mind is a healthy mind. Attend individual coaching sessions or go one step further by joining our accredited Coaching School
  • Make the most of our epic parties or themed events – they’re lovingly designed for our people and their families
Employment Type: Full-time

Senior Infrastructure Engineer

The IT infrastructure engineer will design, build, manage and support the IT-inf...
Location: United Kingdom, Hereford
Salary: 53.73 GBP / Hour
Company: Outsource UK (outsource-uk.co.uk)
Expiration Date: Until further notice
Requirements
  • Current SC clearance
  • Minimum of 3 years as an infrastructure engineer
  • Minimum of 8 years in a related IT field
  • Strong knowledge of Linux OS, Windows Server OS and virtualisation (VMware, Hyper-V)
  • Certifications (CCNA, RHCA, MCSE, VCP-DCV)
Job Responsibility
  • Design, build, manage and support the IT-infrastructure that underpins a tactical mission network hosted on an on-premise environment
  • Maintain IT infrastructure
  • Troubleshoot and rectify problems
  • Develop and deploy new infrastructure
  • Monitor system performance
  • End-user device build process
Employment Type: Full-time

Senior Infrastructure Engineer

I'm working with a leading charity looking to hire a Senior Infrastructure Engin...
Location: United Kingdom, Reading
Salary: 47000.00 - 50000.00 GBP / Year
Company: Understanding Recruitment NFP (understandingrecruitmentnfp.com)
Expiration Date: Until further notice
Requirements
  • Strong experience across Microsoft Azure (VMs, networking, firewalls)
  • Solid networking knowledge (firewalls, routing, switching, VPNs, WAN/LAN)
  • Proven troubleshooting and problem-solving skills across complex environments
  • Experience delivering infrastructure projects and driving improvements/automation (e.g. PowerShell)
  • Ability to communicate technical concepts clearly to both technical and non-technical audiences
Job Responsibility
  • Play a key role in shaping and supporting their technical environment
  • Provide a mix of technical leadership, project delivery, and continuous improvement
  • Work across core Microsoft and networking technologies and Azure, supporting day-to-day operations
  • Contribute to infrastructure projects, automation, and long-term improvements
  • Act as a key technical voice within the team, working with stakeholders and third parties to deliver effective solutions
What we offer
  • £2,000 yearly benefits allowance
  • Private Medical
  • Gym Membership
  • 9% Pension
  • 26 days annual Leave + bank holidays
Employment Type: Full-time

Senior Infrastructure Engineer

I'm working with a leading charity as they look to appoint a Senior Infrastructu...
Location: United Kingdom, London
Salary: 45000.00 - 50000.00 GBP / Year
Company: Understanding Recruitment NFP (understandingrecruitmentnfp.com)
Expiration Date: Until further notice
Requirements
  • Strong experience across Microsoft infrastructure (Azure, Active Directory, Entra, Microsoft 365)
  • Solid networking knowledge (firewalls, routing, switching, VPNs, WAN/LAN)
  • Proven troubleshooting and problem-solving skills across complex environments
  • Experience delivering infrastructure projects and driving improvements/automation (e.g. PowerShell)
  • Ability to communicate technical concepts clearly to both technical and non-technical audiences
Job Responsibility
  • Play a key role in shaping and supporting their technical environment
  • Act as a key technical voice within the team, working with stakeholders and third parties to deliver effective solutions
  • Contribute to infrastructure projects, automation, and long-term improvements
Employment Type: Full-time

Senior Infrastructure Engineer

Senior Infrastructure Engineer is critical to maintaining the stability, securit...
Location: Egypt
Salary: Not provided
Company: Nile Air
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Information Technology, or a related field
  • Minimum of 5–7 years of hands-on experience in system administration and infrastructure management
  • Proven experience in enterprise environments with virtualization, security, and backup solutions
  • Strong expertise in Active Directory and Windows Server environments
  • Solid experience with VMware vSphere (vCenter/ESXi)
  • Good understanding of cybersecurity principles and endpoint protection systems
  • Experience in backup and disaster recovery solutions
  • Knowledge of networking fundamentals and system performance tuning
  • Strong troubleshooting, analytical, and problem-solving skills
  • Ability to manage multiple priorities in a dynamic environment
Job Responsibility
  • Manages and maintains Active Directory services, including users, computers, authentication, and authorization
  • Administers domain policies and ensures compliance with company standards
  • Oversees member domain services such as file servers, reporting servers, printing servers, Windows update servers, and SQL database servers
  • Manages SFTP services and external connectivity
  • Monitors system performance and implements upgrades for enhanced functionality and security
  • Maintains overall network and system security posture
  • Administers and monitors antivirus systems and security policies
  • Ensures timely patching, hotfix deployment, and vulnerability mitigation
  • Provides regular security and protection reports
  • Recommends and implements improvements based on risk assessments
Employment Type: Full-time