We are seeking a Hardware Production / Sustaining Engineer to strengthen Crusoe’s Hardware Systems Engineering team and close critical skill gaps in debugging, validation, and production support of high-performance compute systems. In this role, you will take ownership of the full hardware lifecycle, from prototype bring-up to large-scale production, while driving automation, deep issue resolution, and reliability across Crusoe Cloud’s GPU- and CPU-based infrastructure. You will work closely with cross-functional teams to support, debug, and improve hardware platforms at scale, with a particular focus on PCIe, InfiniBand, and NVMe/storage, the areas where the team most needs deeper expertise. Your work will directly impact Crusoe’s ability to deploy and operate sustainable, AI-first compute systems with world-class performance and reliability.
Job Responsibilities
Drive the full hardware development and sustaining lifecycle, including feasibility, bring-up, validation, deployment, and ongoing production support
Develop and maintain scripting and automation frameworks for hardware testing, diagnostics, and continuous reliability improvements
Lead deep troubleshooting and debugging across PCIe (link training, topology, performance issues), InfiniBand (fabric debugging, throughput, connectivity issues), and NVMe/storage (performance bottlenecks, firmware interactions, failure analysis)
Conduct rigorous system validation and characterization for GPU, CPU, and high-performance compute platforms
Support end-to-end integration and solution testing to ensure Crusoe Cloud products meet performance, reliability, and scalability expectations
Collaborate with mechanical, thermal, firmware, software, and manufacturing teams to resolve system-level issues and enable stable production operation
Drive prototyping, qualification, and readiness for high-volume manufacturing with both internal teams and external vendors
Identify opportunities for new hardware technologies, testing methods, and sustainability improvements aligned with Crusoe’s long-term objectives
Provide data-driven insights to influence Crusoe’s hardware roadmap and reliability strategy
Requirements
8–10+ years of experience in hardware development, validation, sustaining engineering, or production engineering
Strong hands-on expertise in PCIe, InfiniBand, and NVMe/storage debugging and development
Deep proficiency in hardware bring-up, board-level debugging, and system-level validation
Ability to design and implement automation frameworks for hardware testing (Python, Shell, or similar)
Technical background in digital and analog design, server architecture, and high-performance compute hardware
Experience working across thermal, mechanical, firmware, and software functions in multidisciplinary environments
Strong analytical and problem-solving skills with a data-driven approach
Excellent communication and collaboration skills for working with internal teams and external partners
Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or equivalent experience
Nice to have
Experience designing or optimizing GPU-to-GPU communication architectures for AI/ML workloads
Direct experience integrating NVLink or other next-generation GPU interconnect technologies
Familiarity with cutting-edge GPU architectures and how to leverage them in AI/HPC environments
Expertise supporting or designing systems across both ARM and x86 server architectures
Background in sustainable or energy-efficient hardware design practices
Advanced certifications or coursework in AI/HPC hardware systems
What we offer
Competitive compensation
Restricted Stock Units
Paid time off & paid holidays
Comprehensive health, dental & vision insurance
Employer contributions to HSA
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) retirement plan with company match up to 4% of salary