As a Senior Staff Cloud Support Engineer, you are a technical authority within Crusoe Cloud and a force multiplier across Customer Experience, SRE, Networking, Fleet, and Product teams. You operate beyond ticket resolution. You design reliability guardrails, influence architecture decisions, mentor senior engineers, and directly protect revenue by preventing large-scale incidents. You bring deep expertise in Linux systems, Kubernetes, networking, and AI/ML infrastructure, and apply that knowledge with strong customer focus. You are comfortable operating in ambiguity, leading incident response, and shaping how Crusoe scales high-performance AI infrastructure globally.
Job Responsibilities:
Serve as the highest-level escalation point for complex P0/P1 incidents
Lead cross-functional root cause investigations involving compute, networking (IB/RDMA/RoCE), storage, and orchestration layers
Partner with SRE and Software teams (Storage, Networking, Compute, K8s) to design systemic fixes rather than recurring workarounds
Design and improve node validation, burn-in processes, performance baselining, and release readiness