As the Director of Engineering, Cloud Availability, you will lead our engineering organization in Dublin, serving as a critical bridge between our European operations and our US headquarters. This is a high-impact leadership role where you will oversee Site Reliability Engineering (SRE), Network Engineering, and Data Center Infrastructure Engineering to ensure the global resiliency of Crusoe Cloud. You will be the primary "culture carrier" for our Dublin office, fostering a high-trust, high-performance environment that remains deeply integrated with our global mission to build the future of sustainable computing.
Job Responsibilities:
Organizational Leadership: Partner closely with Data Center, Network, and SRE teams to build and scale a world-class engineering organization in Dublin
Site Leadership & Culture: Serve as the primary point of contact and face of Crusoe leadership in Dublin, proactively managing office sentiment and ensuring the team remains focused on high-impact objectives
Global Strategic Alignment: Build high-trust partnerships with US-based leadership to ensure local priorities are perfectly synchronized with the global business roadmap
Operational Excellence: Implement and refine "follow-the-sun" protocols to enable smooth hand-offs between time zones, ensuring zero customer disruption and 24/7 reliability
Unified Team Vision: Foster a "one-team" mindset across geographic boundaries, breaking down silos and promoting deep collaboration between Dublin and US offices
Talent Development: Level up the Dublin engineering team by identifying individual strengths and establishing a culture of mentorship to grow the next generation of Engineering Leads and ICs
Reliability Initiatives: Lead the development of SRE functions for IaaS and managed services, including Inference, SLURM, and automated cluster management
Requirements:
10+ years of engineering leadership experience with a proven track record of managing high-performing technical teams
Deep technical knowledge of public cloud infrastructure and experience building or operating large-scale platforms (Public, Private, or Hybrid)
Expert-level understanding of availability, observability, SLIs/SLOs, and modern incident management frameworks
Proven ability to lead remote teams and successfully collaborate with US-based engineering organizations
Demonstrated success navigating and leading within a matrix organizational structure
Strong familiarity with virtual and managed Kubernetes platforms, such as EKS, GKE, or AKS
Ability to balance long-term organizational strategy with the immediate tactical needs of a fast-growing engineering site
Nice to have:
AI/ML Infrastructure: Prior experience working with or building infrastructure platforms specifically tailored for AI and Machine Learning workloads
Startup Scaling: Experience navigating the rapid growth phases of a high-scale startup environment
Large-Scale Infrastructure: A background managing massive-scale infrastructure projects that exceed standard enterprise requirements
Advanced Reliability Architectures: Experience designing automated recovery systems and "self-healing" infrastructure at scale