Join the Substrate Edge Team at Palantir, where we are responsible for mission-critical production infrastructure — encompassing hundreds of Kubernetes clusters across on-premises deployments, from large data centres to small-footprint edge devices. We are now seeking a Senior Infrastructure Engineer with specialised experience in Ceph to boost the scale and reliability of our ruggedised Kubernetes offerings under novel operating constraints.
Job Responsibilities:
Manage Ceph at Scale: Design, deploy, and maintain Ceph storage solutions across diverse hardware environments, ensuring high availability and performance under challenging constraints
Automate Deployments: Develop and implement automation strategies for managing multiple Ceph deployments, reducing manual intervention and enhancing operational efficiency using world-class tooling
Innovate and Contribute: Drive the adoption of novel features and tools within the Ceph and CNCF ecosystems, contributing upstream as necessary to improve the broader community
Engage with Communities: Actively participate in the Ceph developer community and the CNCF, sharing insights and collaborating on open-source projects
Infrastructure Excellence: Collaborate with the team to design and build the next generation of Palantir’s infrastructure, focusing on systems that are scalable, stable, and secure
Requirements:
4+ years of software development experience focused on core infrastructure with an emphasis on operational excellence
2+ years of experience in system design or architecture, including reliability and scaling of new and existing systems
1+ years of operational responsibility for production-grade Ceph clusters
Bachelor’s degree in Computer Science or equivalent practical experience
Nice to have:
Ceph & Rook Expertise: Practical, hands-on experience managing Ceph storage solutions, with a deep understanding of its architecture and operational nuances, ideally using Rook
Automation Proficiency: Strong skills in infrastructure automation tools such as Terraform and Kubernetes Operators, with coding proficiency in Go, Java, or equivalent
Systems Programming: Experience in systems programming with proficiency in Go, Rust, C/C++, or equivalent languages
Hardware and OS Knowledge: Deep familiarity with hardware configurations, operating systems, and diagnostic tools
Networking Fundamentals: Solid understanding of networking principles, with experience in CNIs or cloud networking infrastructure preferred
On-premises Data Centre Experience: Experience working with on-premises hardware, or as a sysadmin/SRE in data centres