Ceph Infrastructure Engineer

Palantir Technologies

Location:
United Kingdom, London

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Join the Substrate Edge Team at Palantir, where we are responsible for mission-critical production infrastructure — encompassing hundreds of Kubernetes clusters across on-premises deployments, from large data centres to small-footprint edge devices. We are now seeking a Senior Infrastructure Engineer with specialised experience in Ceph to boost the scale and reliability of our ruggedised Kubernetes offerings under novel operating constraints.

Job Responsibility:

  • Manage Ceph at Scale: Design, deploy, and maintain Ceph storage solutions across diverse hardware environments, ensuring high availability and performance under challenging constraints
  • Automate Deployments: Develop and implement automation strategies for managing multiple Ceph deployments, reducing manual intervention and enhancing operational efficiency using world-class tooling
  • Innovate and Contribute: Drive the adoption of novel features and tools within the Ceph and CNCF ecosystems, contributing upstream as necessary to improve the broader community
  • Engage with Communities: Actively participate in the Ceph developer community and the CNCF, sharing insights and collaborating on open-source projects
  • Infrastructure Excellence: Collaborate with the team to design and build the next generation of Palantir’s infrastructure, focusing on systems that are scalable, stable, and secure

Requirements:

  • 4+ years of software development experience focused on core infrastructure with an emphasis on operational excellence
  • 2+ years of experience in system design or architecture, including reliability and scaling of new and existing systems
  • 1+ year of being operationally responsible for production-grade Ceph clusters
  • Bachelor’s degree in Computer Science or equivalent practical experience

Nice to have:

  • Ceph & Rook Expertise: Practical, hands-on experience managing Ceph storage solutions, with a deep understanding of its architecture and operational nuances, ideally using Rook
  • Automation Proficiency: Strong skills in infrastructure automation tools such as Terraform and Kubernetes Operators, with coding proficiency in Go, Java, or equivalent
  • Systems Programming: Experience in systems programming with proficiency in Go, Rust, C/C++, or equivalent languages
  • Hardware and OS Knowledge: Deep familiarity with hardware configurations, operating systems, and diagnostic tools
  • Networking Fundamentals: Solid understanding of networking principles, with experience in CNIs or cloud networking infrastructure preferred
  • On-premises Data Centre Experience: Experience working with on-premises hardware, or as a sysadmin/SRE in data centres

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Ceph Infrastructure Engineer

Senior Infrastructure Engineer – Hosting

As a Senior Infrastructure Engineer – Hosting you will be responsible for the de...
Location
United States
Salary:
150000.00 USD / Year
Corporate Tools
Expiration Date
Until further notice
Requirements
  • 3-5 years of experience in Linux system administration, virtualization, and cloud infrastructure
  • Experience with Proxmox or other hypervisors (VMware, KVM, Xen, Hyper-V)
  • Experience with Ceph or SAN storage solutions for virtualization
  • Ability to manage kernel tuning, system performance, and process optimization
  • Hands-on experience with Ceph storage, ZFS, iSCSI, NFS, RAID, and SAN architectures
  • Understanding of storage performance metrics (IOPS, throughput, latency)
  • Ability to work on projects solo or with a team
  • Love for learning and improving code
  • Strong communication and collaboration skills
  • Experience with WordPress hosting, database replication, and caching techniques
Job Responsibility
  • Develop and design robust and scalable hardware solutions
  • Take ownership of projects from conception to deployment, ensuring timely delivery and meeting the specified requirements
  • Work closely with cross-functional teams, including IT, product management, and other software teams, to ensure seamless integration and alignment with business objectives
  • Deploy, configure, and maintain Proxmox VE clusters for virtualization or other hypervisors
  • Implement high-availability (HA) and failover solutions for virtual machines
  • Manage resource allocation (CPU, memory, disk, network) to optimize performance for hosted applications
  • Automate VM deployment and configuration using Ansible, Terraform, or SaltStack
  • Maintain backups and disaster recovery plans for virtualized environments
  • Design and manage Ceph clusters or SAN storage (iSCSI, NFS, ZFS, etc.) for high-performance, redundant storage
  • Monitor and optimize storage performance, including IOPS, latency, and throughput
What we offer
  • 100% employer-paid medical, dental and vision for employees
  • Annual review with raise option
  • 22 days Paid Time Off accrued annually, and 4 holidays
  • After 3 years, PTO increases to 29 days. Employees transition to flexible time off after 5 years with the company—not accrued, not capped, take time off when you want
  • The 4 holidays are: New Year’s Day, Fourth of July, Thanksgiving, and Christmas Day
  • Paid Parental Leave
  • Up to 6% company matching 401(k) with no vesting period
  • Quarterly allowance
  • Use it to make your remote work setup more comfortable, for continuing education classes, a plant for your desk, coffee for your coworker, a massage for yourself... really, whatever
  • Open concept office with friendly coworkers
  • Fulltime

Platform Engineer

At evroc, we are building a secure, sovereign, and sustainable hyperscale cloud ...
Location
Sweden, Stockholm
Salary:
Not provided
Evroc
Expiration Date
Until further notice
Requirements
  • Proficiency in distributed systems and Linux systems engineering
  • Strong understanding of various infrastructure technologies, including virtualization, containerization, and cloud computing
  • Coding in programming languages such as (but not exclusively) Golang or Rust
  • Experience in building and enhancing compute, storage, and data platforms with exposure to open source products like Kubernetes, Knative, Ceph, Rook and the like
  • Hands-on with infrastructure-as-code tools and automation, such as Terraform, Ansible, or Helm
  • Familiarity with software build processes and secure supply systems, like OpenSSF
  • Strong problem-solving and communication skills to effectively address complex platform engineering challenges
  • Applicants must possess a valid work permit.
Job Responsibility
  • Build and design the foundational infrastructure for other engineering teams and customers to build on
  • Create Infrastructure-as-Code deployments and large scale cluster configurations for managing our networking, storage, and compute resources
  • Seamlessly integrate and maintain open-source components within our evolving tech stack
  • Team up with fellow engineers to craft tailored solutions meeting our unique challenges
  • Forge and refine tools that power team efficiency - this includes CI/CD, local development setups, build toolchains, and essential infrastructure
  • Plot the roadmap for software component development, aligning with team priorities and vision
  • Lead the charge in defining and achieving our technical benchmarks.
What we offer
  • We offer a competitive salary and an equity package to attract the best
  • At evroc, diversity is our strength. We champion an inclusive environment where every background - ethnicity, age, gender identity, beliefs, and culture - is celebrated.
  • Fulltime

Sr. Network Data Center Engineer

If you live and breathe networking, virtualization, and high-availability system...
Location
United States
Salary:
150000.00 USD / Year
Corporate Tools
Expiration Date
Until further notice
Requirements
  • Experience with Proxmox or other hypervisors (VMware, KVM, Xen, Hyper-V)
  • 5+ years of network engineering, data center operations, or cloud infrastructure
  • Experience with Ceph or SAN-based storage solutions (iSCSI, NFS, ZFS)
  • Experience with containers and networking
  • Excellent problem-solving skills and a keen eye for detail
  • Ability to work on projects solo or with a team
  • Love for learning and improving code
  • Strong communication and collaboration skills
  • Understanding of Ceph storage architecture (OSDs, MONs, MDS, RADOS, etc.)
  • Experience in iSCSI/NFS/ZFS SAN setups and performance tuning
Job Responsibility
  • Develop and design robust and scalable software solutions
  • Take ownership of projects from conception to deployment, ensuring timely delivery and meeting the specified requirements
  • Work closely with cross-functional teams, including IT, product management, and other software teams, to ensure seamless integration and alignment with business objectives
  • Stay updated with the latest industry trends, technologies, and best practices to bring innovative solutions to the table
  • Design, implement, and maintain a robust network architecture that supports Proxmox virtualization, Ceph/SAN storage, and container networking
  • Manage firewalls (iptables, pfSense, UFW, etc.) to secure access to virtualized environments and hosting services
  • Configure and optimize VLANs, subnets, and routing to ensure isolated and secure network segments for virtual machines, storage, and frontend applications
  • Configure and maintain VPNs, BGP, OSPF, or other routing protocols to ensure proper network redundancy and failover
  • Set up and maintain bridged, NAT, and VXLAN networking in Proxmox for efficient VM communication
  • Implement high-availability (HA) networking for Hypervisor networks and Ceph/SAN clusters
What we offer
  • 100% employer-paid medical, dental and vision for employees
  • Annual review with raise option
  • 22 days Paid Time Off accrued annually, and 4 holidays
  • After 3 years, PTO increases to 29 days. Employees transition to flexible time off after 5 years with the company—not accrued, not capped, take time off when you want
  • The 4 holidays are: New Year’s Day, Fourth of July, Thanksgiving, and Christmas Day
  • Paid Parental Leave
  • Up to 6% company matching 401(k) with no vesting period
  • Quarterly allowance
  • Use it to make your remote work setup more comfortable, for continuing education classes, a plant for your desk, coffee for your coworker, a massage for yourself... really, whatever
  • Open concept office with friendly coworkers
  • Fulltime

Ceph Infrastructure Engineer

Join the Substrate Edge Team at Palantir, where we are responsible for mission-c...
Location
United States, New York
Salary:
135000.00 - 200000.00 USD / Year
Palantir Technologies
Expiration Date
Until further notice
Requirements
  • 4+ years of software development experience focused on core infrastructure with an emphasis on operational excellence
  • 2+ years of experience in system design or architecture, including reliability and scaling of new and existing systems
  • 1+ year of being operationally responsible for production-grade Ceph clusters
  • Bachelor’s degree in Computer Science or equivalent practical experience
  • Ceph & Rook Expertise: Practical, hands-on experience managing Ceph storage solutions, with a deep understanding of its architecture and operational nuances, ideally using Rook
  • Automation Proficiency: Strong skills in infrastructure automation tools such as Terraform and Kubernetes Operators, with coding proficiency in Go, Java, or equivalent
  • Systems Programming: Experience in systems programming with proficiency in Go, Rust, C/C++, or equivalent languages
  • Hardware and OS Knowledge: Deep familiarity with hardware configurations, operating systems, and diagnostic tools
  • Networking Fundamentals: Solid understanding of networking principles, with experience in CNIs or cloud networking infrastructure preferred
  • On-premises Data Center Experience: Experience working with on-premises hardware, or as a sysadmin/SRE in data centers
Job Responsibility
  • Manage Ceph at Scale: Design, deploy, and maintain Ceph storage solutions across diverse hardware environments, ensuring high availability and performance under challenging constraints
  • Automate Deployments: Develop and implement automation strategies for managing multiple Ceph deployments, reducing manual intervention and enhancing operational efficiency using world-class tooling
  • Innovate and Contribute: Drive the adoption of novel features and tools within the Ceph and CNCF ecosystems, contributing upstream as necessary to improve the broader community
  • Engage with Communities: Actively participate in the Ceph developer community and the CNCF, sharing insights and collaborating on open-source projects
  • Infrastructure Excellence: Collaborate with the team to design and build the next generation of Palantir’s infrastructure, focusing on systems that are scalable, stable, and secure
What we offer
  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Take-what-you-need paid time off, not accrual-based
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
  • Stipend to help with expenses that come with a new child
  • Fulltime

HPC Principal Federal Technical Consultant

In this role, you will serve as a trusted technical advisor for customers, guidi...
Location
United States
Salary:
115500.00 - 266000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 8+ years of professional experience, with at least 3 years in HPC architecture, systems engineering, or large-scale infrastructure design
  • Advanced degree in Computer Science, Engineering, Physics, or related technical field (or equivalent experience)
  • Proven ability to design and deliver complex, multi-vendor HPC solutions at scale
  • Demonstrated ability to independently complete solution implementations and application design deliverables
  • Must be a United States citizen due to the responsibilities and requirements of the role, as this will be supporting a Federal site
  • Top Secret Clearance, TS/SCI with Full Scope Polygraph (FSP)
  • Must be willing to travel as the business dictates
  • Expertise in one or more of the following: parallel computing, MPI/OpenMP, GPU acceleration, workload schedulers (Slurm, Altair PBS Pro, Torque/MOAB, etc.), or large-scale data storage systems (Lustre, GPFS, Ceph)
  • Experience with network boot technologies (PXE, gPXE/Etherboot, etc.)
  • Storage-specific knowledge: LVM, RAID, iSCSI, disk partitioning (GPT, MBR)
Job Responsibility
  • Lead the technical implementation design and delivery of world-class scale HPC solutions, from requirements gathering to implementation
  • Provide architectural guidance on compute, storage, networking, and workload management tailored to customer use cases
  • Configure, deploy, and maintain Linux-based HPC clusters, associated storage, and network infrastructure
  • Work in close collaboration with customers on finalizing and deploying HPC software applications, hosting platforms, and management systems that enable customer research and production workloads
  • Provide technical support and troubleshooting for HPC implementation in secure locations
  • Work on both operational support and strategic HPC projects
  • Actively participate in customer user group environments
  • Evaluate and implement new tools, middleware, and methodologies to improve operations and service delivery
  • Ensure compliance with enterprise IT security and technology controls
  • Act as principal consultant in customer engagements, often leading cross-functional project teams
What we offer
  • Comprehensive suite of benefits that supports physical, financial, and emotional wellbeing
  • Programs catered to helping employees reach any career goals
  • Inclusive work environment
  • Fulltime

HPC Principal Federal Technical Consultant

Principal Consultant to join our High-Performance Computing (HPC) team. In this ...
Location
United States
Salary:
115500.00 - 266000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • 8+ years of professional experience, with at least 3 years in HPC architecture, systems engineering, or large-scale infrastructure design
  • Advanced degree in Computer Science, Engineering, Physics, or related technical field (or equivalent experience)
  • Proven ability to design and deliver complex, multi-vendor HPC solutions at scale
  • Demonstrated ability to independently complete solution implementations and application design deliverables
  • Must be United States Citizen due to the responsibilities and requirements of the role as this will be supporting a Federal site
  • Top Secret Clearance, TS/SCI with Full Scope Polygraph (FSP)
  • Must be willing to travel as the business dictates
  • Expertise in one or more of the following: parallel computing, MPI/OpenMP, GPU acceleration, workload schedulers (Slurm, Altair PBS Pro, Torque/MOAB, etc.), or large-scale data storage systems (Lustre, GPFS, Ceph)
  • Experience with Network boot technologies (PXE or gPXE/Etherboot etc)
  • Storage specific knowledge: LVM, RAID, iSCSI, Disk partitioning (GPT, MBR)
Job Responsibility
  • Lead the technical implementation design and delivery of world-class scale HPC solutions, from requirements gathering to implementation
  • Provide architectural guidance on compute, storage, networking, and workload management tailored to customer use cases
  • Configure, deploy, and maintain Linux-based HPC clusters, associated storage, and network infrastructure
  • Work in close collaboration with customers on finalizing and deploying HPC software applications, hosting platforms, and management systems that enable customer research and production workloads
  • Provide technical support and troubleshooting for HPC implementation in secure locations
  • Work on both operational support and strategic HPC projects
  • Actively participate in customer user group environments
  • Evaluate and implement new tools, middleware, and methodologies to improve operations and service delivery
  • Ensure compliance with enterprise IT security and technology controls
  • Act as principal consultant in customer engagements, often leading cross-functional project teams (including customer staff)
What we offer
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
  • Fulltime

Infrastructure Engineer

We are looking for an experienced Infrastructure Engineer to join a multi-discip...
Location
United Kingdom, Cheltenham; London; Ipswich
Salary:
Not provided
Plusnet
Expiration Date
Until further notice
Requirements
  • Infrastructure Architecture/Design
  • Infrastructure Configuration
  • Troubleshooting
  • Programming/Scripting
  • Performance Monitoring
  • Experience in: Operating Systems: Linux
  • Networking (Design, Configuration, Testing, Security): Dell Networking, Cisco ASA, Juniper SRX
  • Storage solutions: Ceph, iSCSI, Dell PowerVault
  • Virtualisation: VMware, Proxmox
  • Host provisioning: Foreman, PXE boot
Job Responsibility
  • Support the maintenance of existing IT infrastructure
  • System administration tasks
  • Automating repeated manual tasks
  • Troubleshooting issues
  • System health monitoring
  • Developing monitoring solutions to identify system inefficiencies
  • Patching vulnerabilities
  • Be hands-on in the transformation/modernisation of existing infrastructure, or the creation of new infrastructure
  • Evaluating new technologies against requirements
  • Developing proofs of concept
What we offer
  • Competitive salary
  • BT Pension scheme, minimum 5% Employee contribution, BT contribution 10%
  • On-call allowance (Depending on job role requirements)
  • 25 days annual leave (not including bank holidays), increasing with service
  • Huge range of flexible benefits including cycle to work, healthcare, season ticket loan
  • World-class training and development opportunities
  • From January 2025, equal family leave: receive 18 weeks at full pay, 8 weeks at half pay and 26 weeks at the statutory rate
  • Enhanced women’s health support: including help with menopause symptoms, cancer screenings, period care and more
  • 24/7 private virtual GP appointments for UK colleagues
  • 2 weeks paid carer’s leave
  • Fulltime

Staff Engineer, Distributed Storage, HPC & AI Infrastructure

In this role, you will design and deliver multi-petabyte storage systems purpose...
Location
Netherlands, Amsterdam
Salary:
Not provided
Together AI
Expiration Date
Until further notice
Requirements
  • 8+ years in storage engineering with 3+ years managing distributed storage at multi-petabyte scale
  • Proven track record deploying and operating high-performance storage for GPU/HPC clusters
  • Deep Kubernetes and cloud-native storage experience in production environments
  • Strong coding skills in Go and Python with demonstrated ability to build production-grade tools
  • BS/MS in Computer Science, Engineering, or equivalent practical experience
  • History of technical leadership: designing systems that significantly improved performance (>3x), reliability (99.9%+ uptime), or cost efficiency
  • Distributed Storage Systems: Deep expertise in WekaFS, Lustre, GPFS, BeeGFS, or similar parallel filesystems at multi-petabyte scale
  • Object Storage: Production experience with S3, MinIO, Ceph, or R2 including performance optimization and cost management
  • Kubernetes Storage: CSI drivers, StatefulSets, PersistentVolumes, storage operators, and custom controllers
  • Storage optimization for GPU workloads, RDMA/InfiniBand networking, parallel filesystem optimization (100+ GB/s aggregate cluster throughput)
Job Responsibility
  • Design multi-petabyte AI/ML storage systems
  • Integrate WekaFS, Ceph, etc.
  • Lead capacity planning and cost optimization (30-50% savings via tiering, lifecycle policies, right-sizing)
  • Design/optimize RDMA, InfiniBand, 400GbE networks
  • Tune for max throughput/min latency
  • Implement NVMe-oF/iSCSI
  • Troubleshoot bottlenecks
  • Optimize TCP/IP for storage
  • Build Kubernetes storage operators/controllers
  • Enable automated provisioning, self-service abstractions, multi-tenant isolation, quotas