Ceph Infrastructure Engineer

United States, New York 135000.00 - 200000.00 USD / Year · Job Posted February 18, 2026

Job Description

Join the Substrate Edge Team at Palantir, where we are responsible for mission-critical production infrastructure encompassing hundreds of Kubernetes clusters across on-premise deployments, from large datacenters to small-footprint edge devices. We are now seeking a Senior Infrastructure Engineer with specialized experience in Ceph to boost the scale and reliability of our ruggedized Kubernetes offerings under novel operating constraints. As a Senior Infrastructure Engineer, you will leverage your expertise in Ceph to manage and optimize storage solutions at scale, ensuring seamless integration with our Kubernetes infrastructure. You will play a key role in automating hundreds of deployments of Ceph clusters on heterogeneous hardware and in making core contributions to the broader Ceph and CNCF communities.

Job Responsibility

  • Manage Ceph at Scale: Design, deploy, and maintain Ceph storage solutions across diverse hardware environments, ensuring high availability and performance under challenging constraints
  • Automate Deployments: Develop and implement automation strategies for managing multiple Ceph deployments, reducing manual intervention and enhancing operational efficiency using world-class tooling
  • Innovate and Contribute: Drive the adoption of novel features and tools within the Ceph and CNCF ecosystems, contributing upstream as necessary to improve the broader community
  • Engage with Communities: Actively participate in the Ceph developer community and the CNCF, sharing insights and collaborating on open-source projects
  • Infrastructure Excellence: Collaborate with the team to design and build the next generation of Palantir’s infrastructure, focusing on systems that are scalable, stable, and secure

Requirements

  • 4+ years of software development experience focused on core infrastructure with an emphasis on operational excellence
  • 2+ years of experience in system design or architecture, including reliability and scaling of new and existing systems
  • 1+ year of being operationally responsible for production-grade Ceph clusters
  • Bachelor’s degree in Computer Science or equivalent practical experience
  • Ceph & Rook Expertise: Practical, hands-on experience managing Ceph storage solutions, with a deep understanding of its architecture and operational nuances, ideally using Rook
  • Automation Proficiency: Strong skills in infrastructure automation tools such as Terraform and Kubernetes Operators, with coding proficiency in Go, Java, or equivalent
  • Systems Programming: Experience in systems programming with proficiency in Go, Rust, C/C++, or equivalent languages
  • Hardware and OS Knowledge: Deep familiarity with hardware configurations, operating systems, and diagnostic tools
  • Networking Fundamentals: Solid understanding of networking principles, with experience in CNIs or cloud networking infrastructure preferred
  • On-Premise Datacenter Experience: Experience working with on-premise hardware, or as a sysadmin/SRE in data centers

What we offer

  • Employees (and their eligible dependents) can enroll in medical, dental, and vision insurance as well as voluntary life insurance
  • Employees are automatically covered by Palantir’s basic life, AD&D and disability insurance
  • Commuter benefits
  • Take-what-you-need paid time off, not accrual-based
  • 2 weeks paid time off built into the end of each year (subject to team and business needs)
  • 10 paid holidays throughout the calendar year
  • Supportive leave of absence program including time off for military service and medical events
  • Paid leave for new parents and subsidized back-up care for all parents
  • Fertility and family building benefits including but not limited to adoption, surrogacy, and preservation
  • Stipend to help with expenses that come with a new child
  • Employees can enroll in Palantir’s 401k plan

Similar Jobs for Ceph Infrastructure Engineer (8 matching positions)

Ceph Infrastructure Engineer

Join the Substrate Edge Team at Palantir, where we are responsible for mission-c...
Location: United Kingdom, London
Salary: Not provided
Company: Palantir Technologies
Expiration Date: Until further notice
Requirements
  • 4+ years of software development experience focused on core infrastructure with an emphasis on operational excellence
  • 2+ years of experience in system design or architecture, including reliability and scaling of new and existing systems
  • 1+ year of being operationally responsible for production-grade Ceph clusters
  • Bachelor’s degree in Computer Science or equivalent practical experience
Job Responsibility
  • Manage Ceph at Scale: Design, deploy, and maintain Ceph storage solutions across diverse hardware environments, ensuring high availability and performance under challenging constraints
  • Automate Deployments: Develop and implement automation strategies for managing multiple Ceph deployments, reducing manual intervention and enhancing operational efficiency using world-class tooling
  • Innovate and Contribute: Drive the adoption of novel features and tools within the Ceph and CNCF ecosystems, contributing upstream as necessary to improve the broader community
  • Engage with Communities: Actively participate in the Ceph developer community and the CNCF, sharing insights and collaborating on open-source projects
  • Infrastructure Excellence: Collaborate with the team to design and build the next generation of Palantir’s infrastructure, focusing on systems that are scalable, stable, and secure
Employment Type: Full-time

Senior System Administrator II [Ceph Engineer]

This team is part of our Platform Engineering team in India. The purpose of Plat...
Location: India, Bengaluru
Salary: Not provided
Company: Adyen
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in Linux systems administration with a focus on storage (e.g., Ceph)
  • Proven experience with enterprise storage systems (e.g., Pure Storage, NetApp, EMC, Dell, IBM, Hitachi)
  • Strong understanding of RAID, LVM, NFS, iSCSI, multipathing, and file systems (ext4, XFS, ZFS, etc.)
  • Hands-on experience with enterprise storage systems and clustering technologies
  • Proficiency in shell scripting
  • Python, Puppet or Ansible knowledge
  • Experience with virtualization technologies such as VMware or KVM
  • Familiarity with cloud storage solutions like AWS S3 and EBS
  • Experience with monitoring tools (Nagios, Prometheus, Zabbix, etc.)
Job Responsibility
  • Design and deploy storage infrastructure (SAN, NAS, DAS, Object Storage) in accordance with organizational needs
  • Monitor and maintain storage systems to guarantee optimal performance, availability, and reliability
  • Perform capacity planning and forecasting for future storage needs
  • Develop and implement strategies for backup, replication, and disaster recovery
  • Troubleshoot and resolve issues related to storage systems
  • Maintain documentation of configurations, procedures, and standards
  • Collaborate with other Platform Engineering teams on system upgrades, migrations, and integrations
  • Automate storage provisioning and management using scripting or infrastructure-as-code tools

System Administrator II [Ceph Engineer]

As a storage engineer of our Storage team in Bengaluru, you will be responsible ...
Location: India, Bengaluru
Salary: Not provided
Company: Adyen
Expiration Date: Until further notice
Requirements
  • 3–5 years of experience in Linux systems administration with a focus on storage (e.g., Ceph) (mandatory)
  • Proven experience with enterprise storage systems (e.g., Pure Storage, NetApp, EMC, Dell, IBM, Hitachi) (mandatory)
  • Strong understanding of RAID, LVM, NFS, iSCSI, multipathing, and file systems (ext4, XFS, ZFS, etc.) (mandatory)
  • Hands-on experience with enterprise storage systems and clustering technologies
  • Proficiency in shell scripting
  • Python, Puppet or Ansible knowledge
  • Experience with virtualization technologies such as VMware or KVM
  • Familiarity with cloud storage solutions like AWS S3 and EBS
  • Experience with monitoring tools (Nagios, Prometheus, Zabbix, etc.)
Job Responsibility
  • Design and deploy storage infrastructure (SAN, NAS, DAS, Object Storage) in accordance with organizational needs
  • Monitor and maintain storage systems to guarantee optimal performance, availability, and reliability
  • Perform capacity planning and forecasting for future storage needs
  • Develop and implement strategies for backup, replication, and disaster recovery
  • Troubleshoot and resolve issues related to storage systems
  • Maintain documentation of configurations, procedures, and standards
  • Collaborate with other Platform Engineering teams on system upgrades, migrations, and integrations
  • Automate storage provisioning and management using scripting or infrastructure-as-code tools
Employment Type: Full-time

Storage Engineer (Ceph)

As a storage engineer of our Storage team in Chicago, you will be responsible fo...
Location: United States, Chicago
Salary: 180000.00 - 243000.00 USD / Year
Company: Adyen
Expiration Date: Until further notice
Requirements
  • 3–5+ years of experience in Linux systems administration with a focus on storage (e.g., Ceph)
  • Proven experience with enterprise storage systems (e.g., Pure Storage, NetApp, EMC, Dell, IBM, Hitachi)
  • Strong understanding of RAID, LVM, NFS, iSCSI, multipathing, and file systems (ext4, XFS, ZFS, etc.)
  • Hands-on experience with enterprise storage systems and clustering technologies
  • Proficiency in shell scripting
  • Python, Puppet or Ansible knowledge
  • Experience with virtualization technologies such as VMware or KVM
  • Familiarity with cloud storage solutions like AWS S3 and EBS
  • Experience with monitoring tools (Nagios, Prometheus, Zabbix, etc.)
Job Responsibility
  • Design and deploy storage infrastructure (SAN, NAS, DAS, Object Storage) in accordance with organizational needs
  • Monitor and maintain storage systems to guarantee optimal performance, availability, and reliability
  • Perform capacity planning and forecasting for future storage needs
  • Develop and implement strategies for backup, replication, and disaster recovery
  • Troubleshoot and resolve issues related to storage systems
  • Maintain documentation of configurations, procedures, and standards
  • Collaborate with other Platform Engineering teams on system upgrades, migrations, and integrations
  • Automate storage provisioning and management using scripting or infrastructure-as-code tools
Employment Type: Full-time

Senior Infrastructure Engineer – Hosting

As a Senior Infrastructure Engineer – Hosting you will be responsible for the de...
Location: United States
Salary: 150000.00 USD / Year
Company: Corporate Tools
Expiration Date: Until further notice
Requirements
  • 3-5 years of experience in Linux system administration, virtualization, and cloud infrastructure
  • Experience with Proxmox or other hypervisors (VMware, KVM, Xen, Hyper-V)
  • Experience with Ceph or SAN storage solutions for virtualization
  • Ability to manage kernel tuning, system performance, and process optimization
  • Hands-on experience with Ceph storage, ZFS, iSCSI, NFS, RAID, and SAN architectures
  • Understanding of storage performance metrics (IOPS, throughput, latency)
  • Ability to work on projects solo or with a team
  • Love for learning and improving code
  • Strong communication and collaboration skills
  • Experience with WordPress hosting, database replication, and caching techniques
Job Responsibility
  • Develop and design robust and scalable hardware solutions
  • Take ownership of projects from conception to deployment, ensuring timely delivery and meeting the specified requirements
  • Work closely with cross-functional teams, including IT, product management, and other software teams, to ensure seamless integration and alignment with business objectives
  • Deploy, configure, and maintain Proxmox VE clusters for virtualization or other hypervisors
  • Implement high-availability (HA) and failover solutions for virtual machines
  • Manage resource allocation (CPU, memory, disk, network) to optimize performance for hosted applications
  • Automate VM deployment and configuration using Ansible, Terraform, or SaltStack
  • Maintain backups and disaster recovery plans for virtualized environments
  • Design and manage Ceph clusters or SAN storage (iSCSI, NFS, ZFS, etc.) for high-performance, redundant storage
  • Monitor and optimize storage performance, including IOPS, latency, and throughput
What we offer
  • 100% employer-paid medical, dental and vision for employees
  • Annual review with raise option
  • 22 days Paid Time Off accrued annually, and 4 holidays
  • After 3 years, PTO increases to 29 days. Employees transition to flexible time off after 5 years with the company—not accrued, not capped, take time off when you want
  • The 4 holidays are: New Year’s Day, Fourth of July, Thanksgiving, and Christmas Day
  • Paid Parental Leave
  • Up to 6% company matching 401(k) with no vesting period
  • Quarterly allowance: use it to make your remote work setup more comfortable, for continuing education classes, a plant for your desk, coffee for your coworker, a massage for yourself... really, whatever
  • Open concept office with friendly coworkers
Employment Type: Full-time

Infrastructure Engineer

We are looking for an experienced Infrastructure Engineer to join a multi-discip...
Location: United Kingdom (Cheltenham; London; Ipswich)
Salary: Not provided
Company: Plusnet
Expiration Date: Until further notice
Requirements
  • Infrastructure Architecture/Design
  • Infrastructure Configuration
  • Troubleshooting
  • Programming/Scripting
  • Performance Monitoring
  • Experience in: Operating Systems: Linux
  • Networking (Design, Configuration, Testing, Security): Dell Networking, Cisco ASA, Juniper SRX
  • Storage solutions: Ceph, iSCSI, Dell PowerVault
  • Virtualisation: VMware, Proxmox
  • Host provisioning: Foreman, PXE boot
Job Responsibility
  • Support the maintenance of existing IT infrastructure
  • System administration tasks
  • Automating repeated manual tasks
  • Troubleshooting issues
  • System health monitoring
  • Developing monitoring solutions to identify system inefficiencies
  • Patching vulnerabilities
  • Be hands-on in the transformation/modernisation of existing infrastructure, or the creation of new infrastructure
  • Evaluating new technologies against requirements
  • Developing proofs of concept
What we offer
  • Competitive salary
  • BT Pension scheme, minimum 5% Employee contribution, BT contribution 10%
  • On-call allowance (Depending on job role requirements)
  • 25 days annual leave (not including bank holidays), increasing with service
  • Huge range of flexible benefits including cycle to work, healthcare, season ticket loan
  • World-class training and development opportunities
  • From January 2025, equal family leave: receive 18 weeks at full pay, 8 weeks at half pay and 26 weeks at the statutory rate
  • Enhanced women’s health support: including help with menopause symptoms, cancer screenings, period care and more
  • 24/7 private virtual GP appointments for UK colleagues
  • 2 weeks paid carer’s leave
Employment Type: Full-time

Senior Infrastructure Engineer

As a Senior Infrastructure Engineer in the IT department, you provide the critic...
Location: Finland, Helsinki
Salary: Not provided
Company: ICEYE
Expiration Date: Until further notice
Requirements
  • Core Technical Profile: The T-Shaped Engineer
  • Virtualization & Compute: Experience with VMware or Proxmox is essential
  • Familiarity with shared storage backends (SAN, iSCSI, Ceph) and disaster recovery planning
  • Hybrid Cloud (AWS): Understand VPC architecture (Transit Gateways, Peering), IAM security boundaries, and hybrid connectivity (VPN/Direct Connect)
  • Advanced Networking: Comfortable configuring L2/L3 switching, debugging routing issues, and managing firewall rulesets and VPN configurations
  • Experience with Palo Alto and Cisco
  • Infrastructure as Code (IaC): Experience with Terraform (state management, modules) and Ansible (playbook optimization)
  • OS Administration: Deep internal knowledge of Windows Server or Linux environments
  • Capabilities in performance tuning, kernel diagnostics, Active Directory integration, and automated patching strategies
  • Hybrid Requirement: A hybrid presence is required
Job Responsibility
  • Own, build, and evolve our core infrastructure to enable scalable global satellite operations
  • Own and modernize our corporate infrastructure
  • Managing the physical and virtual backbone
  • Building resilient on-premise clusters
  • Optimizing hybrid networks
  • Transitioning our systems administration into a disciplined, code-driven practice
  • Infrastructure Ownership: Assume responsibility for core segments, including on-prem clusters and network segments, identifying technical debt and executing a plan to address it
  • Build & Stabilize: Architect and deploy new on-premise clusters while simultaneously stabilizing legacy systems
  • Cloud Expansion: Lead the initiative to build new infrastructure on public cloud services, ensuring seamless integration with our existing on-premise footprint
  • Modernization: Drive the transition from manual administrative tasks to automated, reproducible workflows using Terraform and Ansible
What we offer
  • Occupational healthcare, plus occupational and accident insurance
  • A yearly benefit budget to spend as you wish (e.g. on sport, transport, bike benefit, wellness, lunch, etc.)
  • Phone subscription with iPhone of choice
  • Relocation support (e.g. flight tickets, accommodation, relocation agency support)
  • Time for self-development, research, training, conferences, or certification schemes
  • Inspiring, collaborative offices and silent workspaces that enable you to focus

Staff Engineer, Distributed Storage, HPC & AI Infrastructure

In this role, you will design and deliver multi-petabyte storage systems purpose...
Location: Netherlands, Amsterdam
Salary: Not provided
Company: Together AI
Expiration Date: Until further notice
Requirements
  • 8+ years in storage engineering with 3+ years managing distributed storage at multi-petabyte scale
  • Proven track record deploying and operating high-performance storage for GPU/HPC clusters
  • Deep Kubernetes and cloud-native storage experience in production environments
  • Strong coding skills in Go and Python with demonstrated ability to build production-grade tools
  • BS/MS in Computer Science, Engineering, or equivalent practical experience
  • History of technical leadership: designing systems that significantly improved performance (>3x), reliability (99.9%+ uptime), or cost efficiency
  • Distributed Storage Systems: Deep expertise in WekaFS, Lustre, GPFS, BeeGFS, or similar parallel filesystems at multi-petabyte scale
  • Object Storage: Production experience with S3, MinIO, Ceph, or R2 including performance optimization and cost management
  • Kubernetes Storage: CSI drivers, StatefulSets, PersistentVolumes, storage operators, and custom controllers
  • Storage optimization for GPU workloads, RDMA/InfiniBand networking, parallel filesystem optimization (100+ GB/s aggregate cluster throughput)
Job Responsibility
  • Design multi-petabyte AI/ML storage systems; integrate WekaFS, Ceph, etc.; lead capacity planning and cost optimization (30-50% savings via tiering, lifecycle policies, right-sizing)
  • Design/optimize RDMA, InfiniBand, 400GbE networks; tune for max throughput/min latency; implement NVMe-oF/iSCSI; troubleshoot bottlenecks; optimize TCP/IP for storage
  • Build Kubernetes storage operators/controllers; enable automated provisioning, self-service abstractions, multi-tenant isolation, quotas