SRE - ClickHouse Team Job at PostHog

Job Description

We run one of the largest self-managed ClickHouse installations on AWS, already at petabyte scale, and we’re actively preparing it for the next 10–50× of growth. This role sits at the centre of that effort. You won’t be in a typical “keep the lights on” SRE role. The work is about turning a fast-growing, stateful system into a predictable, well-automated platform. You’ll work on the kind of problems that only show up at large scale (petabytes of data, thousands of cores, constant ingestion). You’ll have room to design and automate, not just respond to alerts.

Job Responsibilities

Turning a fast-growing, stateful system into a predictable, well-automated platform (provisioning, scaling, rebalancing, recovery)
Reducing operational stress, designing safe automation for data-heavy workloads, and building tooling and patterns for scale
Managing large fleets of EC2-based VMs, disks, and networking for data-intensive workloads
Improving operational tooling around deploys, schema changes, backups, restores, and incident response
Working closely with ClickHouse engineers to turn database-level needs into infra-level solutions
Reducing operational load by identifying repeat pain points and eliminating them through code and self-healing automation
Participating in on-call and incident response, with a focus on making incidents rarer over time

Requirements

Strong experience operating production infrastructure on AWS
Hands-on experience with VM-based systems (EC2), not just managed PaaS
Experience automating infrastructure using tools like Terraform, Ansible, or similar
Solid understanding of Linux systems (disk, memory, networking, failure modes)
Experience supporting stateful systems (databases, queues, storage systems, etc.)
Ability to debug and reason about performance and reliability issues in production
Comfortable owning systems end-to-end, including on-call responsibilities

Nice to have

Prior experience with ClickHouse or other analytical databases
Experience operating systems at very large data scale
Familiarity with Kubernetes (helpful, but not the core of this role)
