SRE - ClickHouse Team

United States · Job Posted February 21, 2026

Job Description

We run one of the largest self-managed ClickHouse installations on AWS, already at petabyte scale, and we’re actively preparing it for the next 10–50× of growth. This role sits at the centre of that effort. You won’t be in a typical “keep the lights on” SRE role. The work is about turning a fast-growing, stateful system into a predictable, well-automated platform. You’ll work on the kind of problems that only show up at large scale (petabytes of data, thousands of cores, constant ingestion). You’ll have room to design and automate, not just respond to alerts.

Job Responsibility

  • Turning a fast-growing, stateful system into a predictable, well-automated platform (provisioning, scaling, rebalancing, recovery)
  • Reducing operational stress, designing safe automation for data-heavy workloads, and building tooling and patterns for scale
  • Managing large fleets of EC2-based VMs, disks, and networking for data-intensive workloads
  • Improving operational tooling around deploys, schema changes, backups, restores, and incident response
  • Working closely with ClickHouse engineers to turn database-level needs into infra-level solutions
  • Reducing operational load by identifying repeat pain points and eliminating them through code and self-healing automation
  • Participating in on-call and incident response, with a focus on making incidents rarer over time

Requirements

  • Strong experience operating production infrastructure on AWS
  • Hands-on experience with VM-based systems (EC2), not just managed PaaS
  • Experience automating infrastructure using tools like Terraform, Ansible, or similar
  • Solid understanding of Linux systems (disk, memory, networking, failure modes)
  • Experience supporting stateful systems (databases, queues, storage systems, etc.)
  • Ability to debug and reason about performance and reliability issues in production
  • Comfortable owning systems end-to-end, including on-call responsibilities

Nice to have

  • Prior experience with ClickHouse or other analytical databases
  • Experience operating systems at very large data scale
  • Familiarity with Kubernetes (helpful, but not the core of this role)

Similar Jobs for SRE - ClickHouse Team

8 matching positions

Database Reliability Engineer - Core Team

We are committed to providing our customers with reliable and secure services at...
Location: United Kingdom
Salary: Not provided
Company: ClickHouse
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL; familiarity with ClickHouse in particular is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility
  • Continuously improve the reliability and performance of ClickHouse core
  • Improve and create metrics and alerts for ClickHouse to be able to identify and prevent problems in production before they affect customers
  • Dig deeper into the most common problems customers encounter in ClickHouse core to identify root causes, submit bug fixes and issue reports, and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse core-related outages, including working with support and Cloud teams to communicate with impacted customers
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Database Reliability Engineer

We are committed to providing our customers with reliable and secure services at...
Location: Netherlands
Salary: Not provided
Company: ClickHouse
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL; familiarity with ClickHouse in particular is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility
  • Continuously improve the reliability and performance of ClickHouse core
  • Improve and create metrics and alerts for ClickHouse to be able to identify and prevent problems in production before they affect customers
  • Dig deeper into the most common problems customers encounter in ClickHouse core to identify root causes, submit bug fixes and issue reports, and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse core-related outages, including working with support and Cloud teams to communicate with impacted customers
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Full-time

Senior Software Engineer, AI

LogicMonitor is advancing observability through AI‑driven data intelligence, con...
Location: India, Pune
Salary: Not provided
Company: LogicMonitor
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Data Engineering, or a related field
  • 4-5 years of experience in backend or data systems engineering
  • Experience building streaming data pipelines (Kafka / Spark or any similar technology)
  • Strong programming background in Java and Python, including microservice design
  • Experience with ETL, data modeling, and distributed storage systems
  • Familiarity with LLM pipelines, embeddings, and vector retrieval
  • Understanding of Kubernetes, containerization, and CI/CD workflows
  • Awareness of data governance, validation, and lineage best practices
  • Strong communication and collaboration across AI, Data, and Platform teams
Job Responsibility
  • Design and build streaming and batch data pipelines that process metrics, logs, and events for AI workflows
  • Develop ETL and feature‑extraction pipelines using Python and Java microservices
  • Integrate data ingestion and enrichment from multiple observability sources into AI‑ready formats
  • Build resilient data orchestration using Kafka, Airflow, and Redis Streams
  • Develop data indexing and semantic search for large‑scale observability and operational data
  • Work with structured and unstructured data lakes and warehouses (Delta Lake, Iceberg, ClickHouse)
  • Collaborate with the AI Platform team to manage embeddings, metadata, and model context storage
  • Optimize latency and throughput for retrieval, query expansion, and AI response generation
  • Build and maintain Java microservices (Spring Boot) that serve AI and analytics data to Edwin and AIOps applications
  • Develop Python APIs (FastAPI / LangGraph) for LLM orchestration, summarization, and correlation reasoning

Senior Software Engineer - Postgres

ClickHouse is launching a strategic Postgres initiative to extend our developer-...
Location: United States
Salary: 140,000 - 208,000 USD / Year
Company: ClickHouse
Expiration Date: Until further notice
Requirements
  • 6+ years in software engineering, ideally with experience building or operating database or cloud platform systems
  • Deep understanding of Postgres — configuration, extensions, operations, and performance tuning
  • Strong programming experience in Ruby, Go, or Python (or willingness to work across languages)
  • Familiarity with cloud infrastructure, APIs, and automation tools (Terraform, Kubernetes, CI/CD)
  • Understanding of distributed systems, data replication, and service orchestration patterns
  • Pragmatic, detail-oriented, and comfortable with both greenfield development and operational ownership
  • Happy to contribute where needed — from backend APIs and platform automation to Postgres internals and debugging
  • Strong communicator who works effectively across teams in a fast-paced, cross-functional environment
  • Operate with a founder’s mindset — take initiative, move quickly, and care deeply about outcomes
Job Responsibility
  • Design and build backend services that orchestrate and manage database clusters in ClickHouse Cloud
  • Extend our platform control plane — written in Ruby, Go, and TypeScript — to support new Postgres capabilities
  • Contribute to automation and tooling that simplify cluster provisioning, scaling, and lifecycle management
  • Collaborate with infrastructure, SRE, and product teams to ensure operational excellence, performance, and reliability
  • Develop APIs and integrations that expose new Postgres functionality to customers and internal systems
  • Improve observability, deployment safety, and debugging workflows for database services
  • Participate in design discussions, code reviews, and on-call rotations, contributing to the overall reliability and velocity of the team
  • Operate with autonomy — identifying opportunities, driving execution, and delivering meaningful impact
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Full-time

Senior Software Engineer - Postgres

ClickHouse is launching a strategic Postgres initiative to extend our developer-...
Location: Canada
Salary: Not provided
Company: ClickHouse
Expiration Date: Until further notice
Requirements
  • 6+ years in software engineering, ideally with experience building or operating database or cloud platform systems
  • Deep understanding of Postgres — configuration, extensions, operations, and performance tuning
  • Strong programming experience in Ruby, Go, or Python (or willingness to work across languages)
  • Familiarity with cloud infrastructure, APIs, and automation tools (Terraform, Kubernetes, CI/CD)
  • Understanding of distributed systems, data replication, and service orchestration patterns
  • Pragmatic, detail-oriented, and comfortable with both greenfield development and operational ownership
  • Happy to contribute where needed — from backend APIs and platform automation to Postgres internals and debugging
  • Strong communicator who works effectively across teams in a fast-paced, cross-functional environment
  • Operate with a founder’s mindset — take initiative, move quickly, and care deeply about outcomes
Job Responsibility
  • Design and build backend services that orchestrate and manage database clusters in ClickHouse Cloud
  • Extend our platform control plane — written in Ruby, Go, and TypeScript — to support new Postgres capabilities
  • Contribute to automation and tooling that simplify cluster provisioning, scaling, and lifecycle management
  • Collaborate with infrastructure, SRE, and product teams to ensure operational excellence, performance, and reliability
  • Develop APIs and integrations that expose new Postgres functionality to customers and internal systems
  • Improve observability, deployment safety, and debugging workflows for database services
  • Participate in design discussions, code reviews, and on-call rotations, contributing to the overall reliability and velocity of the team
  • Operate with autonomy — identifying opportunities, driving execution, and delivering meaningful impact
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Full-time

Senior Software Engineer - Postgres

ClickHouse is launching a strategic Postgres initiative to extend our developer-...
Location: India
Salary: Not provided
Company: ClickHouse
Expiration Date: Until further notice
Requirements
  • 6+ years in software engineering, ideally with experience building or operating database or cloud platform systems
  • Deep understanding of Postgres — configuration, extensions, operations, and performance tuning
  • Strong programming experience in Ruby, Go, or Python (or willingness to work across languages)
  • Familiarity with cloud infrastructure, APIs, and automation tools (Terraform, Kubernetes, CI/CD)
  • Understanding of distributed systems, data replication, and service orchestration patterns
  • Pragmatic, detail-oriented, and comfortable with both greenfield development and operational ownership
  • Happy to contribute where needed — from backend APIs and platform automation to Postgres internals and debugging
  • Strong communicator who works effectively across teams in a fast-paced, cross-functional environment
  • Operate with a founder’s mindset — take initiative, move quickly, and care deeply about outcomes
Job Responsibility
  • Design and build backend services that orchestrate and manage database clusters in ClickHouse Cloud
  • Extend our platform control plane — written in Ruby, Go, and TypeScript — to support new Postgres capabilities
  • Contribute to automation and tooling that simplify cluster provisioning, scaling, and lifecycle management
  • Collaborate with infrastructure, SRE, and product teams to ensure operational excellence, performance, and reliability
  • Develop APIs and integrations that expose new Postgres functionality to customers and internal systems
  • Improve observability, deployment safety, and debugging workflows for database services
  • Participate in design discussions, code reviews, and on-call rotations, contributing to the overall reliability and velocity of the team
  • Operate with autonomy — identifying opportunities, driving execution, and delivering meaningful impact
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Full-time

Senior Software Engineer - Cloud Infrastructure & Observability

Location: India, Bengaluru
Salary: Not provided
Company: Roku
Expiration Date: Until further notice
Requirements
  • 15+ years in software engineering with a track record of architecting distributed systems or platforms at scale
  • Strong hands‑on experience in Golang and one scripting language (e.g., Python or Shell)
  • Experience operating observability at PB-scale ingestion and hundreds of millions of series
  • Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
  • Deep experience building systems of scale and operating cloud infrastructure with Kubernetes
  • Strong proficiency with service mesh technologies (Istio/Envoy) and infrastructure‑as‑code (Terraform), and experience in multi‑cloud (AWS, GCP)
  • Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
  • Proven experience integrating security as part of infrastructure and platform development
  • Exceptional cross‑functional communication
  • Effective collaboration with both technical and non‑technical stakeholders
Job Responsibility
  • Architect and lead Roku’s observability platform across metrics, logs, and traces
  • Evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
  • Extend and harden open‑source observability systems
  • Overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
  • Implement features such as pre‑aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
  • Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform
  • Augment and automate CI/CD flows and onboarding
  • Integrate security into infrastructure and platform services
  • Ensure robust multi‑tenant, multi‑cluster, and multi‑cloud designs
  • Contribute improvements back to open source and CNCF‑aligned projects
What we offer
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Time off in accordance with local leave policies
  • Full-time

Senior Software Engineer - Cloud Infrastructure & Observability

We are building a next-generation observability and cloud platform that is high-...
Location: United Kingdom, Cambridge
Salary: Not provided
Company: Roku
Expiration Date: Until further notice
Requirements
  • Extensive experience in software engineering, with a track record of architecting distributed systems or platforms at scale
  • Strong hands-on experience in Golang and one scripting language (e.g., Python or Shell)
  • Experience operating observability at PB-scale ingestion and hundreds of millions of series
  • Expertise in observability platforms and tooling (Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse) and standards (OpenTelemetry, OpenMetrics)
  • Deep experience building systems of scale and operating cloud infrastructure with Kubernetes
  • Strong proficiency with service mesh technologies (Istio/Envoy) and infrastructure-as-code (Terraform), and experience in multi-cloud (AWS, GCP)
  • Demonstrated ability to evolve storage and query architectures for cost, scale, and latency (e.g., TSDB, Parquet, distributed processing)
  • Proven experience integrating security as part of infrastructure and platform development
  • Exceptional cross-functional communication
  • Effective collaboration with both technical and non-technical stakeholders
Job Responsibility
  • Architect and lead Roku’s observability platform across metrics, logs, and traces
  • Evolve data pipelines and storage layers optimized for high throughput, performance, and cost at Roku scale (TSDBs, Parquet, distributed processing)
  • Extend and harden open-source observability systems
  • Overhaul core components (e.g., storage layers, query paths) to improve performance, reliability, and usability at scale
  • Implement features such as pre-aggregation, down-sampling, and sampling to reduce load and accelerate queries across the platform
  • Collaborate across platform, SRE, and product teams to migrate hundreds of workloads to our common platform
  • Augment and automate CI/CD flows and onboarding
  • Integrate security into infrastructure and platform services
  • Ensure robust multi-tenant, multi-cluster, and multi-cloud designs
  • Contribute improvements back to open source and CNCF-aligned projects
What we offer
  • Global access to mental health and financial wellness support and resources
  • Healthcare (medical, dental, and vision)
  • Life, accident, disability, commuter, and retirement options (401(k)/pension)
  • Time off work for vacation and other personal reasons
  • Full-time