Database Reliability Engineer - Core Team

United Kingdom · Job Posted December 07, 2025

Job Description

At ClickHouse, we are committed to providing our customers with reliable and secure services. To continue doing so, we are building out our Site Reliability Engineering team in ClickHouse Core. As one of the first members of the Core Reliability Engineering team, you will be responsible for building and leading processes that ensure and improve the reliability, availability, scalability, and performance of ClickHouse. You will collaborate with teams such as Control Plane, Dataplane, Security, Support, and Operations, and guide them in implementing ClickHouse in the way that best serves our customers. You will also own engineering escalation management and response, investigations, post-mortem analysis (including running blameless postmortems), and the continuous improvement of how ClickHouse is run and optimized in the cloud. This role is a unique opportunity to make a significant impact on the elastic, limitless-scale, high-performance ClickHouse that powers ClickHouse Cloud.

Job Responsibility

  • Continuously improve the reliability and performance of ClickHouse Core
  • Improve and create metrics and alerts for ClickHouse to identify and prevent problems in production before they affect customers
  • Dig into the most common problems customers encounter in ClickHouse Core to identify their root causes, submit bug fixes and issue reports, and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse Core-related outages, including working with the Support and Cloud teams to communicate with impacted customers
  • Plan, enable, and drive chaos engineering initiatives across Engineering teams, based on internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalations to resolve issues and minimize customer impact

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL; experience with ClickHouse in particular is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills

What we offer

  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Similar Jobs for Database Reliability Engineer - Core Team (8 matching positions)

Senior Database Reliability Engineer

We are looking for a Senior Database Reliability Engineer to join the Doctolib D...
Location: France, Nantes
Salary: Not provided
Company: Doctolib (doctolib.fr)
Expiration Date: Until further notice
Requirements
  • 7+ years of experience as a Database Reliability Engineer, or SRE with strong database expertise, or DBA with strong inclination towards site reliability practices
  • Strong expertise in at least one of our core datastores (PostgreSQL, Couchbase or OpenSearch) or an equivalent technology, in a high-load and high-availability environment
  • Hands-on experience with Kafka or an equivalent technology
  • Hands-on experience with Infrastructure as Code (Terraform), containerization technologies (Docker, Kubernetes), and cloud environments (AWS, Azure or Google Cloud) in a production environment
  • Fluent in English
Job Responsibility
  • Reliability Engineering: Ensure high availability of our datastores and our own operational efficiency through observability, automation, tooling, backup strategies, disaster recovery planning and process improvement
  • Database maintenance: Handle database infrastructure maintenance including upgrades, performance optimization, cost optimization and capacity planning
  • Database-as-a-service: Empower feature teams with database tooling, guidelines, training and support to manage and use their databases efficiently
  • Incident mitigation: Participate in incident response for datastore-related issues in production during business hours, when the issue cannot be fixed by the responsible feature team
  • Datastore expertise sharing: Stay up-to-date with datastore technology evolution and share knowledge with the team on new features, best practices, and industry trends
What we offer
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Works council subsidy to refund part of a sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
  • Full-time

Staff Database Reliability Engineer

We are looking for a Staff Database Reliability Engineer to join the Doctolib Da...
Location: France, Nantes
Salary: Not provided
Company: Doctolib (doctolib.fr)
Expiration Date: Until further notice
Requirements
  • 10+ years of experience as a Database Reliability Engineer, or SRE with strong database expertise, or DBA with strong inclination towards site reliability practices
  • Strong expertise in at least one of our core datastores (PostgreSQL, Couchbase or OpenSearch) or an equivalent technology, in a high-load and high-availability environment
  • Hands-on experience with Kafka or an equivalent technology
  • Hands-on experience with Infrastructure as Code (Terraform), containerization technologies (Docker, Kubernetes), and cloud environments (AWS, Azure or Google Cloud) in a production environment
  • Fluent in English
Job Responsibility
  • Reliability Engineering: Ensure high availability of our datastores and our own operational efficiency through observability, automation, tooling, backup strategies, disaster recovery planning and process improvement
  • Database maintenance: Handle database infrastructure maintenance including upgrades, performance optimization, cost optimization and capacity planning
  • Database-as-a-service: Empower feature teams with database tooling, guidelines, training and support to manage and use their databases efficiently
  • Incident mitigation: Participate in incident response for datastore-related issues in production during business hours, when the issue cannot be fixed by the responsible feature team
  • Datastore expertise sharing: Stay up-to-date with datastore technology evolution and share knowledge with the team on new features, best practices, and industry trends
What we offer
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Works council subsidy to refund part of a sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
  • Full-time

Staff Database Reliability Engineer

We are looking for a Staff Database Reliability Engineer to join the Doctolib Da...
Location: France, Paris
Salary: Not provided
Company: Doctolib (doctolib.fr)
Expiration Date: Until further notice
Requirements
  • 10+ years of experience as a Database Reliability Engineer, or SRE with strong database expertise, or DBA with strong inclination towards site reliability practices
  • Strong expertise in at least one of our core datastores (PostgreSQL, Couchbase or OpenSearch) or an equivalent technology, in a high-load and high-availability environment
  • Hands-on experience with Kafka or an equivalent technology
  • Hands-on experience with Infrastructure as Code (Terraform), containerization technologies (Docker, Kubernetes), and cloud environments (AWS, Azure or Google Cloud) in a production environment
  • Fluent in English
Job Responsibility
  • Reliability Engineering: Ensure high availability of our datastores and our own operational efficiency through observability, automation, tooling, backup strategies, disaster recovery planning and process improvement
  • Database maintenance: Handle database infrastructure maintenance including upgrades, performance optimization, cost optimization and capacity planning
  • Database-as-a-service: Empower feature teams with database tooling, guidelines, training and support to manage and use their databases efficiently
  • Incident mitigation: Participate in incident response for datastore-related issues in production during business hours, when the issue cannot be fixed by the responsible feature team
  • Datastore expertise sharing: Stay up-to-date with datastore technology evolution and share knowledge with the team on new features, best practices, and industry trends
What we offer
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Works council subsidy to refund part of a sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
  • Full-time

Senior Database Reliability Engineer

We are looking for a Senior Database Reliability Engineer to join the Doctolib D...
Location: Germany, Berlin
Salary: Not provided
Company: Doctolib (doctolib.fr)
Expiration Date: Until further notice
Requirements
  • 7+ years of experience as a Database Reliability Engineer, or SRE with strong database expertise, or DBA with strong inclination towards site reliability practices
  • Strong expertise in at least one of our core datastores (PostgreSQL, Couchbase or OpenSearch) or an equivalent technology, in a high-load and high-availability environment
  • Hands-on experience with Kafka or an equivalent technology
  • Hands-on experience with Infrastructure as Code (Terraform), containerization technologies (Docker, Kubernetes), and cloud environments (AWS, Azure or Google Cloud) in a production environment
  • Fluent in English
Job Responsibility
  • Reliability Engineering: Ensure high availability of our datastores and our own operational efficiency through observability, automation, tooling, backup strategies, disaster recovery planning and process improvement
  • Database maintenance: Handle database infrastructure maintenance including upgrades, performance optimization, cost optimization and capacity planning
  • Database-as-a-service: Empower feature teams with database tooling, guidelines, training and support to manage and use their databases efficiently
  • Incident mitigation: Participate in incident response for datastore-related issues in production during business hours, when the issue cannot be fixed by the responsible feature team
  • Datastore expertise sharing: Stay up-to-date with datastore technology evolution and share knowledge with the team on new features, best practices, and industry trends
What we offer
  • Company health insurance through our partner Allianz
  • Minimum 28 days of paid leave
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • A flexible workplace policy offering both hybrid and office-based mode
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Reimbursement of public transportation
  • Full-time

Site Reliability Engineer - Core

We are looking for a Site Reliability Engineer to join our Core team to encourag...
Location: United Kingdom, London
Salary: Not provided
Company: Blockchain (blockchain.com)
Expiration Date: Until further notice
Requirements
  • Experience with containerization and service orchestration, including best practices and security
  • Strong knowledge of at least one programming language
  • Linux, including an understanding of resource allocation, networking, and/or internals
  • Experience working with cloud solutions (GCP or AWS)
  • Deep understanding and demonstrable experience with modern monitoring tools such as Prometheus, Datadog, Grafana, Telegraf
  • Experience with infrastructure as code tools
  • Solid background with configuration management tools
  • Experience using GitOps and CI to make changes, preferably GitHub Actions
  • Experience with messaging systems such as Kafka
  • Experience with database management
Job Responsibility
  • Play a critical role in evolving our infrastructure as we develop solutions to complex technical problems involving reliability, latency, bandwidth and, most importantly, security
  • Be an integral part of improving observability, monitoring and alerting throughout the platform
  • Help co-ordinate work across different areas of the company to ensure the most efficient path of execution
  • Centralize, wherever possible, common streams of work that are currently duplicated across developer teams
  • Focus heavily on writing tooling to replace manual, repetitive work in a scalable way
  • Work in a fast-paced, dynamic environment, complementing our existing high-calibre team
What we offer
  • Full-time salary based on experience and meaningful equity in an industry-leading company
  • Hybrid model working from home & awesome office location in the heart of London
  • Unlimited vacation policy: work hard and take time when you need it
  • Work from Anywhere Policy: You can work remotely from anywhere in the world for up to 20 days per year
  • Apple equipment
  • The opportunity to be a key player and build your career at a rapidly expanding, global technology company in an emerging field
  • Flexible work culture
  • Full-time

Database Reliability Engineer

We are committed to providing our customers with reliable and secure services at...
Location: Netherlands
Salary: Not provided
Company: ClickHouse (clickhouse.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL; experience with ClickHouse in particular is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility
  • Continuously improve the reliability and performance of ClickHouse Core
  • Improve and create metrics and alerts for ClickHouse to identify and prevent problems in production before they affect customers
  • Dig into the most common problems customers encounter in ClickHouse Core to identify their root causes, submit bug fixes and issue reports, and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse Core-related outages, including working with the Support and Cloud teams to communicate with impacted customers
  • Plan, enable, and drive chaos engineering initiatives across Engineering teams, based on internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalations to resolve issues and minimize customer impact
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites
  • Full-time

Database Reliability Engineer

We are committed to providing our customers with reliable and secure services at...
Location: Germany
Salary: Not provided
Company: ClickHouse (clickhouse.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science or a related field
  • At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering
  • Previous experience operating ClickHouse or other SQL databases in production
  • Excellent understanding of distributed database internals and SQL; experience with ClickHouse in particular is a major plus
  • Scripting experience with Shell or Python, and ability to read and understand C++ code
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • You are a strong problem-solver and have solid production debugging skills
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
  • You have a high level of responsibility, ownership, and accountability
  • Excellent communication skills
Job Responsibility
  • Continuously improve the reliability and performance of ClickHouse Core
  • Improve and create metrics and alerts for ClickHouse to identify and prevent problems in production before they affect customers
  • Dig into the most common problems customers encounter in ClickHouse Core to identify their root causes, submit bug fixes and issue reports, and suggest improvements
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse Core-related outages, including working with the Support and Cloud teams to communicate with impacted customers
  • Plan, enable, and drive chaos engineering initiatives across Engineering teams, based on internal priorities
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalations to resolve issues and minimize customer impact
What we offer
  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries
  • Healthcare - Employer contributions towards your healthcare
  • Equity in the company - Every new team member who joins our company receives stock options
  • Time off - Flexible time off in the US, generous entitlement in other countries
  • A $500 Home office setup if you’re a remote employee
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites

Site Reliability Engineer

NetApp is looking for a Senior TechOps Engineer - Cassandra to join our growing ...
Location: India, Bengaluru
Salary: Not provided
Company: NetApp (netapp.com)
Expiration Date: Until further notice
Requirements
  • Strong experience in Apache Cassandra administration and architecture, with a desire to continuously learn and develop to an expert level
  • Experience in diagnosing and recommending mitigation strategies for Cassandra-related issues, including performance degradation due to resource bottlenecks, suboptimal data modeling leading to hot partitions, excessive tombstones, and inefficiencies caused by range slices and poorly constructed queries
  • Hands-on experience with Cassandra architecture and core administrative tasks, including compactions, repairs, backup and recovery, schema disagreement resolution, and configuration management
  • Experience handling Cassandra maintenance activities, including upgrades and migrations
  • Ability to investigate and research Cassandra issues by reviewing the Apache Cassandra codebase
  • Strong knowledge and experience with Linux, with the ability to work comfortably from the command line
  • Exceptional ability to communicate clearly and professionally in written and verbal English
  • Experience working with at least one public cloud platform, preferably AWS
  • Prior IT customer service or support experience within an ITIL-based environment
  • Strong fundamental computer science and software engineering skills, particularly in operating system internals, memory management, and networking
Job Responsibility
  • Your work will ensure the security, reliability, and performance of world-class systems and databases
  • You will collaborate with the technical teams of our customers, who are globally recognized companies in the gaming, banking, and logistics industries, ranging from large multinationals to emerging start-ups
What we offer
  • Volunteer time off
  • Well-being
  • Time away
  • Full-time