Senior Site Reliability Engineer - Observability

Job Description

We are looking for a Senior Site Reliability Engineer to join the Core Reliability & Observability team in Platform Engineering. Your mission will be to shape Doctolib's observability strategy and ensure our platform remains reliable, debuggable, and scalable at European scale. You will work in a feature team developing logging, metrics, tracing, and alerting capabilities, directly supporting 400,000 health professionals and 80 million patients in their daily healthcare journey.

Job Responsibilities

Lead the observability strategy across the platform, with an emphasis on building scalable, developer-friendly logging and tracing capabilities
Identify and lead large-scale cross-cutting reliability initiatives, including improvements to our incident detection, response, and postmortem analysis capabilities
Take part in the on-call rotation, and actively contribute to improving our on-call experience by refining alerting, reducing noise, and ensuring actionable telemetry

Requirements

Have solid hands-on experience (3+ years) on a large-scale production platform
Have proven experience with cloud platforms such as AWS, Azure or Google Cloud
Have a solid understanding of containerization and orchestration technologies (Docker and Kubernetes)
Have a strong understanding of Helm for managing Kubernetes manifests and ArgoCD for GitOps workflows
Have deep expertise in observability tooling and architecture, such as Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, or Vector for logging; OpenTelemetry or proprietary APMs for tracing; and Prometheus, Thanos, Datadog, or equivalent for metrics
Have proficiency in at least one programming language (Ruby, Python, Go, Java, etc.) and a deep understanding of infrastructure as code principles
Have experience with monitoring and observability tools
Like troubleshooting performance issues in complex environments
Are fluent in English

Nice to have

Have experience contributing to open-source observability projects
Have worked in a high-growth tech environment
Are passionate about developer experience and platform engineering

What we offer

A Deutschlandticket (Germany-wide public transport pass) fully paid for by Doctolib
28 vacation days + 1 additional day for each full calendar year of employment (up to a maximum of 30 days)
Work from abroad for up to 10 days per year thanks to our flexibility days policy
Company health insurance with great supplementary benefits through our partner Allianz
Company pension scheme (bAV) through Allianz with an employer subsidy of 40% (15% within the probationary period)
The Doctolib Parent Care program, which includes one additional month of parental leave and much more
Enrollment in Doctolib's long-term employee value sharing plan called DoctoGrowth
Free mental health and coaching services through our partner Moka.care
Subsidized sports membership through our partner Urban Sports Club
A flexible workplace policy offering both hybrid and office-based modes
Alongside healthy snacks and our regular breakfast buffet, we provide a subsidized meal benefit
For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
Relocation support in case of international mobility
Access to the best AI tools for coding, development and dedicated training
