We are looking for a Senior Site Reliability Engineer to join the Core Reliability & Observability team in Platform Engineering. Your mission will be to shape Doctolib's observability strategy and ensure our platform remains reliable, debuggable, and scalable at European scale. You will work in a feature team developing logging, metrics, tracing, and alerting capabilities, contributing directly to supporting 400,000 health professionals and 80 million patients in their daily healthcare journey.
Job Responsibilities:
Lead the observability strategy across the platform, with an emphasis on building scalable, developer-friendly logging and tracing capabilities
Identify and lead large-scale cross-cutting reliability initiatives, including improvements to our incident detection, response, and postmortem analysis capabilities
Take part in the on-call rotation, and actively contribute to improving our on-call experience by refining alerting, reducing noise, and ensuring actionable telemetry
Requirements:
Have solid hands-on experience (3+ years) on a large-scale production platform
Have proven experience with cloud platforms such as AWS, Azure or Google Cloud
Have a solid understanding of containerization and orchestration technologies (Docker and Kubernetes)
Have a strong understanding of Helm for managing Kubernetes manifests and ArgoCD for GitOps workflows
Have deep expertise in observability tooling and architecture, such as logging (Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, Vector), tracing (OpenTelemetry or proprietary APMs), and metrics (Prometheus, Thanos, Datadog, or equivalent)
Have proficiency in at least one programming language (Ruby, Python, Go, Java, etc.) and a deep understanding of infrastructure as code principles
Have experience with monitoring and observability tools
Like troubleshooting performance issues in complex environments
Are fluent in English
Nice to have:
Have experience contributing to open-source observability projects
Have worked in a high-growth tech environment
Are passionate about developer experience and platform engineering
What we offer:
A Deutschlandticket (Germany-wide public transport pass) fully paid for by Doctolib
28 vacation days + 1 additional day for each full calendar year of employment (up to a maximum of 30 days)
Work from abroad for up to 10 days per year thanks to our flexibility days policy
Company health insurance with great supplementary benefits through our partner Allianz
Company pension scheme (bAV) through Allianz with an employer subsidy of 40% (15% within the probationary period)
The Doctolib Parent Care program, which includes one month additional parental leave and much more
Enrollment in Doctolib's long-term employee value sharing plan called DoctoGrowth
Free mental health and coaching services through our partner Moka.care
Subsidized sports membership through our partner Urban Sports Club
A flexible workplace policy offering both hybrid and office-based modes
Alongside healthy snacks and our regular breakfast buffet, we provide a subsidized meal benefit
For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
Relocation support in case of international mobility
Access to the best AI tools for coding and development, plus dedicated training