Director, Site Reliability Engineering

Germany, Berlin · Job Posted June 29, 2026

Job Description

As our Director of Site Reliability Engineering, reporting to our VP of Platform Engineering, you'll own the core infrastructure layers that everything at Doctolib runs on: cloud infrastructure, database operations, network infrastructure, and observability. You will also lead the Doctolib Operations Center (DOC) and drive a decisive shift from reactive operations to a proactive, world-class reliability culture. This is a rare opportunity to shape the infrastructure backbone of Europe's leading healthtech company, at a moment when Doctolib is actively expanding multi-cloud capabilities, scaling to new countries, and building the reliability culture that will define the next decade of healthcare innovation.

Job Responsibility

  • Build and run a world-class SRE org of 25+ engineers across Cloud Infrastructure, Database & Storage, Network Infrastructure, Observability Tooling, and the Doctolib Operations Center
  • Own the infrastructure strategy and roadmap — cloud, database, network, observability — and deliver against company OKRs
  • Lead the Doctolib Operations Center: set incident response standards, drive MTTR reduction, embed blameless post-mortem culture across engineering
  • Architect and execute our multi-cloud strategy — reducing vendor lock-in, cutting migration costs, and enabling international expansion
  • Own network infrastructure at scale: load balancing, CDN/WAF, VPCs, peering, zero-trust networking across a high-traffic, multi-country platform
  • Drive observability as a product — give 700+ engineers true visibility into system health and turn observability maturity into an operational excellence lever
  • Lead from the front as a senior technical voice in the Platform org and broader Tech leadership team

Requirements

  • 12+ years in software engineering, including 5+ years leading managers and running infrastructure or SRE organisations at scale
  • Track record of taking SRE practices from reactive to proactive — with measurable reductions in incidents and MTTR
  • Strong multi-cloud and network infrastructure experience: load balancing, CDN/WAF, VPCs, peering, at high-traffic scale
  • Deep database operations background: large-scale transactional systems (PostgreSQL, Aurora), streaming/CDC (Kafka), data layer FinOps
  • Experience building observability platforms that give teams genuine visibility — metrics, logs, traces, alerting
  • Sharp process thinking: SLOs, error budgets, incident management, blameless post-mortems
  • Outcome-driven: you track reliability, cost efficiency, and engineering velocity as business metrics, not just technical ones
  • Strong communicator and influencer at executive level — equally credible with senior engineers and business stakeholders
  • Builder of high-performing, people-first engineering cultures
  • Fluent in English and comfortable in fast-paced, international environments
  • You recognise yourself in our playbook values

Nice to have

  • Experience in healthcare, regulated, or high-compliance industries (HDS, ISO 27001, SOC2, GDPR, data sovereignty)
  • Familiarity with our stack: Ruby on Rails, Node.js, Go, Python, React, AWS, GCP, Kubernetes, PostgreSQL, Datadog, GitHub Actions
  • French language proficiency
  • Experience with AI-augmented infrastructure tooling or ML platform operations
  • M&A or post-acquisition infrastructure integration experience

What we offer

  • A Deutschlandticket (Germany-wide public transport pass) fully paid for by Doctolib
  • 28 vacation days + 1 additional day for each full calendar year of employment (up to a maximum of 30 days)
  • Work from abroad for up to 10 days per year thanks to our flexibility days policy
  • Company health insurance with great supplementary benefits through our partner Allianz
  • Company pension scheme (bAV) through Allianz with an employer subsidy of 40% (15% within the probationary period)
  • Enrollment in Doctolib's long-term employee value sharing plan called DoctoGrowth
  • The Doctolib Parent Care program, which includes one month additional parental leave and much more
  • Free mental health and coaching services through our partner Moka.care
  • Subsidized sports membership through our partner Urban Sports Club
  • A flexible workplace policy offering both hybrid and office-based modes
  • Alongside healthy snacks and our regular breakfast buffet, we provide a subsidized meal benefit
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support

Similar Jobs for Director, Site Reliability Engineering

8 matching positions

Director, Site Reliability Engineering

As our Director of Infrastructure platform, you will be a key driver of Doctolib...
Location: France, Paris
Salary: Not provided
Company: Doctolib
Expiration Date: Until further notice
Requirements
  • 12+ years in software engineering, including 6+ years leading large (30+) distributed, international platform or infrastructure teams
  • Proven experience driving platform-as-a-product transformations and modularizing large monolithic architectures at scale
  • Demonstrated ability to architect, deliver, and operate secure, reliable, and scalable developer platforms in SaaS, multi-product, or regulated environments
  • Strong process orientation: experience implementing OKRs, robust monitoring/observability, and best-in-class incident management
  • Measurable impact on developer productivity, platform adoption, reliability, and cost-efficiency
  • Effective communicator and influencer, with the ability to align and inspire cross-functional stakeholders
  • Experience leading change and building high-performing, people-first engineering cultures
  • Fluent in English and comfortable in fast-paced, international environments
Job Responsibility
  • Lead and scale a high-performing infrastructure organization of 30+ engineers across Infrastructure, Automation, SRE, and Database teams, while maintaining strong engagement and fostering a culture of excellence and ownership
  • Own the infrastructure platform strategy and roadmap that enables Doctolib's modularization journey, delivers on company OKRs, and ensures predictable execution across all infrastructure and automation initiatives
  • Champion platform-as-a-product by building self-service capabilities (infrastructure provisioning, CI/CD, observability, database management) that transform developer experience and unlock team autonomy across the engineering organization
  • Be the guardian of quality and reliability by establishing world-class incident management, driving measurable improvements in availability and performance, and ensuring infrastructure components operate at the highest standards of security and resilience
  • Accelerate engineering velocity by reducing platform friction, enabling faster modularization, and leveraging AI-augmented development tools to multiply productivity across feature teams
  • Drive the infrastructure transformation from monolith-supporting infrastructure to a modular, multi-service platform architecture, enabling international expansion, product velocity, and operational excellence at scale
  • Act as a senior technical leader within the Platform organization and broader Tech leadership team, bringing strong technical opinions and challenging architectural decisions while clearly articulating how infrastructure investments contribute to company strategy and business outcomes
What we offer
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive additional leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from abroad for up to 10 days per year thanks to our flexibility days policy
  • Works council subsidy to partially refund a sports club membership or creative classes
  • Up to 14 days of RTT (additional paid days off under the French working-time reduction scheme)
  • Lunch vouchers via a Swile card
Employment type: Full-time

Director, Site Reliability Engineering

We are seeking a Director of Site Reliability Engineering to lead a global organ...
Location: Finland, Helsinki
Salary: Not provided
Company: Aiven Deutschland GmbH
Expiration Date: Until further notice
Requirements
  • Proven experience leading and scaling global SRE or infrastructure organizations through managers, ideally across multiple regions and time zones
  • Strong track record of defining and executing reliability strategy at scale, including ownership of SLIs/SLOs, incident management frameworks, and operational excellence programs
  • Demonstrated ability to build, develop, and mentor senior leaders, creating high-performing, inclusive teams and strong leadership pipelines
  • Experience operating in a 24/7/365 production environment, with deep understanding of follow-the-sun models, on-call design, and large-scale incident response
  • Ability to partner cross-functionally at the executive level (Engineering, Product, Support) to influence architecture, prioritization, and long-term platform investments
  • Strong data-driven leadership approach, with experience defining SLIs/SLOs and using metrics to drive prioritization, accountability, and continuous improvement
  • Solid technical foundation in distributed systems, cloud infrastructure, and automation, with the ability to engage credibly with senior engineers and influence technical direction
  • Experience driving large-scale change and organizational design, including scaling teams, evolving operating models, and improving efficiency and reliability at company level
Job Responsibility
  • Define and drive global SRE operating strategy in partnership with regional SRE leaders across EMEA, AMER and APAC, ensuring alignment on reliability goals, operating models, and execution across a 24/7/365 follow-the-sun organization
  • Build and lead a multi-regional SRE organization through managers, developing leadership capability, mentoring the team, and ensuring consistent performance, culture, and delivery across geographies
  • Set the vision and roadmap for reliability engineering, enabling teams to deliver high-impact tools, automation, and process initiatives that improve platform resilience, scalability, and efficiency
  • Own global incident management strategy and operating model, including on-call design, coverage, and escalation frameworks, ensuring seamless coordination and high availability across regions
  • Establish a metrics-driven operating cadence, defining KPIs, SLIs, SLOs, and error budgets, driving data-informed prioritization, and embedding operational rigor and continuous improvement across the SRE organization
What we offer
  • Participate in Aiven’s equity plan
  • Balance work and life with our hybrid work policy
  • Choose the equipment you need to set yourself up for success
  • Use your Professional Development Plan budget for learning opportunities
  • Receive holistic wellbeing support through our global Employee Assistance Program
  • Inquire about our Global Time Off Commitment (Parental and Sick Leave, as well as Personal Time)
  • Enjoy country-specific benefits for our global cast
Employment type: Full-time

Director, Site Reliability Engineering

The Director of Site Reliability Engineering (SRE) will provide strategic leader...
Location: United States, Mountain View
Salary: 315,000 - 385,000 USD per year
Company: EarnIn
Expiration Date: Until further notice
Requirements
  • BS, MS, or PhD degree in Computer Science, Engineering, or a related field, or equivalent related experience
  • 7+ years of experience in the field, including 3+ years leading SRE teams or a team in a similar role
  • Strong experience with container orchestration (Kubernetes), infrastructure as code (Terraform), and CI/CD pipelines
  • Hands-on experience with observability platforms (e.g., Datadog, Prometheus, Grafana) and incident management tools (e.g., incident.io, PagerDuty)
  • Proficiency in at least one programming language (Python, Go, or Java) with the ability to review code and guide system design decisions
  • Proven experience in architecting and managing highly available, scalable, and fault-tolerant systems
  • Ability to define a clear reliability vision and inspire teams and stakeholders toward long‑term reliability goals
  • Demonstrated sound judgment and calm decision‑making under pressure, particularly during high‑severity incidents
  • Strong people leadership skills, with experience coaching and mentoring engineering talent, developing future leaders, and aligning peer engineering managers and leaders on reliability best practices
  • Strategic planning skills with a track record of aligning technical direction with organizational objectives
Job Responsibility
  • Drive organizational transformation toward SRE principles and own the strategic direction for reliability maturity, cultivating a culture centered on reliability, efficiency, and continuous improvement
  • Develop and oversee automation strategies, tools, and frameworks that improve system reliability, reduce operational toil, and enhance team productivity
  • Architect and evolve robust observability, monitoring, and alerting systems
  • Champion chaos engineering and resilience testing practices to proactively validate system behavior under failure conditions
  • Partner with engineering, product, and operations teams to embed SRE practices throughout the development lifecycle and influence architectural decisions for reliability
  • Build, mentor, and develop a high‑performing global SRE organization, fostering technical excellence, career growth, and a strong culture of knowledge sharing
  • Oversee capacity planning, scalability assessments, and future‑state demand forecasting across critical systems
  • Lead and govern high‑severity incident response practices—ensuring rapid triage, thorough root cause analysis, and follow‑through on corrective and preventative actions
What we offer
  • Equity and benefits
Employment type: Full-time

Site Engineering Director

DS Smith Paper is seeking an outstanding Engineering Director to lead the Engine...
Location: United Kingdom, Kemsley
Salary: Not provided
Company: DS Smith
Expiration Date: Until further notice
Requirements
  • Chartered member of relevant professional body or equivalent experience
  • Extensive engineering experience within a busy organisation is essential
  • Significant successful experience in management in a dynamic complex environment
  • Recognised maintenance leadership capability
  • Able to demonstrate in depth knowledge and experience of health and safety management and compliance management processes
  • Proven ability to influence decision-making and build cross-functional and cross-company relationships
  • Experience managing budgets, operations, and forecasting
  • Strong problem-solving skills with a focus on continuous improvement
Job Responsibility
  • Lead and role-model a strong safety culture, ensuring all engineering work is carried out safely and in line with risk assessments and legal requirements
  • Build the right team structure and ensure all engineers are trained, competent and fully compliant
  • Develop and deliver the site’s long-term engineering strategy, with a clear focus on improving reliability and reducing unplanned downtime
  • Embed continuous improvement across the Engineering function and support wider operational improvements
  • Create engineering investment plans and provide accurate financial forecasts for budgets and long-term planning
  • Deliver clear, concise communication and reporting to support effective decision-making across the site
  • Strengthen maintenance and asset management practices to achieve world-class levels of reliability and asset availability
  • Improve planned and preventive maintenance systems, including SAP utilisation, spares management and workshop standards
  • Build strong relationships across the DS Smith Paper Division and with key suppliers to give the site rapid access to expertise and industry best practice
  • Ensure all engineering activities meet statutory requirements and follow DS Smith standards
What we offer
  • Competitive salary
  • Qualifying Sick Pay scheme
  • Pension scheme & Life insurance
  • Share Save scheme
  • Income Protection
  • 25 days holiday plus Bank Holidays
  • Employee Assistance Programme
  • Virtual GP, occupational health, and a free flu vaccine
  • Cycle to Work and shopping discounts
Employment type: Full-time

Director of Engineering & Reliability

Crusoe is expanding our hyperscale AI and high-performance computing (HPC) data ...
Location: United States, San Francisco
Salary: 216,000 - 260,000 USD per year
Company: Crusoe
Expiration Date: Until further notice
Requirements
  • 10+ years of engineering experience in mission-critical facilities or hyperscale data centers
  • Strong technical expertise in mechanical and electrical systems (MV distribution, UPS, generators, cooling plants, CRAC/CRAH, liquid cooling)
  • Experience implementing RCM, FMEA, RCA, and reliability engineering programs
  • Ability to govern engineering standards across multi-site portfolios
  • Strong analytical, modeling, and systems-thinking capabilities
Job Responsibility
  • Build and govern Crusoe’s enterprise engineering design standards for mechanical, electrical, and critical infrastructure systems
  • Lead reliability engineering programs including FMEA, RCM, RCA, uptime strategy, and risk modeling
  • Develop asset lifecycle strategies, predictive maintenance programs, and long-term capital planning
  • Model power, cooling, airflow, and liquid-loop performance to optimize system capacity and readiness
  • Serve as L3 escalation for complex MEP issues and major incidents
  • Lead technical audits, quality assurance programs, and engineering evaluations across all campuses
  • Partner with Construction, Commissioning, and Operations to enable scalable, high-density AI workloads
  • Build and lead a team of MEP and reliability engineers
What we offer
  • Restricted Stock Units
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
Employment type: Full-time

Site Director

Lead, Influence, and Shape the Heart of LANXESS Operations in the Netherlands. R...
Location: Netherlands, Rotterdam
Salary: Not provided
Company: LANXESS
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Chemical Engineering or a related field
  • 10+ years of leadership experience in chemical or process manufacturing environments
  • Proven track record managing complex industrial operations with a strong focus on HS&E
  • Solid understanding of production technologies, operational risk, and safety management systems
  • Experience operating in regulated environments and working with authorities, unions, and works councils
  • Strong financial acumen, including budget ownership and capital project oversight
  • Ability to lead large, multidisciplinary teams and influence stakeholders in a matrix organisation
  • Demonstrated commitment to continuous improvement and operational excellence
  • Excellent communication skills and the credibility to represent LANXESS with external stakeholders
  • Fluent in Dutch and English
Job Responsibility
  • Lead and execute the site manufacturing strategy in alignment with global BU and LANXESS objectives
  • Ensure all site operations meet the highest standards for HS&E, quality, and regulatory compliance
  • Act as the legal line manager for the site and functional matrix leader within the global organisation
  • Set targets and KPIs, ensure robust performance reporting, and continuously optimise site results
  • Drive continuous improvement through technological, procedural, and cultural change initiatives
  • Manage the full site operating budget and oversee the capital investment portfolio, including large projects
  • Ensure that management systems, emergency response structures, and governance processes are in place and effective
  • Serve as the primary point of contact for authorities, works council, unions, and industry bodies
  • Negotiate permits, agreements, and regulatory requirements relevant to site operations
  • Build strong internal networks across functions, sites, and global teams
What we offer
  • Permanent contract from day one
  • Management grade MGIII, including 20% annual performance pay (APP)
  • Lease car according to LANXESS policy
  • 30 vacation days, with the option to buy additional days
  • Competitive salary aligned with experience and scope
  • Strong pension plan
  • Extensive leadership and professional development opportunities
  • Unlimited access to LinkedIn Learning
  • Vitality budget to support wellbeing
  • Travel reimbursement and support for on-site presence
Employment type: Full-time

Director, Engineering Services

Regional Engineering Lead for Account Management – Workplace Management. Respons...
Location: Singapore
Salary: Not provided
Company: JLL
Expiration Date: Until further notice
Requirements
  • At least five (5) years of experience in managing and delivering asset management and maintenance strategies for industrial environments
  • Demonstrated skills in rigorous, fact-based analyses that drive creative problem solving, proposal preparation, and negotiations
  • Persuasive written and verbal communication skills
  • Strong multi-tasking and organisational capabilities
  • High level of attention to detail
  • Able to problem solve and to think strategically
  • Ability to prioritise workload in a high pressure, deadline driven work environment
  • Background in an electrical, mechanical, or related engineering/trade field
  • Computer literate
  • Evidence of strong interpersonal skills
Job Responsibility
  • Responsible for the strategic oversight and operational management of multiple engineering teams across the region
  • Ensure the highest level of professionalism while meeting diverse client needs and maintaining a commitment to achieving 100% uptime across all managed facilities
  • Responsible for protecting and improving the value of clients' assets region-wide
  • Implement and standardize reliability-based maintenance functions across multiple properties
  • Execute comprehensive equipment inspection and monitoring programs across the regional portfolio
  • Define and implement maintenance best practices to improve overall mechanical equipment uptime
  • Continuously evaluate maintenance, operations and reliability methods
  • Collaborate closely with site teams to ensure adherence to JLL and client safety measures and procedures
  • Work with the Account Director and act as the single point of contact for client and account management leadership on overall engineering, including critical environment deliverables
  • Maintain the accuracy of the asset register and keep it up to date using the JLL CMMS (Corrigo)
Employment type: Full-time