Manager, Reliability Engineering

Netherlands, Amsterdam · Job Posted April 23, 2026

Job Description

As a Team Manager in Reliability Engineering at Optimizely, you will oversee the day-to-day operations of an international team of four engineers. You will manage team performance, support career development, and contribute on a technical level to ensure the reliability and scalability of our platforms. Your role will also involve applying agile methodologies to enhance team productivity and collaboration. Please note, this role includes participation in an on-call rotation.

Job Responsibility

  • Team Leadership and Management: Lead and manage an international team of reliability engineers. Oversee the team's daily activities, ensuring alignment with organizational goals and objectives
  • Technical Contribution: Actively contribute to technical projects and initiatives. Support the team with your expertise in system design, implementation, and troubleshooting
  • Agile Methodologies: Apply agile methods such as Scrum and Kanban to manage workflows and improve team productivity. Facilitate agile ceremonies and encourage continuous improvement
  • Performance Management: Monitor and evaluate team performance, providing regular feedback and guidance. Conduct performance reviews and set clear objectives for team members
  • Career Development: Support the professional growth and development of team members. Identify training and development opportunities to enhance team skills and capabilities
  • Collaboration and Communication: Foster a collaborative team environment. Communicate effectively with stakeholders and cross-functional teams to ensure alignment and transparency

Requirements

  • Proven experience in a leadership role within a reliability engineering or similar technical team
  • Strong technical background with experience in reliability engineering, system design, and troubleshooting
  • Strong understanding of cloud computing, networking, and system architecture (preferably GCP)
  • Proficiency in scripting and automation tools (e.g., Python, Bash, Terraform)
  • Experience with observability tools (e.g., Datadog, Prometheus, Grafana, ELK Stack)
  • Demonstrated experience in designing, deploying, and managing applications in Kubernetes environments. Proficiency in configuring and optimizing Kubernetes clusters for scalability, reliability, and performance
  • Proficiency in version control software, particularly Git/GitHub, is required
  • Experience with agile methodologies such as Scrum and Kanban
  • Excellent leadership, communication, and interpersonal skills
  • Ability to manage team performance and support career development
  • Proficiency in English is required

Nice to have

  • Familiarity with Istio service mesh architecture and its components is a plus
  • Experience working with international teams is a plus

Similar Jobs for Manager, Reliability Engineering

8 matching positions

Engineering Manager - Observability & Reliability Engineering Obsession

We are looking for an Engineering Manager to join the OREO (Observability Reliab...
Location: France, Paris
Salary: Not provided
Company: Doctolib
Expiration Date: Until further notice
Requirements
  • At least 5 years of software engineering or SRE experience, with a strong technical background in cloud-native environments (preferably AWS, GCP, and/or Kubernetes-based)
  • 3+ years of engineering management experience, leading technical teams (ideally SRE, platform, or infrastructure teams)
  • Deep understanding of observability tooling and architecture (Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Prometheus, Thanos, Datadog)
  • Experience with infrastructure as code (Terraform, OpenTofu) and secrets management systems (Vault, AWS Secrets Manager)
  • Proven ability to balance technical depth with people leadership, able to mentor engineers, review technical designs, and guide architectural decisions
Job Responsibility
  • Lead, coach, and grow a team of Site Reliability Engineers, supporting their technical development and career progression
  • Create a culture of operational excellence, continuous improvement, and psychological safety within the team
  • Conduct regular 1:1s, performance reviews, and career development conversations
  • Recruit, onboard, and retain top SRE talent aligned with Doctolib's mission and values
  • Partner with SREs and senior engineers to define and evolve the observability strategy across the platform, focusing on logging, metrics, tracing, and alerting
  • Own the strategy and evolution of critical transversal services including HashiCorp Vault and Terraform Enterprise
  • Drive prioritization and roadmap planning for large-scale reliability and observability initiatives
  • Ensure alignment between team objectives and broader engineering and business goals
  • Advocate for and allocate resources toward reducing technical debt and improving developer experience
  • Own the team's on-call experience and contribute to the incident response processes, ensuring sustainable practices and continuous improvement
What we offer
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Work Council subsidy to refund part of sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
Employment type: Full-time

Engineering Manager - Observability & Reliability Engineering Obsession

We are looking for an Engineering Manager to join the OREO (Observability Reliab...
Location: Germany, Berlin
Salary: Not provided
Company: Doctolib
Expiration Date: Until further notice
Requirements
  • At least 5 years of software engineering or SRE experience, with a strong technical background in cloud-native environments (preferably AWS, GCP, and/or Kubernetes-based)
  • 3+ years of engineering management experience, leading technical teams (ideally SRE, platform, or infrastructure teams)
  • Deep understanding of observability tooling and architecture (Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Prometheus, Thanos, Datadog)
  • Experience with infrastructure as code (Terraform, OpenTofu) and secrets management systems (Vault, AWS Secrets Manager)
  • Proven ability to balance technical depth with people leadership, able to mentor engineers, review technical designs, and guide architectural decisions
Job Responsibility
  • Lead, coach, and grow a team of Site Reliability Engineers, supporting their technical development and career progression
  • Create a culture of operational excellence, continuous improvement, and psychological safety within the team
  • Conduct regular 1:1s, performance reviews, and career development conversations
  • Recruit, onboard, and retain top SRE talent aligned with Doctolib's mission and values
  • Partner with SREs and senior engineers to define and evolve the observability strategy across the platform, focusing on logging, metrics, tracing, and alerting
  • Own the strategy and evolution of critical transversal services including HashiCorp Vault and Terraform Enterprise
  • Drive prioritization and roadmap planning for large-scale reliability and observability initiatives
  • Ensure alignment between team objectives and broader engineering and business goals
  • Advocate for and allocate resources toward reducing technical debt and improving developer experience
  • Own the team's on-call experience and contribute to the incident response processes, ensuring sustainable practices and continuous improvement
What we offer
  • Free comprehensive health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Work Council subsidy to refund part of sport club membership or creative class
  • Up to 14 days of RTT
  • Lunch voucher with Swile card
Employment type: Full-time

Senior Manager, Reliability Engineering & Work Management

Join Amgen’s Mission of Serving Patients. At Amgen, if you feel like you’re part...
Location: United States, Holly Springs
Salary: 148715.15 - 201202.85 USD / Year
Company: Amgen
Expiration Date: Until further notice
Requirements
  • High school diploma / GED and 12 years of engineering, maintenance, reliability, or CMMS experience
  • OR Associate’s degree and 10 years of engineering, maintenance, reliability, or CMMS experience
  • OR Bachelor’s degree and 8 years of engineering, maintenance, reliability, or CMMS experience
  • OR Master’s degree and 6 years of engineering, maintenance, reliability, or CMMS experience
  • OR Doctorate degree and 2 years of engineering, maintenance, reliability, or CMMS experience
  • Minimum of 2 years of experience directly managing people and/or leadership experience leading teams, projects, or programs, or directing the allocation of resources
Job Responsibility
  • Lead and develop a multi-functional engineering operations organization responsible for work order administration, CMMS governance, maintenance planning and scheduling, reliability engineering, central inventory management, and KPI reporting
  • Lead the Central Inventory (equipment spare parts) program working closely with the Internal Service Provider as it relates to the overall MRO process supporting the site
  • Lead the site reliability program, aligning on the network approach for the Amgen North Carolina reliability roadmap, implementing initiatives, and owning engineering KPIs and business reporting on program and site equipment health
  • Establish long-range maintenance and asset reliability strategies aligned with site operational goals, regulatory compliance requirements, and enterprise engineering standards
  • Identify opportunities for continuous improvement, recommend solutions, and manage implementation of initiatives to improve planning and scheduling accuracy and adherence and overall asset management program
  • Collaborate with key customers and support groups on preventative maintenance activities and their on-time completion, and manage the aging work order backlog
  • Collaborate with corporate network groups to improve planning and scheduling performance in addition to improving programs within CMMS, Central Inventory, and Reliability to meet business needs
  • Provide direct supervision and oversight of work order safety programs and planning procedures
  • Experience working in a regulated environment (e.g. cGMP, OSHA, EPA, etc.), experience interacting with regulatory agencies and inspectors, and familiarity with GMP quality systems/processes such as change control, non-conformances, corrective and preventative actions, and qualifications/validation
  • Strong leadership, technical writing, and communication/presentation skills
What we offer
  • A comprehensive employee benefits package, including a Retirement and Savings Plan with generous company contributions, group medical, dental and vision coverage, life and disability insurance, and flexible spending accounts
  • A discretionary annual bonus program
  • Stock-based long-term incentives
  • Award-winning time-off plans
  • Flexible work models where possible
Employment type: Full-time

Principal Site Reliability Engineering Manager

Microsoft Substrate is the foundational cloud platform that powers many of Micro...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Candidates must be able to meet Microsoft, customer and/or government security screening requirements required for this role
  • This role requires access to Microsoft Government cloud environments, including GCC Moderate (GCCM), GCC High (GCCH), and Department of Defense (DoD) environments
  • For access to GCCH and DoD environments, this role requires the ability to obtain and maintain a favorably adjudicated Tier 3 (T3) background investigation
  • For access to GCCM environments, this role requires the ability to meet Criminal Justice Information Services (CJIS) eligibility requirements
  • For manager-level roles, a Tier 5 (T5) background investigation is preferred
  • Candidates may be considered without currently holding these background investigations, provided they are eligible for and able to successfully obtain them
Job Responsibility
  • Lead and develop a team of Site Reliability Engineer ICs, providing clear expectations, regular coaching, and career guidance across senior and principal levels
  • Own the operational health and reliability posture of Substrate services running in regulated environments
  • Drive change and influence across the org as you establish and drive SLOs, SLIs, and operational metrics
  • Lead effective incident management and post-incident reviews
  • Serve as an actively engaged on-call engineer (OCE) and participate in an on-call rotation
  • Own reliability, resilience, and disaster recovery, including driving and coordinating DR and game day exercises
  • Drive engineering-led operational excellence at scale
  • Partner with engineering and product teams to embed reliability, security, and compliance considerations early in service design
  • Influence technical and operational strategy beyond your immediate team
  • Represent your team’s work clearly to leadership and partners
Employment type: Full-time

Senior Site Reliability Engineering Manager

Microsoft Substrate is the foundational cloud platform that powers many of Micro...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Ability to obtain and maintain appropriate background investigations and customer screenings for access to GCC Moderate, GCC High, and Department of Defense environments
  • For access to GCCH and DoD environments, ability to obtain and maintain a favorably adjudicated Tier 3 (T3) background investigation
  • For access to GCCM environments, ability to meet Criminal Justice Information Services (CJIS) eligibility requirements
  • For manager-level roles, a Tier 5 (T5) background investigation is preferred
  • Pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Job Responsibility
  • Lead and develop a team of Site Reliability Engineer ICs, providing clear expectations, regular coaching, and career guidance across senior and principal levels
  • Own the operational health and reliability posture of Substrate services running in regulated environments
  • Drive change and influence across the org as you establish and drive SLOs, SLIs, and operational metrics
  • Lead effective incident management and post-incident reviews
  • Serve as an actively engaged on-call engineer (OCE) and participate in an on-call rotation
  • Own reliability, resilience, and disaster recovery, including driving and coordinating DR and game day exercises
  • Drive engineering-led operational excellence at scale
  • Partner with engineering and product teams to embed reliability, security, and compliance considerations early in service design
  • Influence technical and operational strategy beyond your immediate team
  • Represent your team’s work clearly to leadership and partners
Employment type: Full-time

Principal Site Reliability Engineering Manager

Are you a Principal Site Reliability Engineering Manager interested in improving...
Location: United States, Redmond
Salary: 139900.00 - 274800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • 3+ years of people management experience
  • 5+ years of experience planning, designing, implementing, and delivering large initiatives spanning multiple engineers as the primary owner, including operating and improving production services at scale
  • Experience leading reliability engineering for developer-facing or platform services, including incident response, automation/toil reduction, and observability (metrics/logs/tracing) built on top of mature observability platforms and practices
  • Experience working across disciplines, groups, and teams to align reliability priorities and delivery plans
  • Experience architecting, deploying, and operating enterprise scale distributed cloud services (Azure preferred), including containerization and orchestration
  • Experience operating engineering systems outer loop processes (CI/CD, build, and release platforms) with reliability, safety, and governance practices
Job Responsibility
  • Partner with engineers, product managers, and partner teams to design, operate, and maintain reliable and resilient services, with clear operational requirements (monitoring, alerting, runbooks, capacity, and failure modes)
  • Drive cross-org alignment through partnerships and co-development following the “One Microsoft” philosophy, including shared reliability standards and operational tooling
  • Build, grow, and retain a team of Site Reliability Engineers
  • Provide mentorship and coaching on reliability engineering, incident response, and pragmatic automation—within and beyond your team
  • Define, implement, and operate SLOs/SLIs and error budgets for critical engineering systems services, and use them to guide prioritization and continuous improvement
  • Lead incident management for your services, including on-call health, escalation paths, blameless post-incident reviews, and modeling follow-through on corrective and preventive actions
  • Drive automation to reduce toil and improve operational efficiency across build, validation, and deployment systems (e.g., self-healing, safe rollouts, and automated remediation)
  • Establish observability (metrics, logs, traces), capacity planning, and performance management to meet reliability and latency goals at scale
  • Foster a diverse and inclusive culture where everyone can bring their full and authentic self, while holding a high bar for customer impact and reliability
Employment type: Full-time

Principal Site Reliability Engineering Manager

The Principal SRE Manager leads the team responsible for durable, high quality h...
Location: Australia, Perth
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • Proven experience leading teams through high severity production incidents in large, distributed systems
  • Demonstrated people leadership experience managing senior engineers or technical incident leaders
  • Strong understanding of incident management, reliability engineering, and live site operations at scale
  • Ability to drive clarity, accountability, and results in ambiguous, time critical situations
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility
  • Own execution quality for Substrate high severity incidents, ensuring clear command, decisive leadership, and forward momentum during high impact events
  • Act as the senior incident leader or sponsor for long running, high stakes, or cross service incidents, ensuring alignment on impact, risk, and recovery priorities
  • Partner closely with Incident Managers, Subject Matter Experts, and service leaders to ensure effective diagnosis, escalation, and mitigation when ownership is unclear or action is blocked
  • Ensure high quality post incident reviews and drive accountability for repair items that reduce recurrence and systemic risk
  • Ensure consistent application of severity and priority models, outage declaration criteria, and executive escalation paths
  • Lead, coach, and develop a team of Site Reliability Engineers serving as incident responders
  • Build a culture of calm execution, accountability, psychological safety, and continuous learning during and after incidents
  • Hire and grow senior talent capable of operating as trusted leaders in high pressure, executive visible situations
  • Serve as a trusted advisor to engineering leaders and executives on live site risk, readiness, and incident response maturity
  • Communicate clearly and credibly with senior leadership during customer impacting events
Employment type: Full-time

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...
Location: India, Bangalore
Salary: Not provided
Company: Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements
  • 7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Minimum 2 years of experience managing or leading cloud operations teams
  • Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
  • Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
  • Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
  • Familiarity with modern CI/CD automation and tools
  • Excellent communication, stakeholder management, and team-building skills
  • Experience scaling SRE practices in high-growth or large-scale environments
  • Ability to balance long-term reliability initiatives with short-term delivery needs
Job Responsibility
  • Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
  • Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
  • Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
  • Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
  • Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
  • Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
  • Define and track key reliability metrics, and report on team performance and system health to leadership
  • Contribute to hiring, onboarding, and career development for SREs
What we offer
  • Health & Wellbeing benefits for physical, financial, and emotional wellbeing
  • Personal & Professional Development programs
  • Unconditional inclusion in the workplace
Employment type: Full-time