We are seeking a Director of Site Reliability Engineering to lead a global organization responsible for the reliability and operational excellence of the Aiven platform. You will lead a high-performing SRE team, setting the vision and strategy to ensure resilient, scalable, and highly automated systems across our 24/7/365 operations. Your team will proactively manage platform health, lead incident response and cross-functional coordination, and drive continuous improvement in reliability and performance. As a senior leader, you will partner closely with engineering, product, and support teams worldwide, influence system architecture, and invest in tooling and automation to reduce toil and enhance production reliability. This role combines strategic leadership, customer centricity, and deep operational accountability, with a focus on delivering reliable services at global scale while developing strong technical leaders within your organization.
Job Responsibilities:
Define and drive global SRE operating strategy in partnership with regional SRE leaders across EMEA, AMER and APAC, ensuring alignment on reliability goals, operating models, and execution across a 24/7/365 follow-the-sun organization
Build and lead a multi-regional SRE organization through managers, developing leadership capability, mentoring teams, and ensuring consistent performance, culture, and delivery across geographies
Set the vision and roadmap for reliability engineering, enabling teams to deliver high-impact tools, automation, and process initiatives that improve platform resilience, scalability, and efficiency
Own global incident management strategy and operating model, including on-call design, coverage, and escalation frameworks, ensuring seamless coordination and high availability across regions
Establish a metrics-driven operating cadence, defining KPIs, SLIs, SLOs, and error budgets, driving data-informed prioritization, and embedding operational rigor and continuous improvement across the SRE organization
Requirements:
Proven experience leading and scaling global SRE or infrastructure organizations through managers, ideally across multiple regions and time zones
Strong track record of defining and executing reliability strategy at scale, including ownership of SLIs/SLOs, incident management frameworks, and operational excellence programs
Demonstrated ability to build, develop, and mentor senior leaders, creating high-performing, inclusive teams and strong leadership pipelines
Experience operating in a 24/7/365 production environment, with deep understanding of follow-the-sun models, on-call design, and large-scale incident response
Ability to partner cross-functionally at the executive level (Engineering, Product, Support) to influence architecture, prioritization, and long-term platform investments
Strong data-driven leadership approach, with experience defining SLIs/SLOs and using metrics to drive prioritization, accountability, and continuous improvement
Solid technical foundation in distributed systems, cloud infrastructure, and automation, with the ability to engage credibly with senior engineers and influence technical direction
Experience driving large-scale change and organizational design, including scaling teams, evolving operating models, and improving efficiency and reliability at the company level
What we offer:
Participate in Aiven’s equity plan
Balance work and life with our hybrid work policy
Choose the equipment you need to set yourself up for success
Use your Professional Development Plan budget for learning opportunities
Receive holistic wellbeing support through our global Employee Assistance Program
Inquire about our Global Time Off Commitment (Parental and Sick Leave, as well as Personal Time)
Enjoy country-specific benefits for our global cast