This role involves designing, operating, and enhancing a secure, scalable, and cost-efficient multi-cloud platform. The ideal candidate will possess a strong technical background, a passion for automation and observability, and a commitment to improving system reliability and efficiency.
Job Responsibilities:
Design, implement, and manage reliable and scalable systems across multi-cloud environments, including AWS and Azure
Develop and refine service level objectives (SLOs), service level indicators (SLIs), and error budgets to support system reliability
Lead root cause analyses for incidents and implement measures to prevent recurrence
Enhance platform observability by creating and maintaining metrics, logs, traces, and alerts
Drive cloud cost optimization initiatives by implementing cost visibility, forecasting, and accountability measures
Collaborate with security teams to ensure compliance with regulatory standards and embed security into platform operations
Automate operational workflows using Infrastructure as Code and CI/CD pipelines
Utilize AI tools to improve incident analysis, capacity planning, and operational efficiency
Mentor and guide engineering teams on reliability practices and cost-efficient architectures
Partner with cross-functional teams to influence technical direction and improve operational maturity
Requirements:
At least 5 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering
Hands-on experience managing production systems in AWS and Azure environments
Comprehensive knowledge of cloud architecture and automation tools, including Terraform and Kubernetes
Proven ability to implement observability solutions, such as metrics, logging, tracing, and alerting platforms
Demonstrated success in driving cost optimization initiatives within cloud environments
Familiarity with compliance frameworks such as SOC 2, or with comparable standards in highly regulated environments
Proficiency in scripting or programming languages, such as Python, Go, or Bash
Experience with Infrastructure as Code tools, including Terraform or CloudFormation