Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform, Ansible, and Dynatrace to automate the deployment and management of infrastructure
Build and manage CI/CD pipelines to ensure efficient and reliable application deployments
Improve infrastructure provisioning and configuration through automation, minimizing manual interventions and reducing human error
Monitor the health, performance, and reliability of production systems and applications
Design, implement, and maintain automated monitoring solutions, using tools such as Datadog
Define and monitor service level objectives (SLOs), service level indicators (SLIs), and error budgets to ensure system reliability and availability meet customer expectations
Implement effective alerting systems to identify and address potential issues before they impact users
Lead root cause analysis (RCA) and post-mortem investigations after incidents to identify improvements and avoid recurrence
Respond to production incidents, diagnose root causes, and implement corrective actions
Create and maintain playbooks and documentation for incident response, troubleshooting, and recovery processes
Collaborate closely with development teams during the post-deployment phase to ensure smooth rollouts and address any production issues
Work alongside software engineers to design, deploy, and scale applications that are highly available, resilient, and fault tolerant
Provide guidance and support in ensuring that code is written with an operational mindset, enabling easy deployment, monitoring, and debugging
Act as a bridge between development, operations, and business teams, ensuring that infrastructure and software align with business goals
Experience working with cloud platforms such as AWS, Microsoft Azure, and/or GCP
Expertise with Git, Jenkins, CircleCI, GitLab CI, or similar CI/CD platforms
Stay current with emerging technologies, tools, and trends in site reliability engineering, DevOps, and cloud computing
Lead or contribute to internal initiatives aimed at improving system performance, reliability, and operational efficiency
Propose and lead process improvements, optimizations, and innovations in automation and system design
Strong written and verbal communication skills, with the ability to collaborate with cross-functional teams, write documentation, and explain technical concepts to non-technical stakeholders
Ability to work effectively in a fast-paced environment, collaborating with software developers, other SREs, operations teams, and business stakeholders