The Site Reliability Engineer (SRE) will ensure the reliability, scalability, and performance of enterprise applications and services across cloud and on-premises environments. This role focuses on automation, monitoring, and incident response to minimize downtime and enhance operational efficiency. The position requires close collaboration with development, quality assurance, and operations teams to deliver secure and resilient systems.
Job Responsibilities:
Design, build, and maintain secure, compliant infrastructure using Infrastructure as Code tools such as Terraform and Ansible
Automate provisioning and management of servers, storage, networks, Kubernetes clusters, and related systems across cloud and on-premises environments
Develop tools and processes for automated deployment, configuration, monitoring, and alerting
Collaborate with cross-functional teams to implement scalable and reliable cloud and data center solutions
Participate in incident response, on-call rotations, and post-incident reviews to improve system resilience
Monitor system performance and availability against service-level agreements (SLAs), service-level objectives (SLOs), and service-level indicators (SLIs)
Proactively troubleshoot and resolve reliability, performance, or security issues
Create and maintain disaster recovery and business continuity plans for critical systems
Continuously analyze and improve infrastructure efficiency, scalability, and performance
Stay current with emerging technologies and recommend tools or practices to enhance platform capabilities
Share technical expertise and mentor team members to strengthen internal capabilities
Requirements:
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience
Proven experience as a Site Reliability Engineer or Systems Engineer
Strong proficiency in Terraform and Ansible for infrastructure automation
Hands-on experience with Kubernetes, Docker, or other container and orchestration tools
Proficiency in scripting languages such as Python or Bash
In-depth knowledge of Google Cloud Platform (GCP) services including compute, networking, storage, Kubernetes, and security
Solid understanding of VMware virtualization and enterprise storage systems (e.g., Pure Storage)
Experience with networking technologies including VLANs, VPNs, and routing protocols
Strong grasp of IT infrastructure and operations principles, including systems integration and automation best practices
Excellent communication and collaboration skills
Ability to manage multiple priorities under pressure with strong problem-solving skills
Nice to have:
Terraform Associate certification
GCP certification (e.g., Cloud Architect)
Relevant certifications such as ITIL, PMP, or CISSP
Experience in regulated or enterprise environments