We are seeking a skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of critical systems and applications. The candidate will work closely with development and operations teams to automate processes, monitor system health, and improve system resilience.
Job Responsibilities:
Maintain and improve system reliability, availability, and performance
Implement monitoring, alerting, and incident response processes
Automate infrastructure and operational tasks using scripting and DevOps tools
Collaborate with development teams to ensure scalable and reliable system design
Manage CI/CD pipelines and deployment processes
Perform root cause analysis and implement solutions to prevent recurring incidents
Support cloud infrastructure and containerized environments
Document system architecture, procedures, and operational practices
Requirements:
Experience in Site Reliability Engineering or DevOps roles
Strong knowledge of Linux systems and cloud platforms (AWS, Azure, or GCP)
Experience with monitoring tools such as Prometheus, Grafana, or similar
Knowledge of containerization technologies like Docker and Kubernetes
Proficiency in scripting or programming languages such as Python, Bash, or Go
Strong troubleshooting and problem-solving skills
Nice to have:
Experience with infrastructure-as-code tools (Terraform, Ansible, or similar)
Knowledge of CI/CD tools such as Jenkins, GitHub Actions, or GitLab CI
Familiarity with microservices architecture and distributed systems