We are seeking a Site Reliability Engineer to support the stability, scalability, and performance of critical user-facing applications and production systems. This role blends software engineering and operational excellence, focusing on automation, cloud infrastructure, and proactive reliability practices. You will contribute to ensuring high availability, efficient deployments, and continuous improvement across application operations within a complex, distributed environment.
Job Responsibilities:
Maintain the availability, reliability, and performance of production applications and infrastructure
Participate in on-call rotations and use proactive monitoring to prevent incidents
Design, build, and operate infrastructure using tools such as Terraform, Kubernetes, Chef, and container technologies
Implement monitoring and alerting focused on service health and user impact using platforms such as CloudWatch, Splunk, AppDynamics, Dynatrace, Grafana, Kibana, Prometheus, and Datadog
Automate repetitive operational tasks through scripting and tooling to improve efficiency and consistency
Debug and resolve production issues across multiple services and layers of the technology stack
Support deployment activities, including CMS enhancements and new application releases
Collaborate with internal teams and third-party providers to investigate and resolve complex incidents
Document solutions, root causes, and procedures to build reusable knowledge and enable continuous improvement
Support infrastructure growth planning to meet future scalability and performance needs
Requirements:
Bachelor’s degree in Engineering, Computer Science, or a related discipline
1–2 years of experience in IT support, system administration, or web/software development
Hands-on experience with UNIX/Linux administration and cloud platforms, particularly AWS
Familiarity with DevOps and CI/CD tools such as Jenkins, Ansible, Git, and GitLab
Working knowledge of containerisation, microservices, and infrastructure-as-code practices
Comfort working with monitoring, alerting, and logging tools
Ability to communicate clearly, document thoroughly, and collaborate effectively with diverse stakeholders
Adaptability, attention to detail, and the ability to manage multiple priorities in a dynamic environment
Openness to shift-based work and on-call responsibilities
What we offer:
Exposure to large-scale, cloud-based systems supporting multiple international Vodafone markets
Opportunities to work with modern DevOps, automation, and reliability engineering practices
A collaborative environment that values learning, documentation, and continuous improvement
The chance to contribute to systems used by hundreds of thousands of users globally