Job Responsibilities:
Maintain production platforms to meet strict telco performance targets and ensure 24x7 readiness
Lead on-call rotations and act as the primary technical lead for high-severity or complex cloud incidents
Drive rapid recovery within agreed SLAs and MTTR targets while coordinating across network, security, and application teams
Perform Root Cause Analysis (RCA) and manage the full lifecycle of Incident, Problem, and Change Management
Improve resilience through automation and self-healing tooling, and provide technical guidance to junior engineers
Requirements:
Bachelor’s degree in IT, Computer Science, or Engineering, with 3–6 years of experience in cloud operations or infrastructure roles
Hands-on experience managing production environments in AWS, Azure, or GCP
Expert use of infrastructure-as-code tools (Terraform, Bicep, or CloudFormation) to standardize and deploy resources
Ability to resolve complex platform issues using logs, metrics, and alerts in multi-cloud environments
Solid understanding of cloud networking, security, and IAM, as well as IT service management (ITSM) practices for on-call operations