As a Level 3 AWS Infrastructure Support Engineer, you will own overnight monitoring and incident response for the AWS-based production environment of Electronikmedia’s clients. This role is ideal for someone who thrives in high-trust, high-impact operational environments.
Job Responsibilities:
Monitor system health using Datadog and AWS-native tools
Investigate alerts and anomalies using established runbooks
Resolve production incidents when possible
Escalate complex issues quickly and accurately
Maintain clean, auditable incident documentation
Provide initial response within 15 minutes for all high-priority production alerts
Investigate, mitigate, and resolve production outages when feasible
Escalate unresolved or complex issues using the defined escalation matrix
Act as the owner of production system stability
Analyze and respond to Datadog monitor alerts across infrastructure and application layers
Identify abnormal patterns, trend-line deviations, and early indicators of systemic risk
Proactively notify stakeholders of significant performance or stability concerns
Contribute insights for preventive and corrective actions
Track recurring alerts and incidents
Provide analysis and recommendations to reduce alert noise and improve system resilience
Participate in weekly validation of Datadog alert configurations and thresholds
Maintain clear, concise, and timely communication during incidents
Document all incidents, alarms, and observations in Jira during each shift
Ensure handoff notes are complete and actionable for daytime engineering teams
Requirements:
5+ years of hands-on AWS infrastructure administration and support
Proven experience supporting production-grade, high-availability systems
Strong background in incident response within enterprise or scale-up environments
Deep operational knowledge of AWS services and distributed systems
Strong troubleshooting and root-cause analysis skills under tight SLAs
Ability to follow runbooks while also knowing when to think beyond them
Calm, structured decision-making during production incidents
Nice to have:
AWS Certified Solutions Architect – Associate or Professional