Our client is seeking a highly experienced Senior Cloud Engineer to drive the operational excellence, stability, and security of enterprise AWS cloud environments. This role is instrumental in ensuring high availability, performance optimization, proactive maintenance, and scalable infrastructure design across development, staging, and production ecosystems. The ideal candidate brings deep expertise in AWS architecture, CI/CD automation, container orchestration, Infrastructure as Code, and incident response. You will partner closely with Development, DevOps, Security, and Operations teams to maintain resilient, compliant, and high-performing cloud-native applications. This is a strategic, hands-on engineering role supporting mission-critical systems in a Public Trust environment.
Job Responsibilities:
Deploy and maintain applications across development, staging, and production environments, ensuring consistency and operational stability
Build reusable CI/CD pipeline templates, jobs, and stages to standardize deployment practices across teams
Configure and manage GitLab Runners, environment variables, and secure secret handling
Implement version control strategies and rollback procedures during deployments
Troubleshoot and resolve deployment failures, optimizing pipeline performance and reliability
Architect and maintain AWS-based applications and infrastructure
Deploy and optimize AWS Lambda functions, including scripting, monitoring, and performance tuning
Containerize and deploy applications using ECS, EKS, and Kubernetes
Define and deploy readiness and liveness probes for containerized workloads
Orchestrate regional failover and restoration of ECS, EKS, Lambda, databases, and other infrastructure services
Develop and test disaster recovery playbooks and recovery runbooks
Ensure compliance with RTO and RPO requirements
Develop custom CloudWatch metrics and alarms based on application-specific probes
Monitor system health using CloudWatch, AWS CLI, and scheduled Lambda scripts
Configure dashboards, thresholds, and alerting frameworks
Perform root cause analysis and incident correlation using performance monitoring tools
Participate in a 24/7 on-call rotation to support production systems
Conduct patch assessment and maintenance for infrastructure and third-party software
Develop structured patch testing schedules and rollout plans, including rollback strategies
Maintain a centralized inventory of licensed software deployed across AWS environments
Create and manage change records in alignment with Agile and PI planning processes
Ensure cloud environments adhere to security best practices and compliance standards
Develop automation scripts in Python, Bash, and Node.js, with supporting configuration in YAML and JSON
Create pre-check and post-check scripts to validate deployment health and infrastructure stability
Enhance operational efficiency through Infrastructure as Code, using Terraform and CloudFormation
Requirements:
Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience
8+ years of IT experience, including 5+ years supporting cloud infrastructure or IT operations
Hands-on experience with AWS services, including Lambda, ECS, EKS, and CloudWatch
Experience implementing Infrastructure as Code using Terraform and CloudFormation
Strong proficiency with CI/CD tools such as GitHub, GitLab, and Kubernetes-based DevOps pipelines
Scripting experience using Bash and Python for automation and operational maintenance
Cloud certifications such as AWS Certified DevOps Engineer or AWS Certified Solutions Architect - Associate
Strong analytical, troubleshooting, and problem-solving skills
Ability to communicate effectively with both technical and non-technical stakeholders
U.S. Citizenship required
Ability to obtain and maintain a Public Trust clearance
Nice to have:
Advanced experience diagnosing cloud performance and scalability issues
Expertise in container orchestration platforms including Docker, ECS, and Kubernetes
Familiarity with ITIL frameworks and incident management best practices
Experience supporting regulated or federal environments