Cloud Resilience & Disaster Recovery Engineer

Randstad

Location:
Australia, Melbourne

Contract Type:
Not provided

Salary:

Not provided
Job offer has expired

Job Description:

We are seeking a senior Cloud Resilience & Disaster Recovery Engineer to fortify and automate the recovery capabilities of a large-scale government cloud environment. This role is dedicated to ensuring business continuity through sophisticated multi-region architectures and automated recovery workflows across AWS and Azure.

Job Responsibility:

  • Design and manage automated infrastructure recovery patterns, multi-region designs, and failover orchestration
  • Lead the transition of legacy systems into fully managed, version-controlled Terraform environments
  • Implement and audit immutable backup policies and air-gapped recovery solutions
  • Develop and maintain scalable, secure, and highly available (HA) cloud infrastructure
  • Create comprehensive DR runbooks, automated test plans, and recovery strategies
  • Operate within highly structured environments, adhering to strict change control and government governance models

Requirements:

  • Active AGSVA Baseline, NV1, or NV2 security clearance
  • 5–10+ years of hands-on experience in cloud resilience and disaster recovery
  • Strong functional understanding of DR services and high-availability configurations in both AWS and Azure
  • Proficiency in Terraform or ARM templates
  • Deep experience with multi-region and multi-zone network architecture
  • Familiarity with cloud-native DR monitoring tools
  • A solid grasp of ITIL processes (Incident, Change, and Problem Management) in a mission-critical context
  • AWS or Azure Engineering/Architecture certifications (highly regarded)
  • Proven experience working in government or highly regulated environments
  • Exceptional analytical skills

Nice to have:

  • AWS or Azure Engineering/Architecture certifications
  • Proven experience working in government or highly regulated environments
  • Exceptional analytical skills and the ability to collaborate with cross-functional teams

Additional Information:

Job Posted:
April 22, 2026

Expiration:
April 22, 2026

Employment Type:
Full-time

Similar Jobs for Cloud Resilience & Disaster Recovery Engineer

Lead Rubrik Backup Engineer IV

The Rubrik Backup Engineer IV is a senior technical specialist responsible for t...
Location:
India
Salary:
Not provided
Rackspace
Expiration Date:
Until further notice
Requirements:
  • Minimum of 18-20 years of experience in IT infrastructure with at least 4 years of hands-on experience in Rubrik
  • Proven track record of managing complex enterprise-scale backup environments
  • Experience with backup and recovery for databases (MSSQL, Oracle), file servers, and virtual machines
  • Bachelor's degree in Computer Science, Information Technology, or equivalent work experience
  • Expert knowledge of Rubrik CDM architecture, RBS, Polaris, and Rubrik APIs
  • Advanced skills in backup for virtualized environments (VMware, Hyper-V)
  • Strong understanding of file-level, database-level, and VM-level backup and restore operations
  • Deep knowledge of cloud-native backups and cloud archiving using AWS S3, Azure Blob, and GCP storage
  • Hands-on experience with integration and automation (e.g., Python, PowerShell, REST API, Terraform, Ansible)
  • Proficiency in disaster recovery design, planning, and orchestration (DR runbooks)
Job Responsibility:
  • Serve as the highest level of technical escalation for Rubrik-related incidents and issues
  • Architect and implement Rubrik backup solutions across hybrid, on-premises, and multi-cloud environments (AWS, Azure, GCP)
  • Lead backup and recovery strategy design sessions for customers, including air-gapped, immutable, and ransomware-resilient architectures
  • Integrate Rubrik with external systems (e.g., ServiceNow, Splunk, vSphere, Azure AD) using REST APIs and automation tools (Python, Ansible, Terraform)
  • Design and maintain Rubrik SLA Domains, archival policies (cloud/tape), replication, and compliance workflows
  • Collaborate with Engineering, Storage, Security, and Application teams to ensure backup consistency and performance
  • Manage large-scale Rubrik clusters, capacity planning, and software upgrades
  • Proactively identify and resolve systemic issues across infrastructure that impact backup performance or restore SLAs
  • Document architectures, runbooks, and SOPs, and contribute to technical training and playbooks
Employment Type:
Full-time

Systems Engineering Lead

My client are seeking a highly skilled and motivated Cloud Technical Lead with a...
Location:
Ireland, Dublin 1
Salary:
Not provided
Solas IT Recruitment
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Information Technology, or related field
  • 8 years of experience in cloud engineering, with a strong background in designing and deploying cloud solutions
  • Expertise with Kubernetes, including hands-on experience in managing and orchestrating containerized applications
  • Deep understanding of cloud platforms such as AWS, Azure, or Google Cloud, and related services (e.g., EC2, S3, Lambda, GKE, AKS)
  • Experience with Infrastructure-as-Code (IaC) tools such as Terraform, CloudFormation, or similar
  • Experience with multi-cloud environments and hybrid cloud architecture
  • Familiarity with monitoring and logging tools like Prometheus, Grafana, ELK stack, or others
  • Knowledge of container registries and service meshes (e.g., Istio, Linkerd)
  • Experience with Agile development methodologies and working in a DevOps culture
  • Strong proficiency in scripting and automation tools (e.g., Python, Bash, Ansible)
Job Responsibility:
  • Lead the design and implementation of scalable, reliable, and cost-efficient cloud-based solutions using AWS, Azure, Google Cloud, or other cloud platforms
  • Drive the adoption of Kubernetes and containerization best practices for microservices architecture, including the orchestration, deployment, and management of Kubernetes clusters
  • Provide technical leadership and mentorship to a team of cloud engineers, ensuring adherence to cloud engineering best practices
  • Collaborate with software developers, DevOps engineers, and other teams to implement cloud-native applications, automation, and CI/CD pipelines
  • Ensure cloud infrastructure is secure, resilient, and meets compliance requirements, working closely with security teams to mitigate risks
  • Optimize cloud infrastructure performance and costs, providing recommendations for improvements and helping track usage
  • Troubleshoot and resolve technical issues related to cloud infrastructure, Kubernetes clusters, and services
  • Participate in architecture and design reviews to ensure solutions meet high availability, disaster recovery, and scalability requirements
  • Stay up to date with the latest cloud technologies, trends, and innovations, and propose enhancements to the cloud infrastructure strategy
  • Ensure proper documentation of cloud systems, architectures, and processes

Staff Security Engineer, Business Continuity & Disaster Recovery

We're seeking a Business Continuity and Disaster Recovery (BCP/DR) Senior Engine...
Location:
India
Salary:
Not provided
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 7+ years of hands-on experience with cloud infrastructure (AWS required; GCP/Azure beneficial)
  • Deep expertise in enterprise backup and recovery solutions (Veeam, Commvault, AWS Backup, or similar)
  • Strong understanding of cloud storage services (S3, EBS, EFS, RDS, DynamoDB, etc.)
  • Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, Pulumi)
  • Experience with containerized environments (ECS, EKS, Docker) and their backup/recovery patterns
  • Knowledge of database backup and recovery procedures (PostgreSQL, MySQL, MongoDB, etc.)
  • Understanding of storage technologies, replication methods, and data protection architectures
  • 3+ years of experience in Business Continuity Planning and Disaster Recovery
  • Proven track record of designing and implementing BCP/DR programs for technology organizations
Job Responsibility:
  • Design and implement comprehensive BCP/DR programs aligned with industry frameworks (ISO 22301, NIST SP 800-34, ISO 27001)
  • Conduct Business Impact Analyses (BIA) to identify critical business functions, dependencies, and recovery priorities
  • Define and maintain Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) for all critical systems and services
  • Develop and maintain disaster recovery playbooks and runbooks for various incident scenarios
  • Create and manage crisis communication frameworks for security incidents and business disruptions
  • Lead tabletop exercises and disaster recovery drills to validate recovery procedures
  • Design and implement backup and recovery solutions for AWS cloud infrastructure (primary focus)
  • Build automated backup workflows for databases, storage systems, applications, and configurations
  • Implement immutable backup strategies and offsite replication for ransomware resilience
  • Monitor backup operations, validate recovery procedures, and maintain backup integrity

Middleware Support Engineer

The Middleware Support Engineer in Allianz Technology Malaysia, Regional Deliver...
Location:
Malaysia, Kuala Lumpur
Salary:
Not provided
Allianz
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in computer science, information technology, or a related field
  • Proven experience in a middleware support or system administration role
  • Strong understanding of middleware technologies such as IBM WebSphere, Oracle WebLogic, JBOSS, Apache Kafka, or similar
  • Familiarity with cloud-based middleware solutions and integration platforms
  • Strong analytical and problem-solving skills
  • Excellent communication and interpersonal skills
  • Ability to coordinate and work independently and as part of a team
  • Certifications in middleware technologies or related areas are a plus
Job Responsibility:
  • System Monitoring: Supervise and monitor middleware environments to ensure optimal performance and availability
  • Issue Resolution: Diagnose and resolve middleware-related issues, including performance bottlenecks, connectivity problems, and integration failures; produce root cause analyses (RCA) and implement solutions
  • Maintenance and Upgrades: Supervise regular maintenance, patching, and upgrades of middleware systems to ensure they are up to date and secure
  • Support and Troubleshooting: Supervise and provide technical support to application developers and IT teams regarding middleware-related queries and issues
  • Vendor Management: Collaborate with vendors and service providers to evaluate new technologies and manage procurement processes
  • Documentation: Maintain accurate documentation of middleware configurations, processes, and issue resolutions
  • Collaboration: Supervise and work closely with application development teams and IT support staff to ensure seamless integration and operation of middleware solutions
  • Security Management: Implement and maintain security measures to protect middleware environments and ensure data integrity
  • Continuous Improvement: Identify opportunities to optimize middleware performance and improve support processes
  • Cloud Management: Plan, manage, and monitor cloud-based infrastructure. Implement and manage cloud security measures to protect data and systems
Employment Type:
Full-time

Director, North America Infrastructure Operations & Reliability

Alimentation Couche-Tard (Circle K) seeks a highly experienced, driven, and dyna...
Location:
United States of America, Tempe
Salary:
Not provided
Circle K
Expiration Date:
Until further notice
Requirements:
  • Minimum of 10 years of demonstrated progressively responsible experience and successful Infrastructure and operations management of distributed global platforms
  • Strong ability to identify needs, take initiative, and prioritize work efforts, balancing operational tasks with longer-term strategic security efforts
  • Proven success in establishing key performance indicators, metrics, and focus to drive operational/service delivery best practices
  • Meticulous planning skills with a balance of risk management and efficient execution
  • Establish and balance priorities between new initiatives and sustaining operations engineering work
  • Ability to establish and maintain trust and rapport with the team and external constituents
  • Experience leading and developing multiple team members and managed service providers
  • Strong knowledge and understanding of infrastructure operations and reliability best practices in a high-volume and critical production service environment
  • Experience managing vendor relationships for all infrastructure services and solutions and reviewing vendor contracts, statements of work, and related documents
  • Experience in DevOps and Infrastructure and Application migration to cloud
Job Responsibility:
  • Lead a multi-disciplinary North America focused team, in close partnership with managed service providers, to establish roadmaps and successful implementation of technology standards, including hosting, network, storage, workplace, desktop, and other datacenter infrastructure
  • Build strong relationships with company leaders and departments across the organization to understand the business, share knowledge, and foster a collaborative, supportive environment when recommending technology solutions to meet business objectives
  • Partner with cybersecurity and risk management teams to ensure the infrastructure meets security requirements and evolves over time to meet changing needs and best practices
  • Drive application migration to the cloud, embedding DevOps and observability tooling to enhance delivery and monitoring
  • Implement observability best practices and tooling to monitor the effectiveness of the delivery of application and infrastructure services
  • Work closely with the Operational Resiliency team to develop and implement infrastructure disaster recovery protocols that minimize disruption to business operations in emergency situations
  • Develop and report on relevant KPIs and metrics to drive operational maturity, improve customer experience, and aid transparency and understanding across the business of the infrastructure organization’s contributions
  • Maintain a strong focus on leadership and development of team members and extended team members of managed service partners
  • Ensure professional growth by setting direction and priorities, delegating tasks, resolving conflicts, and fostering a winning culture with high-performance-oriented team members
What we offer:
  • Reasonable accommodation under the terms of the ADA and certain state or local laws

Senior Technology Resilience and Operations Leader

Senior Program Manager, Technology Resilience & Operations Leader responsible fo...
Location:
United States, Iselin
Salary:
175000.00 - 230000.00 USD / Year
Citizens Bank
Expiration Date:
Until further notice
Requirements:
  • Ten or more years of experience in technology program management, operational resilience, technology risk, cloud engineering, or enterprise technology leadership
  • Demonstrated experience leading complex, cross-functional enterprise programs with regulatory and operational impact
  • Strong knowledge of technology resilience testing, cloud architecture principles, and observability practices
  • Experience working with third-party risk frameworks, regulatory expectations, and contract control requirements
  • Prior experience supporting or managing mission-critical operational centers such as NOC, TOC, or SOC
  • Proven ability to influence and drive execution across matrixed organizations without direct authority
  • Strong communication, stakeholder management, and executive reporting skills
Job Responsibility:
  • Lead the enterprise technology resilience program, including strategy, roadmap, execution cadence, and governance
  • Develop and maintain technology resilience testing frameworks aligned with regulatory, industry, and internal standards
  • Coordinate with engineering, infrastructure, and application teams to plan and execute resilience, failover, and chaos testing exercises
  • Establish centralized program oversight for critical asset mapping, scenario design, testing schedules, issue tracking, and remediation management
  • Define, track, and report resilience metrics, dashboards, test coverage, and issue aging to senior leadership and governance forums
  • Drive continuous improvement initiatives across disaster recovery, high availability, and fault-tolerant design practices
  • Lead cloud governance and resilience guardrail initiatives in partnership with enterprise architecture, cloud engineering, and risk teams
  • Define minimum resilience design requirements for cloud-native and hybrid solutions, including multi-availability-zone patterns, automated failover, observability, and dependency management
  • Program manage the integration of resilience controls into reference architectures, delivery pipelines, and automated policy enforcement
  • Develop and maintain standards, playbooks, and guidance to support consistent and resilient cloud adoption
What we offer:
  • medical, dental, and vision coverage
  • retirement benefits
  • parental leave
  • flexible work arrangements
  • education reimbursement
  • wellness programs
  • paid time off
Employment Type:
Full-time

Senior Principal Technical Program Manager - Data Platform

You will shape a modern technical program management organization. The goal is t...
Location:
United States, San Francisco
Salary:
191600.00 - 307800.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 12+ years experience working with software teams
  • 4+ years recent experience leading platform and software teams in a similar Technical Program Management or Technical Product Management role
  • Experience building commercial Cloud Services / Platforms
  • Experience in designing and building back end software systems, including tradeoffs, launching and scaling
  • Experience leading strategy and execution on complex, cross-divisional technical programs, including analysing business priorities, customer needs, and industry trends, and articulating a long-term roadmap
  • Experience driving projects spanning multiple teams, including reaching agreements with your engineering partners and stakeholders, shepherding the projects while identifying and mitigating risks, making trade-off decisions optimising the outcome
  • Able to translate customer and/or product requirements into technical requirements
Job Responsibility:
  • Shape a modern technical program management organization
  • Steer through growth and remove barriers in the journey towards long-term goals
  • Find agreement by creating guardrails and removing barriers to help teams accelerate
  • Build resilience into systems to ensure service and data availability for customers in the event of failures in system components
  • Define specific systems programs and create a plan of action for realizing those programs
  • Partner with and influence engineers and architects in making progress on problems
  • Take a systematic approach to engineering problems
  • Be accountable for the success of technical programs by managing the entire lifecycle from initiation to forecasting, budgeting, scheduling, etc.
  • Manage complex dependencies and projects with a broad scope across the company
  • Collaborate with functions across the company to create reliable and cost-effective disaster recovery solutions for all of Atlassian’s services
What we offer:
  • Health coverage
  • Paid volunteer days
  • Wellness resources
Employment Type:
Full-time

BCP Engineer

Lead Infrastructure and Application Disaster Recovery testing and Data Center Po...
Location:
Mexico, Guadalajara
Salary:
Not provided
NTT DATA
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree
  • Minimum 4-5 years of experience in technology stack including infrastructure and application
  • Experience in managing resiliency testing for on-prem database, NAS, object storage, block storage, etc.
  • Understanding of disaster recovery procedures
  • Understanding of RTO, RPO and how these metrics are calculated
  • Knowledge of the differences between resiliency testing and cyber-attack recovery/repave testing
  • Background in cyber-attack recovery
  • Background in disaster recovery.
  • Strong analytical, communication, interpersonal, problem solving, organizational and time management skills
  • Basic understanding of Excel and the ability to manipulate data using Excel, including knowledge of basic Excel formulas used in data manipulation
Job Responsibility:
  • Lead Infrastructure and Application Disaster Recovery testing and Data Center Power-down events
  • Drive adoption of the mandated controls which are in place with application teams.
  • Provide guidance to application owners on how they can adapt a recovery procedure to adhere to the uplifted controls in place.
  • Disaster Recovery tests: scope events to include the interdependencies of shared services, upstream and downstream application dependencies, order of recovery, etc.
  • Cyber-attack recovery testing: drive teams to become resilient and able to recover during a cyber-attack, and test the cyber-attack recovery procedures
  • Power-down events: establish critical milestones, establish order of recovery, and verify dependencies of various infrastructure components
  • Coordinate and manage regulatory resiliency recovery tests, such as SIFMA's industry-wide exercises, SPOOR-related tests, and those guided by the Monetary Authority of Singapore (MAS), to ensure compliance with industry standards and regulatory requirements. This involves liaising with various internal & external teams, scheduling test activities, monitoring progress, and documenting outcomes to support robust audit and risk management processes
  • Identify gaps in process and procedures and enhance those processes.
  • Identify opportunities for automation
  • Oversee and manage the execution plans