Resilience Engineer

Portugal, Lisboa · Job Posted January 21, 2026

Job Description

We are seeking a senior Resilience Engineer to own and evolve the stability, availability, and recoverability of our IoT platforms. This role operates at the intersection of system architecture, reliability engineering, and operational excellence, with end-to-end accountability for designing resilience into our services. You will define and govern resilience strategies, influence platform architecture, and partner across product, infrastructure, and engineering teams to ensure our systems continue to perform under failure, scale, and unexpected disruption.

Job Responsibility

  • Developing and governing resilience strategies across system architecture, deployment, monitoring, and incident response
  • Defining and tracking stability KPIs (e.g., MTTD, MTTR, error budgets), partnering with performance and operations teams to meet or exceed targets
  • Designing and implementing fault injection testing, chaos engineering practices, and scenario-based simulations to validate platform robustness
  • Collaborating with product, infrastructure, architecture and development teams to re-design services with built-in redundancy, failover, and graceful degradation
  • Driving automation and observability improvements to reduce noise, increase fault detection speed, and support predictive failure mitigation
  • Contributing to the design and maintenance of our Business Continuity and Disaster Recovery Plan (BCDR), ensuring IoT systems remain resilient and recoverable in the face of unexpected disruptions
  • Owning the resilience roadmap and continuously assessing emerging threats, technologies, and architectural shifts to guide evolution of stability practices
  • Evangelizing a culture of resilience through internal communication, workshops, and post-incident learning programs
  • Delivering high-quality engineering solutions while continuously strengthening the resilience, scalability, and cost efficiency of our IoT platform
  • Consistently meeting or exceeding delivery expectations by prioritizing the highest-leverage resilience initiatives that improve customer experience, business outcomes, and financial performance
  • Building trusted, transparent, and outcome-driven relationships by providing clear technical direction and trade-off recommendations to business and engineering stakeholders

Requirements

  • Educated to BSc degree level in Software Engineering, Computer Science, or a related discipline
  • Strong scripting and automation experience (e.g., Python, Bash, Go, PowerShell), with a demonstrated ability to replace manual processes with reliable, scalable automation
  • Proven experience designing and operating high-availability, fault-tolerant systems, including the use of chaos engineering techniques and proactive failure-mitigation strategies
  • Experience applying Business Continuity and resilience standards (e.g., ISO 22301) in the context of real-world platform design and operational readiness
  • Hands-on experience designing or integrating monitoring, alerting, and automated testing frameworks to support early fault detection and system validation
  • Broad experience working with Linux-based platforms across on-premises and cloud environments, with an understanding of how infrastructure choices impact reliability, scalability, and recovery
  • Deep expertise in Site Reliability Engineering principles, including SLOs/SLIs, error budgets, observability, toil reduction, and automation, with the ability to apply them at platform and system scale to guide architectural decisions and long-term resilience strategy
  • Proven ability to balance long-term platform stability with delivery velocity by making clear, data-driven trade-offs
  • Strong understanding of security principles, practices, and standards, and the ability to incorporate them into resilient, real-world technical solutions
  • Deep command of telemetry, logging, and alerting ecosystems (e.g., Prometheus, Grafana, ELK, Datadog, Splunk), with the ability to design signals that enable early fault detection and informed decision-making
  • Experience defining meaningful SLIs and building dashboards that drive architectural insight, prioritization, and corrective action
  • Proven experience leading blameless post-incident reviews, root cause analysis, and systemic improvements across multiple teams
  • Expertise in identifying and addressing system bottlenecks, latency issues, and throughput constraints in distributed environments
  • Proficiency in forecasting demand, planning capacity, and managing system growth in a cost-efficient and sustainable manner
  • Strong track record of partnering with software engineering, infrastructure, product, and business teams to embed resilience into the full development lifecycle
  • Fluency in English.

What we offer

  • Hybrid Work Model - Flexible hybrid work model with 8-10 in-office days per month, managed by team leaders
  • Vodafone Products and Services - Employees get a mobile phone, free communication plan, data card, and various discounts on services and products
  • Recognition - Recognition programs for innovative, creative, high-potential employees and exemplary behaviors
  • Health and Well-being - Well-being Program offers nutrition and psychological consultations, webinars, workshops, and discounts on various services and products
  • Learning - Access to Communities of Practice and a customizable digital training platform with high-quality content (namely Harvard Business Publishing, Skillsoft and Speexx)
  • Local and International Mobility - Internal recruitment with local and international rotation opportunities across departments and roles.

Similar Jobs for Resilience Engineer

8 matching positions

Systems Resilience Engineer, Lead

Utilize your technical expertise in support of cyberspace operations. Build your...
Location: United States, Fort Meade
Salary: 99000.00 - 225000.00 USD / Year
Company: Booz Allen Hamilton
Expiration Date: Until further notice
Requirements
  • 5+ years of experience designing, implementing, and maintaining Linux environments with Ansible products
  • 5+ years of experience building and maintaining virtual environments using VMware
  • 2+ years of experience maintaining Layer 2 and Layer 3 networking devices
  • Experience supporting national cyber missions in offensive or defensive capacities, including USCC or the service cyber components
  • Ability to be flexible for shift work
  • TS/SCI clearance with a polygraph
  • Bachelor’s degree and 8+ years of experience within the cyber field, Master’s degree and 5+ years of experience within the cyber field, or 15+ years of experience within the cyber field in lieu of a degree
  • DoD 8570 IAT Level II Certification, including Security+ CE Certification
Job Responsibility
  • Utilize your technical expertise in support of cyberspace operations
  • Build your expertise and solve technical problems in a fast-paced, agile environment
  • Develop, implement, and maintain customer-focused database solutions in support of our warfighters
  • Participate in development, testing, and delivery of software products or components to maintain existing systems
  • Provide shift work for software or hardware support at government data centers
  • Serve as a member of an Agile software development team
  • Apply leading-edge principles, theories, and concepts and contribute to the development of new principles and concepts
  • Work on unusually complex problems and provide highly innovative solutions
  • Operate with substantial latitude for unreviewed action or decision
  • Mentor or supervise employees in both company and technical competencies
What we offer
  • Health, life, disability, financial, and retirement benefits
  • Paid leave
  • Professional development
  • Tuition assistance
  • Work-life programs
  • Dependent care
  • Recognition awards program

Senior Software Engineer, Enterprise Resilience

At Vanta, our mission is to help businesses earn and prove trust. We believe tha...
Location: United States
Salary: 207000.00 - 244000.00 USD / Year
Company: Vanta
Expiration Date: Until further notice
Requirements
  • Experience operating services in multiple environments requiring strict compliance including FedRAMP
  • Technical lead in successfully driving large scale reliability initiatives across an entire product engineering organization
  • Played technical leadership roles on Infrastructure or platform teams
  • Experience with infrastructure, AWS services, and scaling platforms in fast-growing environments
  • Cares deeply about empowering other teams to build highly resilient and scalable production services
  • Thoughtful about trade-offs and has good product sense when creating highly available infrastructure/services
  • Open to using AI to amplify their skills and strengthen their work - demonstrating curiosity, a willingness to learn, and sound judgment in applying AI responsibly to improve efficiency and impact
Job Responsibility
  • Build and operate the systems that power Vanta’s FedRAMP environments, including automated release, vulnerability remediation, and evidence generation pipelines that meet strict compliance timelines
  • Design and maintain Vanta’s vulnerability management platform, automating detection, remediation, and compliance reporting across both FedRAMP and non-FedRAMP environments
  • Define and evolve Vanta’s production reliability framework, including SLOs, incident response patterns, observability standards, service catalog, metrics dashboards, and the Vanta SLA definition
  • Improve incident response workflows and systems for faster recovery
  • Engineer reliability improvements for CI and deploy workflows, reducing production friction and operational load, while maintaining deployment velocity
  • Collaborate with product teams to embed reliability best practices, guiding operational readiness reviews and helping teams design for resilience
  • Lead design and improvement of datacenter and environment build-outs for future FedRAMP levels and regional expansion
  • Identify and solve complex scalability and performance challenges, particularly related to service reliability and data throughput
  • Work with talented and kind engineers to make a significant impact on our customer base, enabling them to improve their security and prove it
  • Contribute to building Vanta’s engineering culture as we grow
What we offer
  • Offers Equity
  • Medical benefits
  • 401(k) plan
  • Other company perk programs
  • Comprehensive medical, dental, and vision coverage, with 100% of employee-only benefit premiums covered for most medical plans
  • 16 weeks fully-paid Parental Leave for all new parents
  • Health & wellness stipend
  • Remote workspace, internet, and cellphone stipend
  • Commuter benefits for team members who report to the SF and NYC office
  • Family planning benefits
  • Fulltime

Senior AWS DevOps Engineer (Test & Infrastructure Resilience)

A top-tier consultancy firm is looking for an experienced AWS DevOps Engineer wi...
Location: United Kingdom, City of London
Salary: Not provided
Company: Randstad
Expiration Date: Until further notice
Requirements
  • Strong background in testing (including APIs)
  • Self-motivated and results-driven
  • Capable of thriving in an Agile team
  • Comfortable with Confluence for documentation and Jira for project tracking and story management
  • Must be eligible for SC Clearance
  • Experience with AWS services (Compute, Identity)
  • Experience with Vault, Consul, Kubernetes, Prometheus, ELK, Jenkins, Python, Ansible, and/or Bash
  • Networking experience with proxies and firewalls, including Fortinet and Palo Alto, is an advantage
Job Responsibility
  • Develop and structure a comprehensive test framework
  • Create approaches and plans, and gain approval for them
  • Deliver a complex set of functional and performance tests for software components running within an AWS multi-account model
  • Work alongside client teams and support in a 3rd line capacity as required
  • Fulltime

Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering). Do you...
Location: United States, New York, New York; Richmond, Virginia
Salary: 179400.00 - 245600.00 USD / Year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree
  • At least 4 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 3 years of experience in Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 4 years of Unix or Linux system administration experience
Job Responsibility
  • Lead a portfolio of diverse technology projects with deep experience in platform engineering, machine learning, distributed microservices, and full stack systems to create solutions that help meet regulatory needs for the company
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
  • Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of customers achieve financial empowerment
  • Utilize programming languages like Python, and Golang, along with container orchestration tools including Docker and Kubernetes, configuration management tools including Ansible and Terraform, and a variety of AWS tools and services
What we offer
  • Performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI)
  • A comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being
  • Fulltime

Senior Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

Senior Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)....
Location: United States, McLean; Richmond
Salary: 209000.00 - 262400.00 USD / Year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree
  • At least 6 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 4 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 6 years of Unix or Linux system administration experience
Job Responsibility
  • Work within and across Agile teams to design, develop, test, implement, and support technical solutions across full-stack development tools and technologies
  • Lead the craftsmanship, availability, resilience, and scalability of your solutions
  • Bring a passion to stay on top of tech trends, experiment with and learn new technologies, participate in internal & external technology communities, and mentor other members of the engineering community
  • Encourage innovation, implementation of cutting-edge technologies, inclusion, outside-of-the-box thinking, teamwork, self-organization, and diversity
  • Work across boundaries to improve the velocity of your and other teams
  • Lead efforts to enable and simplify the use of new and existing AWS services
  • Work with product managers to understand desired application and platform capabilities and testing scenarios
What we offer
  • Performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI)
  • A comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being
  • Fulltime

Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering)

Lead Software Engineer, DevOps (Cloud Operations Resilience Engineering). Do you...
Location: United States, McLean; Plano; Richmond
Salary: 179400.00 - 225100.00 USD / Year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree
  • At least 4 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 3 years of experience in Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 4 years of Unix or Linux system administration experience
Job Responsibility
  • Lead a portfolio of diverse technology projects and a team of developers with deep experience in machine learning, distributed microservices, and full stack systems to create solutions that help meet regulatory needs for the company
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
  • Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
  • Utilize programming languages like Java, Python, SQL, Ruby and Go, Container Orchestration services including Docker and Kubernetes, CM tools including Ansible and Terraform, and a variety of AWS tools and services
What we offer
  • Performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI)
  • A comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being
  • Fulltime

Lead Software Engineer, DevOps (Azure)(Cloud Operations Resilience Engineering)

Lead Software Engineer, DevOps ( Azure)(Cloud Operations Resilience Engineering)...
Location: United States, McLean; Plano; Richmond
Salary: 179400.00 - 225100.00 USD / Year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Bachelor's degree
  • At least 4 years of experience in DevOps Engineering (Internship experience does not apply)
  • At least 3 years of experience in Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • At least 4 years of Unix or Linux system administration experience
Job Responsibility
  • Lead a portfolio of diverse technology projects and a team of developers with deep experience in machine learning, distributed microservices, and full stack systems to create solutions that help meet regulatory needs for the company
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
  • Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
  • Utilize programming languages like Python and Go, Container Orchestration services including Docker and Kubernetes, CM tools including Terraform, and a variety of AWS and Azure tools and services
What we offer
  • Performance based incentive compensation
  • Health, financial and other benefits
  • Fulltime

Lead Software Engineer, Full Stack (Cloud Operations Resilience Engineering)

Do you love building and pioneering in the technology space? Do you enjoy solvin...
Location: United States, McLean, Virginia; Richmond, Virginia
Salary: 197300.00 - 225100.00 USD / Year
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Bachelor’s Degree
  • At least 4 years of experience in software engineering (Internship experience does not apply)
  • At least 1 year experience with cloud computing (AWS, Microsoft Azure, Google Cloud)
Job Responsibility
  • Lead a portfolio of diverse technology projects and a team of developers with deep experience in distributed microservices, and full stack systems to create solutions that help meet regulatory needs for the company
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
  • Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
  • Utilize programming languages like JavaScript, Java, HTML/CSS, TypeScript, SQL, Python, and Go, Open Source RDBMS and NoSQL databases, Container Orchestration services including Docker and Kubernetes, and a variety of AWS tools and services
What we offer
  • Performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI)
  • Health, financial and other benefits
  • Fulltime