Lead Reliability Engineer Job at Ryanair - Europe's Favourite Airline (Dublin)

Lead Reliability Engineer

This is a key leadership role, responsible for driving reliability, asset perfor...

Location

United Kingdom, Hereford

Salary:

41000.00 - 44000.00 GBP / Year

Avara Foods

Expiration Date

Until further notice

Requirements

HND or above in Engineering (Mechanical, Electrical, or related discipline)
Proven experience in reliability, maintenance, or engineering leadership in an FMCG or manufacturing environment
Strong understanding of maintenance systems (CMMS), asset management, and performance metrics (OEE, MTBF, MTTR)
Demonstrable leadership, coaching, and influencing skills
Excellent analytical, problem-solving, and communication abilities
Ability to manage multiple priorities and work effectively across teams

Job Responsibility

Lead and manage the Reliability Team, ensuring effective delivery of asset performance, maintenance planning, and reliability projects
Act as lead for reliability and asset care, championing continuous improvement across site
Develop and sustain proactive maintenance strategies, including predictive and condition-based maintenance, to improve equipment availability and reduce unplanned downtime
Analyse performance and downtime data to identify and eliminate root causes of equipment failure
Collaborate with the wider Engineering Team to coordinate planned maintenance, improvement activities, and engineering support during production
Support the Engineering Reliability Manager in the development and execution of the site’s maintenance and reliability roadmap
Maintain the office, reliability areas, and outside areas to a high standard, ensuring regular checks are conducted and satisfactory feedback is received from GMP/WPW audits
Lead cross-functional reliability reviews, ensuring effective communication between Engineering, Operations, Planning, and technical teams
Manage contractor and OEM support, ensuring all work complies with site safety, technical, and legislative standards
Ensure all rectification actions identified on service reports are followed up and completed in a timely manner

What we offer

6% Pension
31 Days Holiday
Life Assurance
Private Medical Health Cover
Subsidised Canteen
Free Staff Parking
Wellbeing and lifestyle benefits, including discounts with major retailers and access to health resources

Fulltime

Site Reliability Engineer (Lead)

10Pearls is an award-winning end-to-end digital innovation company that helps bu...

Location

Pakistan, Islamabad

Salary:

Not provided

10Pearls

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science or related field
5–8 years in SRE or production-engineering roles running distributed systems at scale
Deep Kubernetes expertise — operators, RBAC, network policy, storage, upgrades
Hands-on with Keycloak / Vault / MinIO / Harbor / Kong or equivalent identity/secrets/storage/registry/gateway stacks
Strong Linux fundamentals and at least one systems language (Go, Rust) or shell/Python for tooling
Proven SLO/SLI authorship and error-budget-driven decision-making
Experience with observability stacks (Prometheus, Grafana, OpenTelemetry, Loki, Tempo)
Calm, clear communication during incidents; strong post-mortem writing
Hands-on with infra-as-code — Helm, Kustomize, Terraform

Job Responsibility

Substrate operation: own the Kubernetes cluster plus Keycloak (identity), Vault (secrets), MinIO (object storage), Harbor (registry), and Kong (gateway), from bootstrap to day-2 operations
SLO framework: define, publish, and defend SLOs for every tier-1 service; own error budgets and burn-rate alerting
Incident response: build the on-call rotation, paging, runbook library, and post-mortem culture; lead incident command during P1/P2 events
Release operations: co-own the blue-green / canary release model with L6 Delivery; sign off production-bound releases
Air-gap operations: ensure every operational runbook works in a fully offline environment, with no assumption of external dependencies
Lead the Platform squad: technically lead 1 Infrastructure Engineer, 1 Observability Engineer, and 2 DevOps Engineers; set standards for infra-as-code and automation

Fulltime

Lead Site Reliability Engineer

Trimble is looking for a Site Reliability Engineering Lead to join Business Syst...

Location

India, Chennai

Salary:

Not provided

Trimble Inc.

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Computer Engineering, Computer Science, or a related field
7+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles, with at least 2 years in a leadership or mentoring capacity
Deep AWS expertise (EC2, S3, RDS, IAM, VPC, Lambda, CloudFormation/Terraform, etc.)
Strong knowledge of Infrastructure-as-Code (IaC) using Terraform, AWS CDK, or CloudFormation
Proven experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, or similar)
Proficiency in containerization and orchestration (Docker, Kubernetes, ECS, or EKS)
Expertise in monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana, ELK, CloudWatch, etc.)
Strong scripting or programming background (Python, Bash, or Go)
Sound understanding of networking, security, and identity/access management in the cloud
Experience designing high-availability and disaster recovery strategies for critical workloads

Job Responsibility

Become well-versed in the opportunities and challenges of the business and Trimble's customers
Become an expert in Business Systems services, especially the interfaces—APIs, protocols (e.g. OAuth), and user interfaces
Establish, then utilize tight working relationships with stakeholders across the company, especially Trimble's engineering community
Prototype and create proofs of concept as required
Scope and deploy new integrations
Investigate, diagnose, and solve customer integration issues
Effectively communicate technical issues with stakeholders in non-technical language
Contribute to utilities and SDKs to help integration and migration efforts

Fulltime

Lead Site Reliability Engineer / Expert

Responsible for ensuring highly reliable, scalable, and resilient production sys...

Location

Egypt; India, Cairo; Delhi

Salary:

Not provided

SITA

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field. Master’s degree preferred for senior roles
Relevant certifications such as ITIL, CCNP/CCIE, Palo Alto Security, SASE, SDWAN, Juniper Mist/Aruba, CompTIA Security+, or Certified Kubernetes Administrator (CKA)
Certifications in cloud platforms (AWS, Azure, Google Cloud) or DevOps methodologies
Certifications in automation and IaC tools (Ansible, Terraform)
Certifications in observability and monitoring platforms (Dynatrace, Prometheus, Grafana, ELK)
Certifications in ServiceNow, Jira, or other operational tooling
8+ years in IT operations, service management, or infrastructure reliability, including roles such as Site Reliability Engineer, Problem Manager, or DevOps Engineer
Strong experience with high availability systems, resilience engineering, and DR readiness
Deep expertise in RCA, incident management, PMIR, and implementing permanent fixes for recurring issues
Hands-on experience with CI/CD, automation, IaC, and self-healing/auto-remediation workflows

Job Responsibility

Design & maintain resilient systems ensuring high availability, scalability, and fault tolerance
Ensure effective Disaster Recovery (DR), failover strategies, and resilience engineering across environments
Improve platform reliability, observability, and performance across cloud and on‑premises systems
Establish and maintain SLIs, SLOs, and error budgets to measure and govern service reliability
Take ownership of production availability, capacity planning, performance tuning, and long‑term reliability initiatives
Drive automation for infrastructure provisioning, deployment, monitoring, and operational workflows
Develop and implement auto‑remediation and self‑healing solutions to reduce manual intervention
Manage CI/CD pipelines and Infrastructure as Code (IaC) frameworks for secure, repeatable deployments
Implement and manage zero‑downtime deployment strategies (blue‑green, canary, rolling)
Support containerized and cloud‑native platforms including Kubernetes, Docker, and distributed systems

What we offer

Work from home up to 2 days/week (depending on your team's needs)
Make your workday suit your life and plans
Take up to 30 days a year to work from any location in the world
Employee Assistance Program (EAP), for you and your dependents 24/7, 365 days/year
Champion Health - a personalized platform that supports a range of wellbeing needs
Access to world-class learning platforms and programs (LinkedIn Learning, Microsoft's Enterprise Skills Initiative, Airport Council International, Pluralsight, Harvard Business Publishing, Stanford)
Competitive benefits that make sense with both your local market and employment status

Fulltime

Technical Lead-Site Reliability Engineer

We are seeking an experienced Site Reliability Engineer to support Vodafone’s st...

Location

Egypt, Cairo

Salary:

Not provided

Vodafone

Expiration Date

Until further notice

Requirements

Experienced in Site Reliability Engineering, DevOps, or production support roles within complex, enterprise-scale environments
Skilled in Unix/Linux administration with strong shell scripting experience
Experienced with CI/CD tools such as Git, Jenkins, Nexus, SonarQube, and configuration or automation tools
Proficient in infrastructure as code using tools such as Terraform or CloudFormation
Comfortable working with public cloud platforms such as AWS or Azure
Able to develop using one or more high-level programming languages, including Python, Java, or JavaScript
Experienced in containerisation and orchestration technologies, including Docker and Kubernetes
Familiar with monitoring and observability tools such as Prometheus, Grafana, CloudWatch, or Centreon
Knowledgeable in microservices architecture, APIs, and web services (REST, SOAP, JSON, XML)
Experienced with relational and NoSQL data stores such as PostgreSQL, MariaDB, Redis, MongoDB, or similar technologies

Job Responsibility

Drive reliability, availability, and performance across IoT platforms through proactive monitoring, automation, and operational improvements
Design, deploy, review, and troubleshoot technical integrations with multiple platforms, services, and connected devices
Implement and enhance CI/CD practices to enable high levels of operational automation and zero-touch operations
Partner with development teams to improve services through rigorous testing, release management, and operational readiness
Act as a technical subject matter expert, supporting and coaching team members to build capability across relevant technologies
Lead and support incident and problem management activities, ensuring timely resolution, root cause analysis, and preventive actions in line with agreed SLAs
Contribute to system design reviews, including HLDs and LLDs, translating architectural decisions into operational requirements
Balance feature delivery speed with platform reliability through clearly defined service level objectives
Design, implement, and continuously enhance monitoring, alerting, and observability solutions to maintain a holistic view of system health
Manage production environments through proactive capacity planning, performance optimisation, and release deployments

What we offer

The opportunity to work on large-scale, business-critical IoT platforms with global reach
Exposure to modern cloud-native architectures, DevOps practices, and automation at enterprise scale
Collaboration with international teams across Vodafone Group and strategic partners
A role that blends hands-on engineering with system design, reliability strategy, and continuous improvement
A supportive environment that values learning, knowledge sharing, and professional growth

Lead Site Reliability Engineer

Solution, Reliability and Monitoring Entity main objective is to define, provide...

Location

India, Bangalore

Salary:

Not provided

Airbus

Expiration Date

Until further notice

Requirements

Bachelor’s or Master’s degree in Computer Science, information technology or other related discipline with 7+ years of experience
Solid experience designing and building secure solutions in AWS (Amazon Web Services)
Extensive experience in systems administration or a combination of software/systems experience
Some experience in scripting and automation of assets
Solid knowledge of Operating Systems & ability to perform troubleshooting required
Extensive knowledge of Cloud Technology concepts & ability to perform complex troubleshooting required
Solid knowledge of networking for enterprise environments required
Solid knowledge of Virtual Machine concepts and management of infrastructure
Demonstrated ability to identify root cause of issues and to recommend permanent, long-term fixes
Demonstrated ability to perform complex troubleshooting in an AWS environment and provide guidance to other teams

Job Responsibility

Define, implement, and manage cloud-based infrastructure
Work closely with the Software Factory’s (SWF) Solution Architects to facilitate the transition from Development to In-Support phase
Create and facilitate a hosting network with SWF
Represent the Hosting Group in the different Trains
Coordinate with Solution Architects (SAs) to support the technical architecture decisions related to Hosting
Support SWF in onboarding new components
Coordinate with SWF Systems & Architecture team for future planning
Contribute to Prioritization Reviews for the different trains
Guide products in Service Level Objective (SLO) definition and monitoring based on Hosting Operations feedback
Define, share and broadcast Guidelines and Non-Functional Requirements (NFR) related to: hosting, deployment and monitoring

Fulltime

Lead Site Reliability Engineer

Glean is seeking a Site Reliability Engineering Lead to foster a culture of engi...

Location

United States, Palo Alto

Salary:

200000.00 - 260000.00 USD / Year

Glean

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, a related field, or equivalent practical experience
8+ years of experience in a senior-level role within Site Reliability Engineering or similar role, particularly in managing cloud-based services and infrastructure
5+ years of experience with software development in one or more programming languages
3+ years of experience managing people or teams, leading projects, and designing, analyzing, and troubleshooting distributed systems running in Cloud
Strong knowledge of cloud platforms such as Google Cloud Platform, AWS, or Azure
Practical experience with containerization technologies, including Docker and Kubernetes
Familiarity with infrastructure as code tools like Terraform is essential
Solid understanding of networking, security principles, and best SRE and security practices
Proficiency in using monitoring and alerting tools to detect and respond to potential issues effectively

Job Responsibility

Foster a culture of engineering excellence, drive technical strategy, and develop a high-performing, collaborative team
Ensure services meet stringent Service Level Objectives (SLOs)
Build resilient, automated production environments in the cloud
Lead a team and be responsible for products globally
Provide technical leadership to key projects
Manage the complex challenges of scale and fast growth
Keep Glean applications up and running
Drive technical excellence and foster a culture of reliability across engineering teams
Set best practices for incident management, performance optimization, and automation
Influence best practices, drive cross-team collaborations, and contribute to the execution of key objectives

What we offer

Comprehensive benefits package
Medical, Vision, and Dental coverage
Generous time-off policy
Opportunity to contribute to 401k plan
Home office improvement stipend
Annual education and wellness stipends
Vibrant company culture through regular events
Healthy lunches daily

Fulltime

Lead Site Reliability Engineer

Our client is committed to building trust and making the world more agreeable fo...

Location

Salary:

Not provided

N-iX

Expiration Date

Until further notice

Requirements

8+ years of experience in a relevant programming language
Extensive knowledge of Cosmos DB management and optimization
Strong Terraform IaC deployment experience
Proven ability to interact with stakeholders and promote best practices
Dashboarding/data visualization experience

Job Responsibility

Identify and assess Cosmos DB resource utilization and recommend optimization strategies
Engage directly with resource owners to present findings and implement rightsizing
Design, build, and maintain dashboards to visualize Cosmos DB usage and opportunities for improvement
Develop Terraform-based solutions for efficient cloud database management
Stay updated on best practices around cloud cost optimization and security

What we offer

Flexible working format - remote, office-based or flexible
A competitive salary and good compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
Active tech communities with regular knowledge sharing
Education reimbursement
Memorable anniversary presents
Corporate events and team buildings
Other location-specific benefits
