Join us as a Site Reliability Engineer for CIAM at Barclays, where you will bring to life a new digital platform capability, transforming and modernizing our digital estate to build a market-leading digital offering with customer experience at its heart. This is an exciting and key role, partnering with business-aligned engineering and product teams to ensure a collaborative team culture is at the heart of what we do.
Job Responsibilities:
Bring to life a new digital platform capability, transforming and modernizing our digital estate to build a market-leading digital offering with customer experience at its heart
Partner with business-aligned engineering and product teams to ensure a collaborative team culture is at the heart of what we do
Apply software engineering techniques, automation, and incident-response best practices to ensure the reliability, availability, and scalability of the systems and platforms, and of the technology delivered through them
Ensure availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning
Respond to, analyze, and resolve system outages and disruptions, and implement measures to prevent similar incidents from recurring
Develop tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience
Monitor and optimize system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning
Collaborate with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and work closely with other teams to ensure smooth and efficient operations
Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities to foster a culture of technical excellence and growth
Requirements:
Experience in designing, implementing, deploying, and running highly available, fault-tolerant, auto-scaling, and auto-healing systems
Experience with AWS (essential); Azure and GCP are a plus
Familiarity with Kubernetes (ECS is essential; Fargate and GCE are a plus) and serverless architectures
Experience in running disaster recovery and zero-downtime solutions
Experience in designing and implementing continuous delivery across large-scale, distributed, cloud-based microservices and API service solutions with 99.9%+ uptime
Exposure to coding in Python and Bash, and to JSON/YAML (Configuration as Code)
The ability to drive reliability best practices across engineering teams
The ability to embed SRE principles into the DevSecOps lifecycle
The ability to partner with engineering, security, and product teams to balance reliability and feature velocity
Nice to have:
Experience in configuration, deployment, and operation of ForgeRock COTS-based IAM solutions (PingGateway, PingAM, PingIDM, PingDS) with embedded security gates
Experience with HTTP header signing, access token and data-at-rest encryption, PKI-based self-sovereign identity, or open-source equivalents