Join us as a Senior Site Reliability Engineer for CIAM at Barclays, where you will bring to life a new digital platform capability, transforming and modernizing our digital estate to build a market-leading digital offering with customer experience at its heart. This is an exciting and key role, partnering with business-aligned engineering and product teams to ensure a collaborative team culture is at the heart of what we do.
Job Responsibilities:
Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning
Resolution, analysis, and response to system outages and disruptions, and implementation of measures to prevent similar incidents from recurring
Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience
Monitoring and optimisation of system performance and resource usage, identification and resolution of bottlenecks, and implementation of best practices for performance tuning
Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and close cooperation with other teams to ensure smooth and efficient operations
Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities to foster a culture of technical excellence and growth
Requirements:
Experience in designing, implementing, deploying, and running highly available, fault-tolerant, auto-scaling, and auto-healing systems
Considerable expertise in AWS (essential), with Azure and GCP as a plus, including Kubernetes (ECS is essential; Fargate and GCE are a plus) and serverless architectures
Considerable experience in running disaster recovery and zero-downtime solutions, and in designing and implementing continuous delivery across large-scale, distributed, cloud-based microservice and API service solutions with 99.9%+ uptime
Hands-on experience coding in Python, Bash, and JSON/YAML (configuration as code)
The ability to drive reliability best practices across engineering teams, embed SRE principles into the DevSecOps lifecycle, and partner with engineering, security, and product teams to balance reliability and feature velocity
Nice to have:
Experience in hands-on configuration, deployment, and operation of ForgeRock COTS-based IAM solutions (PingGateway, PingAM, PingIDM, PingDS) with embedded security gates, HTTP header signing, access token and data-at-rest encryption, PKI-based self-sovereign identity, or open source