Internal Kubernetes Platform Lead SRE Job at HSBC

Head of Platform & Infrastructure

Prolific is not just another player in the AI space – we are the architects of t...

Location

United Kingdom

Salary:

Not provided

Prolific

Expiration Date

Until further notice

Requirements

Proven leadership experience in a senior infrastructure, SRE, or platform engineering role, with a strong track record of building and leading high-performing teams
Deep expertise across cloud platforms, Kubernetes, and modern DevOps and DevSecOps practices
A strategic mindset with the ability to define and execute a long-term technology roadmap
Exceptional communication and stakeholder management skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences
A passion for mentoring and developing team members, creating a positive and collaborative environment
Experience managing suppliers and negotiating costs, and expertise in cloud cost forecasting, monitoring and optimisation

Job Responsibility

Lead with Impact: Leading, coaching, and empowering your teams to consistently deliver outstanding value to the wider engineering organization
Define and Execute Strategy: Help define and execute the platform, infrastructure, site reliability and service management vision and roadmap, aligning with the company's long-term business goals
Hands-on Leadership: Actively participate in technical direction, design and execution, as well as problem-solving to unblock and mentor teams; respond to incidents and serve as an escalation point for the reliability of our systems
Drive Automation and Cloud Operations: Oversee cloud infrastructure and drive GitOps practices, such as infrastructure-as-code. Own cloud infrastructure and operations, ensuring platforms are monitored, available, appropriately scaled, and cost-efficient
Ensure Operational Excellence: Establish good systems reliability engineering, DevSecOps, and service management practices
Enable Cloud Development: Provide the tools, guardrails and cloud infrastructure self-service capabilities for engineering teams to develop in the cloud
Improve Developer Experience: Closely collaborate with product engineering, ensuring that our internal tools and pipelines enable our engineers to work with greater efficiency and autonomy
Embed Security and Compliance: Partner with the product engineering teams to embed security best practices and tools into the software development and release process. Ensure that the platform maintains a good security posture
Own IT: Own the internal IT function and tech stack, ensuring our business applications, software, systems, and hardware support the company's growth and operational efficiency. You will also manage key tech supplier relationships

What we offer

competitive salary
benefits
remote working

Intermediate Software Engineer SRE – AI

At PointClickCare our mission is simple: to help providers deliver exceptional c...

Location

Canada, Mississauga

Salary:

115000.00 - 128000.00 CAD / Year

PointClickCare

Expiration Date

Until further notice

Requirements

5+ years' experience in software engineering
Experience with SRE principles
Experience with AI/ML in production environments
A passion for automation, intelligent systems, and operational excellence
Strong debugging, problem-solving, and system design skills
Languages: Python, Java, Bash, Terraform
Platforms: Azure, Kubernetes, Docker
Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
ML/AI: MCP framework, AI agents, Vector store, Agent orchestration (LangChain), RAG
CI/CD: Jenkins, ArgoCD, Spinnaker

Job Responsibility

Build ML-based anomaly detection and pattern recognition systems
Enhance telemetry with smart tagging and metadata for better AI insights
Develop event-driven workflows and self-healing systems using AI triggers
Automate incident response with generative AI and custom AI agent orchestration
Use time-series forecasting and predictive modelling to anticipate failures
Optimise infrastructure with AI-powered autoscaling and cost-aware resource allocation
Build scalable, fault-tolerant systems in a cloud-native environment
Participate in on-call rotations and lead incident response for critical systems
Integrate APIs for streamlined data exchange and system connectivity
Run internal AIOps workshops and help teams adopt AI maturity models

What we offer

Benefits starting from Day 1
Retirement Plan Matching
Flexible Paid Time Off
Wellness Support Programs and Resources
Parental & Caregiver Leaves
Fertility & Adoption Support
Continuous Development Support Program
Employee Assistance Program
Allyship and Inclusion Communities
Employee Recognition … and more

Fulltime

Platform Engineer DevOps

We are looking for an experienced Platform Engineer DevOps to ensure that the fo...

Location

France, Paris

Salary:

Not provided

cozycozy

Expiration Date

Until further notice

Requirements

5+ years of hands-on experience in Platform Engineering, Infrastructure or DevOps
Expertise in operating and scaling Kubernetes and Docker in production environments
Proven experience managing hybrid cloud / on-premises infrastructure for high-traffic applications
A strong background in designing and implementing robust CI/CD pipelines (GitLab CI, Jenkins, etc.)
Experience with Infrastructure as Code (Terraform, Ansible, etc.)
Experience with monitoring, alerting, and reliability practices (SRE principles)
The mindset to mentor and guide other engineers, fostering a culture of automation and operational excellence
Excellent communication skills in English
The demonstrated ability to drive complex projects

Job Responsibility

Implement, maintain and secure infrastructure (cloud, bare-metal, Kubernetes clusters)
Automate environment configuration using Infrastructure as Code (e.g., Terraform, Ansible) and adhere to GitOps principles
Implement full-stack observability (metrics, logs, traces), sophisticated alerting, and participate in the incident management lifecycle
Ensure compliance with Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all managed services
Implement and manage secrets management systems
Contribute to the design and evolution of hybrid infrastructure
Define, lead, and maintain engineering standards for security, reliability, and technology selection across the organization, supporting the Head of Engineering in defining the platform roadmap
Drive continuous improvement initiatives for cloud cost optimization, scalability, performance, and platform security posture
Maintain comprehensive, up-to-date documentation and best practices to foster self-service and cross-team enablement
Design, implement, and maintain CI/CD pipelines (using GitLab CI, GitHub, and/or Jenkins) tailored for microservice architectures built with Node.js

What we offer

Competitive salary
stock options
Alan health insurance
Swile card
unlimited coffee, tea, snacks, and drinks in the office

Software Engineering Manager - Typescript

You will join one of BT’s Platform Engineering teams and take ownership of a sma...

Location

United Kingdom, Birmingham; Manchester; Bristol; London

Salary:

Not provided

Plusnet

Expiration Date

Until further notice

Requirements

Strong experience in TypeScript (preferred) or another object-oriented programming language, with 4+ years of professional software development
Solid hands-on experience with AWS, including cloud-native architecture principles
Experience designing and operating GitLab CI/CD pipelines
At least 2 years’ experience leading engineering teams, including hiring, mentoring, and performance management
A strong automation mindset and passion for building robust internal tooling
Practical experience with Kubernetes in production environments
Experience with Pulumi or other modern IaC tools
Familiarity with Dynatrace or similar observability platforms
Experience building or operating internal developer platforms or shared platform services
Strong understanding of DevOps and SRE principles

Job Responsibility

Lead, grow, and hire engineers specialising in TypeScript (preferred), Java or another OOP language, AWS, Kubernetes, and GitLab CI/CD
Provide technical leadership and architectural direction, ensuring high engineering standards
Remain hands-on where it adds value — contributing to code, design reviews, and technical decision-making
Build and evolve cloud-native platforms using Kubernetes and AWS-managed services
Collaborate with stakeholders to shape a platform roadmap aligned to developer and business needs
Champion automation and DevOps practices, removing manual processes wherever possible
Mentor engineers and senior ICs, supporting both technical growth and career progression
Foster an agile, metrics-driven culture focused on reliability, flow, and continuous improvement

What we offer

Competitive salary
25 days annual leave (plus bank holidays)
10% on-target bonus
Life Assurance
Pension scheme
Direct share scheme
Option to join the Healthcare Cash Plan or other benefits such as dental insurance, gym memberships etc.
50% off EE mobile pay monthly or SIM only plans
Exclusive colleague discounts on our latest and greatest BT broadband packages
BT TV with TNT Sports and NOW Entertainment & 50% discount for friends and family on EE SIM Only plans & airtime element off a Flex Pay plan

Full-time

Staff Infrastructure Security Engineer

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’r...

Location

United States, San Francisco; Bellevue; Sunnyvale; Denver

Salary:

Not provided

Crusoe

Expiration Date

Until further notice

Requirements

6+ years (or equivalent) hands-on experience in cloud security, DevOps, or infrastructure engineering
Deep expertise and proven track record deploying and managing HashiCorp Vault in an enterprise environment (experience with the Enterprise edition is highly preferred)
Expert-level knowledge of Secrets Management, X.509 PKI (Public Key Infrastructure), Certificate Authority Operations, and Cryptography concepts
Strong experience with Google Cloud Platform (GCP) and cloud native identity and access management (IAM)
Proficiency with Infrastructure as Code (IaC) tools, especially Terraform, for automating the deployment and configuration of Vault and its dependent infrastructure
Fluent in at least one programming language (ideally Go or Python)
Demonstrable experience with Kubernetes and container security principles, especially integrating secrets into microservices architectures
Strong understanding of network security concepts (IP addressing, IP routing, firewalls, segmentation, Zero Trust)

Job Responsibility

Strategic Architecture & Governance: Architect a highly available, disaster-resilient, and scalable multi-cluster secrets management platform that serves as the foundation for the organization’s Zero Trust strategy
Technical Leadership: Drive consensus across Cloud Engineering, DevOps, and SRE teams to define standardized secret management workflows and integrate security patterns into the SDLC
Compliance & Governance: Ensure the platform design meets rigorous internal policies and external compliance frameworks (e.g., SOX, ISO 27001)
Policy as Code: Design and implement advanced governance controls, including Sentinel Policy as Code, to automate security guardrails and access decisions
Platform Engineering & Implementation: Infrastructure as Code (IaC): Lead the engineering of the Vault infrastructure using Terraform, ensuring all deployments are reproducible, version-controlled, and automated
Identity Integration: Architect the integration between the secrets platform, Identity Providers (Okta), and workload identities (Kubernetes Service Accounts) to establish robust machine-to-machine authentication
Advanced Secrets Capabilities: Configure and tune essential secrets engines (KV, Transit, KMIP) and Enterprise features (Performance Replication, Seal automation) to support diverse engineering use cases
Operational Excellence & Developer Enablement: Vault as a Service (VaaS): Operationalize the platform by building self-service mechanisms, distinct "paved road" onboarding procedures, and documentation that allows engineering teams to easily consume security services
Observability: Implement comprehensive monitoring, alerting, and audit logging to ensure platform health, provide visibility into usage patterns, and satisfy audit requirements
Lifecycle Management: Own the full operational lifecycle of the production environment, including patching, version upgrades, backup/restore procedures, and incident response runbooks

What we offer

Industry competitive pay
Restricted Stock Units in a fast-growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement

Fulltime

Site Reliability Engineer

At Tote, we’re on a mission to deliver a seamless and reliable digital experienc...

Location

United Kingdom, Wigan

Salary:

Not provided

360 Resourcing Solutions

Expiration Date

Until further notice

Requirements

Deep understanding of system reliability, performance optimisation, and cloud-native architectures
Strong hands-on experience with modern observability tools such as Grafana, Prometheus, and OpenTelemetry
Solid grasp of distributed systems and networking fundamentals
Confident working with infrastructure-as-code tools (like Terraform) and container orchestration platforms such as Kubernetes
Experience in cloud environments, ideally AWS
Comfortable coding in at least one modern programming language such as Go or .NET
Calm, analytical mindset for high-pressure situations
Advocate for modern engineering practices, championing DevOps culture, CI/CD pipelines, and automation
Strong communication skills

Job Responsibility

Monitor live production systems, using observability tools to detect potential issues before they impact users
Take proactive steps to optimise system performance and stability
Analyse telemetry data, identify bottlenecks, and drive improvements across infrastructure and applications
Lead the development of SRE strategy, defining standards, best practices, and ways of working
Work closely with engineering, operations, and product teams to shape SLAs, SLOs, and error budgets
Design and implement performance testing strategies to simulate peak traffic
Build intuitive dashboards, refine alerting systems, and create tools that provide clear visibility into system health
Work alongside software engineers to design scalable solutions
Work with compliance teams to meet internal and regulatory standards
Work with operations to ensure smooth deployment and monitoring

What we offer

Competitive Basic Salary
Discretionary Bonus Scheme
Company Shares Option Plan
Contributory pension scheme
Life insurance (4 x basic salary)
Simply Health Cash Plan
Holiday entitlement (33 days inclusive of bank holidays)
Study Support and opportunity for progression and development
Confidential 24/7 365 employee assistance helpline
Agile and collaborative office environment with free parking, fruit, biscuits, and drinks

Fulltime

Staff Software Engineer I - Internal Access Management

We are seeking a Staff Software Engineer to lead the technical vision, architect...

Location

Salary:

225100.00 - 264500.00 CAD / Year

Confluent

Expiration Date

Until further notice

Requirements

10+ years of engineering experience
4+ years in security, IAM, or distributed systems
Deep expertise in Kubernetes, workload identity, cloud IAM (AWS, GCP, Azure), and zero-trust architectures
Strong understanding of authentication technologies: IAM, OAuth2, OIDC, policy engines, and modern zero-trust principles
Proven track record leading multi-team technical initiatives at a Staff or Senior Staff level
Strong knowledge of distributed systems, cloud infrastructure, container orchestration, and service mesh
Excellent communication and stakeholder-influence skills across engineering and security domains

Job Responsibility

Define and drive the long-term architecture and roadmap for Internal Access Management across Kubernetes and multi-cloud environments
Architect and implement least privilege, just-in-time access, and zero-trust models across Confluent services
Build and evolve scalable access-authorization workflows and lifecycle management systems using technologies such as SPIFFE/SPIRE, OPA, cloud IAM policies, workload identity, and internal enforcement engines
Strengthen security boundaries through threat modeling, defense-in-depth practices, and comprehensive access-auditing capabilities
Partner with cross-functional teams—including Platform, Kafka, Observability, Developer Productivity, Release Engineering, and SRE—to drive adoption of secure identity and access patterns
Mentor senior engineers, elevate engineering standards, and influence architectural decisions across the organization
Communicate complex technical decisions clearly and align stakeholders across engineering and security

What we offer

Remote-First Work
Robust Insurance Benefits
Flexible Time Away
The Best Teammates
Experience Ambassadors
Open and Honest Culture
Well-Being and Growth
Offers Equity

Full-time

Site Reliability Engineer

AutoRABIT is looking for a Site Reliability/DevSecOps Engineer to help develop, ...

Location

United States

Salary:

150000.00 - 175000.00 USD / Year

AutoRABIT

Expiration Date

Until further notice

Requirements

Experience deploying and maintaining scalable, resilient, and secure infrastructure on AWS, GCP, and/or Azure cloud services, including automation
Knowledge of key DevSecOps tools for monitoring (ELK, AWS Azure CloudWatch, etc.) and infrastructure management platforms (Kubernetes, Docker, Ansible, Jenkins, Terraform, etc.)
Experience with Shell Scripting (Bash), Python or equivalent is required
Knowledge of programming languages such as Python, Go, or Java
Experience with configuration management tools such as Ansible or Chef
Solid understanding of CI/CD pipelines and tools such as Jenkins, GitLab CI, or CircleCI
Excellent troubleshooting skills in SaaS or customer environments
Team player, receiving and giving feedback as well as sharing knowledge
Can-do attitude: challenging the status quo, leading and contributing to key improvements and innovations, while maintaining accountability
Excellent written and verbal US English communication skills for working across a global team environment

Job Responsibility

Contribute to the development and maintenance of frameworks for monitoring, automation and code to increase the scalability and reliability of the service
Assist both internal and customer facing teams with deployment of new software releases, VPN and other related security infrastructure interfacing
Assist with resolution of AutoRABIT service or customer issues as required
Participate in and practice sustainable incident response and blameless postmortems
Contribute to the automation of manual tasks, such as the provisioning of users in production and test environments
Work within a small agile team to develop and improve SRE software, support your peers, plan and self-improve
Participate in a regular on-call or rotational schedule needed to support AutoRABIT servers, including weekends and holidays

Fulltime

Internal Kubernetes Platform Lead SRE

HSBC

Location:
Poland

Category:
IT - Administration

Contract Type:
Employment contract

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:
November 18, 2025

Expiration:
February 17, 2026
