Internal Kubernetes Platform Lead SRE

HSBC

Location:
Poland

Contract Type:
Employment contract

Salary:
Not provided

Job Description:

HSBC is seeking an IKP Support Engineer (SRE) to join the IKP Team within the Hybrid Integration Platform. Responsibilities include ensuring reliability, performance, and scalability of IKP infrastructure, automating processes, troubleshooting issues, and providing 24x7 support. Ideal candidates have expertise in Kubernetes, Unix administration, and ITIL processes with strong communication skills.

Job Responsibility:

  • Ensure the reliability, availability, and performance of the infrastructure platform
  • Collaborate in diagnosing and resolving IKP infrastructure issues
  • Support the deployment, configuration, and maintenance of Kubernetes platform
  • Troubleshoot and resolve incidents, performance issues, and integration failures
  • Perform root cause analysis and implement reliability improvements
  • Provide 24x7 support as part of an on-call rota
  • Plan duties and other administrative tasks for the team in line with the Polish Labor Code.

Requirements:

  • Solid technical knowledge and experience with Kubernetes administration
  • 3+ years of hands-on experience with Kubernetes administration
  • Strong knowledge of Kubernetes concepts and operations and troubleshooting tools
  • Understanding of containerization and orchestration
  • Experience with Unix administration
  • Experience with Service Meshes is a plus
  • Understanding of ITIL processes and automation skills
  • Familiarity with infrastructure as code
  • Strong analytical and communication skills
  • Proficiency in English.

Nice to have:

  • Experience with Service Meshes
  • Familiarity with infrastructure as code.

What we offer:

  • Competitive salary
  • Annual performance-based bonus
  • Additional bonuses for recognition awards
  • Multisport card
  • Private medical care
  • Life insurance
  • One-time reimbursement of home office set-up (up to 800 PLN)
  • Corporate parties & events
  • CSR initiatives
  • Nursery discounts
  • Financial support with trainings and education
  • Social fund
  • Flexible working hours
  • Free parking.

Additional Information:

Job Posted:
November 18, 2025

Expiration:
February 17, 2026

Employment Type:
Full-time

Work Type:
Hybrid work

Similar Jobs for Internal Kubernetes Platform Lead SRE

Head of Platform & Infrastructure

Prolific is not just another player in the AI space – we are the architects of t...
Location:
United Kingdom
Salary:
Not provided
Company:
Prolific
Expiration:
Until further notice
Requirements:
  • Proven leadership experience in a senior infrastructure, SRE, or platform engineering role, with a strong track record of building and leading high-performing teams
  • Deep expertise across cloud platforms, Kubernetes, and modern DevOps and DevSecOps practices
  • A strategic mindset with the ability to define and execute a long-term technology roadmap
  • Exceptional communication and stakeholder management skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences
  • A passion for mentoring and developing team members, creating a positive and collaborative environment
  • Experienced in managing suppliers and negotiating costs, and expertise with cloud cost forecasting, monitoring and optimisation
Job Responsibility:
  • Lead with Impact: Leading, coaching, and empowering your teams to consistently deliver outstanding value to the wider engineering organization
  • Define and Execute Strategy: Help define and execute the platform, infrastructure, site reliability and service management vision and roadmap, aligning with the company's long-term business goals
  • Hands on Leadership: Actively participate in technical direction, design and execution as well as problem-solving to unblock and mentor teams, respond to incidents and be an escalation point for reliability of our systems
  • Drive Automation and Cloud Operations: Oversee cloud infrastructure and drive GitOps practices, such as infrastructure-as-code. Own cloud infrastructure and operations, ensuring platforms are monitored, available, scale appropriately, and cost-efficient
  • Ensure Operational Excellence: Establish good systems reliability engineering, DevSecOps, and service management practices
  • Enable Cloud Development: Provide the tools, guardrails and cloud infrastructure self-service capabilities for engineering teams to develop in the cloud
  • Improve Developer Experience: Closely collaborate with product engineering, ensuring that our internal tools and pipelines enable our engineers to work with greater efficiency and autonomy
  • Embed Security and Compliance: Partner with the product engineering teams to embed security best practices and tools into the software development and release process. Ensure that the platform maintains a good security posture
  • Own IT: Own the internal IT function and tech stack, ensuring our business applications, software, systems, and hardware support the company's growth and operational efficiency. You will also manage key tech supplier relationships
What we offer:
  • competitive salary
  • benefits
  • remote working

Intermediate Software Engineer SRE – AI

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location:
Canada, Mississauga
Salary:
115000.00 - 128000.00 CAD / Year
Company:
PointClickCare
Expiration:
Until further notice
Requirements:
  • 5+ years' experience in software engineering
  • Experience with SRE principles
  • Experience with AI/ML in production environments
  • A passion for automation, intelligent systems, and operational excellence
  • Strong debugging, problem-solving, and system design skills
  • Languages: Python, Java, Bash, Terraform
  • Platforms: Azure, Kubernetes, Docker
  • Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
  • ML/AI: MCP framework, AI agents, Vector store, Agent orchestration (LangChain), RAG
  • CI/CD: Jenkins, ArgoCD, Spinnaker
Job Responsibility:
  • Build ML-based anomaly detection and pattern recognition systems
  • Enhance telemetry with smart tagging and metadata for better AI insights
  • Develop event-driven workflows and self-healing systems using AI triggers
  • Automate incident response with generative AI and custom AI agent orchestration
  • Use time-series forecasting and predictive modelling to anticipate failures
  • Optimise infrastructure with AI-powered autoscaling and cost-aware resource allocation
  • Build scalable, fault-tolerant systems in a cloud-native environment
  • Participate in on-call rotations and lead incident response for critical systems
  • Apply API integration skills to streamline data exchange and system connectivity
  • Run internal AIOps workshops and help teams adopt AI maturity models
What we offer:
  • Benefits starting from Day 1
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more
Employment Type:
Full-time

Platform Engineer DevOps

We are looking for an experienced Platform Engineer DevOps to ensure that the fo...
Location:
France, Paris
Salary:
Not provided
Company:
cozycozy
Expiration:
Until further notice
Requirements:
  • 5+ years of hands-on experience in Platform Engineering, Infrastructure or DevOps
  • Expertise in operating and scaling Kubernetes and Docker in production environments
  • Proven experience managing hybrid cloud / on-premises infrastructure for high-traffic applications
  • A strong background in designing and implementing robust CI/CD pipelines (GitLab CI, Jenkins, etc.)
  • Experience with Infrastructure as Code (Terraform, Ansible, etc.)
  • Experience with monitoring, alerting, and reliability practices (SRE principles)
  • The mindset to mentor and guide other engineers, fostering a culture of automation and operational excellence
  • Excellent communication skills in English
  • The demonstrated ability to drive complex projects
Job Responsibility:
  • Implement, maintain and secure infrastructure (cloud, bare-metal, Kubernetes clusters)
  • Automate environment configuration using Infrastructure as Code (e.g., Terraform, Ansible) and adhere to GitOps principles
  • Implement full-stack observability (metrics, logs, traces), sophisticated alerting, and participate in the incident management lifecycle
  • Ensure compliance with Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all managed services
  • Implement and manage secrets management systems
  • Contribute to the design and evolution of hybrid infrastructure
  • Define, lead, and maintain engineering standards for security, reliability, and technology selection across the organization, supporting the Head of Engineering in defining the platform roadmap
  • Drive continuous improvement initiatives for cloud cost optimization, scalability, performance, and platform security posture
  • Maintain comprehensive, up-to-date documentation and best practices to foster self-service and cross-team enablement
  • Design, implement, and maintain CI/CD pipelines (using GitLab CI, GitHub, and/or Jenkins) tailored for microservice architectures built with Node.js
What we offer:
  • Competitive salary
  • stock options
  • Alan health insurance
  • Swile card
  • unlimited coffee, tea, snacks, and drinks in the office

Software Engineering Manager - Typescript

You will join one of BT’s Platform Engineering teams and take ownership of a sma...
Location:
United Kingdom, Birmingham; Manchester; Bristol; London
Salary:
Not provided
Company:
Plusnet
Expiration:
Until further notice
Requirements:
  • Strong experience in TypeScript (preferred) or another object-oriented programming language, with 4+ years of professional software development
  • Solid hands-on experience with AWS, including cloud-native architecture principles
  • Experience designing and operating GitLab CI/CD pipelines
  • At least 2 years’ experience leading engineering teams, including hiring, mentoring, and performance management
  • A strong automation mindset and passion for building robust internal tooling
  • Practical experience with Kubernetes in production environments
  • Experience with Pulumi or other modern IaC tools
  • Familiarity with Dynatrace or similar observability platforms
  • Experience building or operating internal developer platforms or shared platform services
  • Strong understanding of DevOps and SRE principles
Job Responsibility:
  • Lead, grow, and hire engineers specialising in TypeScript (preferred), Java or another OOP language, AWS, Kubernetes, and GitLab CI/CD
  • Provide technical leadership and architectural direction, ensuring high engineering standards
  • Remain hands-on where it adds value — contributing to code, design reviews, and technical decision-making
  • Build and evolve cloud-native platforms using Kubernetes and AWS-managed services
  • Collaborate with stakeholders to shape a platform roadmap aligned to developer and business needs
  • Champion automation and DevOps practices, removing manual processes wherever possible
  • Mentor engineers and senior ICs, supporting both technical growth and career progression
  • Foster an agile, metrics-driven culture focused on reliability, flow, and continuous improvement
What we offer:
  • Competitive salary
  • 25 days annual leave (plus bank holidays)
  • 10% on target bonus
  • Life Assurance
  • Pension scheme
  • Direct share scheme
  • Option to join the Healthcare Cash Plan or other benefits such as dental insurance, gym memberships etc.
  • 50% off EE mobile pay monthly or SIM only plans
  • Exclusive colleague discounts on our latest and greatest BT broadband packages
  • BT TV with TNT Sports and NOW Entertainment & 50% discount for friends and family on EE SIM Only plans & airtime element off a Flex Pay plan
Employment Type:
Full-time

Staff Infrastructure Security Engineer

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’r...
Location:
United States, San Francisco; Bellevue; Sunnyvale; Denver
Salary:
Not provided
Company:
Crusoe
Expiration:
Until further notice
Requirements:
  • 6+ years (or equivalent) hands-on experience in cloud security, DevOps, or infrastructure engineering
  • Deep expertise and proven track record deploying and managing HashiCorp Vault in an enterprise environment (experience with the Enterprise edition is highly preferred)
  • Expert-level knowledge of Secrets Management, X.509 PKI (Public Key Infrastructure), Certificate Authority Operations, and Cryptography concepts
  • Strong experience with Google Cloud Platform (GCP) and cloud native identity and access management (IAM)
  • Proficiency with Infrastructure as Code (IaC) tools, especially Terraform, for automating the deployment and configuration of Vault and its dependent infrastructure
  • Fluent in at least one programming language (ideally Go or Python)
  • Demonstrable experience with Kubernetes and container security principles, especially integrating secrets into microservices architectures
  • Strong understanding of network security concepts (IP addressing, IP routing, firewalls, segmentation, Zero Trust)
Job Responsibility:
  • Strategic Architecture & Governance: Architect a highly available, disaster-resilient, and scalable multi-cluster secrets management platform that serves as the foundation for the organization’s Zero Trust strategy
  • Technical Leadership: Drive consensus across Cloud Engineering, DevOps, and SRE teams to define standardized secret management workflows and integrate security patterns into the SDLC
  • Compliance & Governance: Ensure the platform design meets rigorous internal policies and external compliance frameworks (e.g., SOX, ISO 27001)
  • Policy as Code: Design and implement advanced governance controls, including Sentinel Policy as Code, to automate security guardrails and access decisions
  • Platform Engineering & Implementation: Infrastructure as Code (IaC): Lead the engineering of the Vault infrastructure using Terraform, ensuring all deployments are reproducible, version-controlled, and automated
  • Identity Integration: Architect the integration between the secrets platform, Identity Providers (Okta), and workload identities (Kubernetes Service Accounts) to establish robust machine-to-machine authentication
  • Advanced Secrets Capabilities: Configure and tune essential secrets engines (KV, Transit, KMIP) and Enterprise features (Performance Replication, Seal automation) to support diverse engineering use cases
  • Operational Excellence & Developer Enablement: Vault as a Service (VaaS): Operationalize the platform by building self-service mechanisms, distinct "paved road" onboarding procedures, and documentation that allows engineering teams to easily consume security services
  • Observability: Implement comprehensive monitoring, alerting, and audit logging to ensure platform health, provide visibility into usage patterns, and satisfy audit requirements
  • Lifecycle Management: Own the full operational lifecycle of the production environment, including patching, version upgrades, backup/restore procedures, and incident response runbooks
What we offer:
  • Industry competitive pay
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
Employment Type:
Full-time

Site Reliability Engineer

At Tote, we’re on a mission to deliver a seamless and reliable digital experienc...
Location:
United Kingdom, Wigan
Salary:
Not provided
Company:
360 Resourcing Solutions
Expiration:
Until further notice
Requirements:
  • Deep understanding of system reliability, performance optimisation, and cloud-native architectures
  • Strong hands-on experience with modern observability tools such as Grafana, Prometheus, and OpenTelemetry
  • Solid grasp of distributed systems and networking fundamentals
  • Confident working with infrastructure-as-code tools (like Terraform) and container orchestration platforms such as Kubernetes
  • Experience in cloud environments, ideally AWS
  • Comfortable coding in at least one modern programming language such as Go or .NET
  • Calm, analytical mindset for high-pressure situations
  • Advocate for modern engineering practices, championing DevOps culture, CI/CD pipelines, and automation
  • Strong communication skills
Job Responsibility:
  • Monitor live production systems, using observability tools to detect potential issues before they impact users
  • Take proactive steps to optimise system performance and stability
  • Analyse telemetry data, identify bottlenecks, and drive improvements across infrastructure and applications
  • Lead the development of SRE strategy, defining standards, best practices, and ways of working
  • Work closely with engineering, operations, and product teams to shape SLAs, SLOs, and error budgets
  • Design and implement performance testing strategies to simulate peak traffic
  • Build intuitive dashboards, refine alerting systems, and create tools that provide clear visibility into system health
  • Work alongside software engineers to design scalable solutions
  • Work with compliance teams to meet internal and regulatory standards
  • Work with operations to ensure smooth deployment and monitoring
What we offer:
  • Competitive Basic Salary
  • Discretionary Bonus Scheme
  • Company Shares Option Plan
  • Contributory pension scheme
  • Life insurance (4 x basic salary)
  • Simply Health Cash Plan
  • Holiday entitlement (33 days inclusive of bank holidays)
  • Study Support and opportunity for progression and development
  • Confidential 24/7 365 employee assistance helpline
  • Agile and collaborative office environment with free parking, fruit, biscuits, and drinks
Employment Type:
Full-time

Staff Software Engineer I - Internal Access Management

We are seeking a Staff Software Engineer to lead the technical vision, architect...
Location:
Not provided
Salary:
225100.00 - 264500.00 CAD / Year
Company:
Confluent
Expiration:
Until further notice
Requirements:
  • 10+ years of engineering experience
  • 4+ years in security, IAM, or distributed systems
  • Deep expertise in Kubernetes, workload identity, cloud IAM (AWS, GCP, Azure), and zero-trust architectures
  • Strong understanding of authentication technologies: IAM, OAuth2, OIDC, policy engines, and modern zero-trust principles
  • Proven track record leading multi-team technical initiatives at a Staff or Senior Staff level
  • Strong knowledge of distributed systems, cloud infrastructure, container orchestration, and service mesh
  • Excellent communication and stakeholder-influence skills across engineering and security domains
Job Responsibility:
  • Define and drive the long-term architecture and roadmap for Internal Access Management across Kubernetes and multi-cloud environments
  • Architect and implement least privilege, just-in-time access, and zero-trust models across Confluent services
  • Build and evolve scalable access-authorization workflows and lifecycle management systems using technologies such as SPIFFE/SPIRE, OPA, cloud IAM policies, workload identity, and internal enforcement engines
  • Strengthen security boundaries through threat modeling, defense-in-depth practices, and comprehensive access-auditing capabilities
  • Partner with cross-functional teams—including Platform, Kafka, Observability, Developer Productivity, Release Engineering, and SRE—to drive adoption of secure identity and access patterns
  • Mentor senior engineers, elevate engineering standards, and influence architectural decisions across the organization
  • Communicate complex technical decisions clearly and align stakeholders across engineering and security
What we offer:
  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth
  • Offers Equity
Employment Type:
Full-time

Site Reliability Engineer

AutoRABIT is looking for a Site Reliability/DevSecOps Engineer to help develop, ...
Location:
United States
Salary:
150000.00 - 175000.00 USD / Year
Company:
AutoRABIT
Expiration:
Until further notice
Requirements:
  • Experience deploying and maintaining scalable, resilient, and secure infrastructure on AWS, GCP, and/or Azure cloud infrastructure and services, including automation
  • Knowledge of key DevSecOps tools for monitoring (ELK, AWS Azure CloudWatch etc.), Infrastructure management platforms (Kubernetes, Docker, Ansible, Jenkins, Terraform etc.)
  • Experience with Shell Scripting (Bash), Python or equivalent is required
  • Knowledge of programming languages such as Python, Go, or Java
  • Experience with configuration management tools such as Ansible or Chef
  • Solid understanding of CI/CD pipelines and tools such as Jenkins, GitLab CI, or CircleCI
  • Excellent troubleshooting skills in SaaS, or customer environments
  • Team player, receiving and giving feedback as well as sharing knowledge
  • Can-do attitude: challenging the status quo, leading, and contributing to key improvements and innovations, while maintaining accountability
  • Excellent written and verbal US English communication skills for working across a global team environment
Job Responsibility:
  • Contribute to the development and maintenance of frameworks for monitoring, automation and code to increase the scalability and reliability of the service
  • Assist both internal and customer facing teams with deployment of new software releases, VPN and other related security infrastructure interfacing
  • Assist with resolution of AutoRABIT service or customer issues as required
  • Participate in and practice sustainable incident response and blameless postmortems
  • Contribute to the automation of manual tasks, such as the provisioning of users in production and test environments
  • Work within a small agile team to develop and improve SRE software, support your peers, plan and self-improve
  • Participate in a regular on-call or rotational schedule needed to support AutoRABIT servers, including weekends and holidays
Employment Type:
Full-time