Senior Cloud SRE

Hazelcast

Location:
United Kingdom

Contract Type:
Employment contract

Salary:

Not provided

Job Description:

We are looking for an SRE experienced in distributed systems, Kubernetes, and microservices to join our Applications team. The team builds tooling that enriches the core Hazelcast Platform, making it easier to use and scale, adding functionality, and ensuring that our solutions meet the most demanding customer needs. Day to day, you’ll apply solid engineering fundamentals with a focus on performance, consistency, resilience, and scale, bringing your passion for solving difficult problems to help realize the product vision. Your role as an SRE is crucial in ensuring that the Hazelcast Platform meets business objectives, is robust and scalable, and can be depended upon by customers for mission-critical implementations.

Job Responsibility:

  • Keep Hazelcast cloud-based production systems running smoothly 24/7/365
  • Design, develop, and maintain our cloud infrastructure to support both our end-user management center and microservice-based platform
  • Implement new solutions using AWS and Terraform, improving scalability, throughput, and reliability
  • Support and manage our Keycloak IdP, ensuring it provides appropriate security while meeting the needs of the development team
  • Implement security measures to protect data integrity and confidentiality, including encryption, access control, and compliance with relevant regulations
  • Work with our operations team to maintain our SOC2 & ISO27001 compliance and keep our environment secure
  • Monitor the system for performance issues, errors, and potential failures, and implement maintenance procedures such as backups, data recovery, and disaster recovery plans
  • Troubleshoot issues related to data storage, including performance bottlenecks, data corruption, or compatibility issues with other software components
  • Collaborate with cross-functional teams, including software developers, architects, and product managers, to ensure the effective integration and operation of the components within the overall software infrastructure
  • Document design decisions, implementation details, and operational procedures to facilitate collaboration among team members and ensure the maintainability of the system
  • Stay up to date with the latest developments in storage technologies, the Java programming language, and software engineering best practices, and apply this knowledge to improve existing storage systems and develop new solutions
  • Be part of our on-call rotation to respond to availability incidents and work with support and engineers on customer incidents

Requirements:

  • Experience of distributed systems, Kubernetes & microservices
  • Infrastructure as Code (Terraform)
  • Modern DevOps stack (K8s, Prometheus, Grafana, OpenTelemetry, Argo CD, Helm)
  • Experience with at least one programming language, preferably Golang or Python
  • Experience with CI and building CD pipelines (Jenkins, GitHub Actions)
  • A passion for automation and keeping our software delivery fast and efficient
  • Bachelor's degree in a relevant field of study (Computer Science, or related discipline) OR equivalent experience

Nice to have:

  • Multi-cloud (AWS, GCP and/or Azure)
  • Experience working with software engineers in designing cloud-native applications or troubleshooting them
  • Experience as part of an on-call rota

What we offer:

  • 25 days annual leave + Bank holidays
  • Group Company Pension Plan
  • Private Medical Insurance
  • Private Dental Insurance
  • Life Insurance
  • EAP (Employee Assistance Program)

Additional Information:

Job Posted:
April 16, 2026

Employment Type:
Full-time
Work Type:
Remote work

Similar Jobs for Senior Cloud SRE

Senior Site Reliability Engineer Cloud Platform

Zilliz is a fast-growing startup developing the industry’s leading vector databa...
Salary:
175000.00 - 225000.00 USD / Year
Zilliz
Expiration Date
Until further notice
Requirements
  • 4+ years of experience in site reliability engineering or similar roles with a focus on cloud-native systems
  • Proficiency in scripting languages such as Python, Go, or Java
  • Strong knowledge of container orchestration technologies like Kubernetes and Docker
  • Expertise with cloud platforms such as AWS, GCP, or Azure, and their respective monitoring and management tools
  • Experience with infrastructure as code tools such as Terraform or Ansible
  • Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo
  • Proven ability to troubleshoot complex distributed systems and resolve issues promptly
  • Bachelor’s degree or above in computer science, software engineering, or other relevant disciplines
  • Ability to thrive in a fast-paced, startup environment and handle multiple projects simultaneously
Job Responsibility
  • Work at the intersection of development and site reliability, creating SRE tools and systems and supporting existing infrastructure and platforms
  • Ensure the reliability, availability, and performance of Zilliz’s distributed database systems
  • Develop and implement strategies for monitoring, incident management, and disaster recovery
  • Automate system operations and maintenance tasks to improve efficiency and reduce manual intervention
  • Design and build tools to manage and monitor infrastructure, ensuring scalability and robustness
  • Collaborate with software engineers to enhance system reliability, scalability, and performance
  • Maintain and improve the CI/CD pipeline to ensure smooth and rapid deployment of changes
  • Actively contribute to the Milvus Vector Database open-source community, focusing on improving reliability and operational efficiency
Employment Type:
Full-time

Senior Vice President, Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team which is responsible for Private a...
Location
Singapore , Singapore
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree or equivalent work experience
  • 8+ years of relevant work experience
  • Highly motivated self-starter with excellent interpersonal and communication skills. Able to communicate efficiently at multiple levels of seniority
  • Certification or formal training in site reliability engineering concepts and practices
  • Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
  • 5+ years of experience in Python (preferred) or Java on large-scale systems, alongside Linux-based scripting languages
  • Experience working on observability, logging and metrics toolsets
  • Experience with Kubernetes and container technologies such as Docker, OpenShift and EKS
  • Experience with public cloud technologies such as AWS, GCP or Azure
  • Experience with Secrets products such as HashiCorp Vault or CyberArk
Job Responsibility
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organization
  • Actively owning production-level incidents through to resolution
Employment Type:
Full-time

Senior Cloud Architect

Strategic and hands-on role focused on evolving CHUB’s architecture to support m...
Location
United States , Bothell; Bellevue
Salary:
102000.00 - 184000.00 USD / Year
T-Mobile
Expiration Date
Until further notice
Requirements
  • 8+ years of experience in software engineering and architecture roles
  • 3+ years passionate about AWS cloud solutions
  • Strong experience designing scalable distributed systems, microservices, and event-driven architectures
  • Hands-on experience with AWS services such as RDS, Lambda/Step Functions, S3, DynamoDB, ElastiCache, Elasticsearch, and Neptune
  • Proficiency with Infrastructure as Code tools (Terraform, CloudFormation), CI/CD pipelines, and Git-based workflows
  • Proven understanding of cloud security, scalability, high availability, and cost optimization principles
  • Background in Java/Spring Boot or similar frameworks
  • Advanced user of observability tools (e.g., Splunk, SignalFX, CloudWatch, Prometheus, Grafana)
  • Strong communication and leadership skills, with the ability to influence technical direction across teams
Job Responsibility
  • Provide architectural leadership during the product intake phase—evaluating options, identifying risks, and helping define scalable solution designs
  • Own solution architecture across multiple parallel initiatives, ensuring design consistency and quality across CHUB components
  • Own the modernization of CHUB systems by driving cloud-native design and migration strategies using AWS services
  • Establish reusable cloud reference architectures aligned to AWS Well-Architected Framework and CHUB's technical vision
  • Partner with engineering, SRE, and platform teams to implement cloud infrastructure and services using Infrastructure as Code (IaC) and CI/CD automation
  • Coach and mentor engineers across CHUB in cloud architecture, security, resiliency, and observability practices
  • Participate in and lead design reviews, documentation efforts, and architectural governance processes
  • Promote technical excellence through mentorship, sharing knowledge and best practices, and documentation of architectural decisions
  • Support cloud cost optimization and recommend improvements to increase operational efficiency
  • Participate in incident reviews and root cause analysis for cloud-based systems
What we offer
  • Competitive base salary and compensation package
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Annual bonus or periodic sales incentive or bonus
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off and up to 12 paid holidays
  • Paid parental and family leave
Employment Type:
Full-time

Senior Site Reliability Engineer

We are looking for a Senior Site Reliability Engineer who is passionate about sc...
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements
  • 5+ years experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring, tweaking dashboards, defining alerts, writing runbooks, etc.
  • 5+ years of hands on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure)
  • Familiarity with Unix / Linux operating systems
  • Strong emphasis on debugging, improving code, and automating routine tasks
  • Strong backend engineering experience in one or more prominent languages such as Java, Go or Python
  • Excellent communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
  • An ability and desire to mentor and coach engineers
Job Responsibility
  • Scaling Cloud services
  • Own the infrastructure, tooling and automation that Jira Cloud runs on
  • Analyse and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency
What we offer
  • Health and wellbeing resources
  • Paid volunteer days

Senior Site Reliability Engineer

You'll join the team primarily responsible for making our self-hosted product of...
Location
United States
Salary:
200000.00 - 220000.00 USD / Year
Tines
Expiration Date
Until further notice
Requirements
  • 5-8 years in an SRE or similar role
  • Experience architecting, maintaining, and supporting systems with containerized applications, ideally k8s
  • Experience with troubleshooting deployment issues, creating clear documentation, and designing robust escalation paths
  • Comfortable learning new technologies
  • Experience with Ruby, Rails, React, TypeScript, Postgres, Redis and Docker
  • Customer obsessed and willing to go deep into unfamiliar stacks to find root causes
  • Authorized to work for any employer in the U.S.
Job Responsibility
  • Making our self-hosted product offering as easy as possible for customers to install and operate
  • Owning all of the supporting services and tools that our self-hosted customers rely on
  • Identifying and fixing availability risks and monitoring gaps
  • Enabling software engineers to build new product features that work seamlessly across cloud and self-hosted environments
  • Using our own product extensively to automate infrastructure maintenance and to build DevOps tooling for customer deployments
  • Identifying areas for improvement in our containerized architecture and deployment strategies
  • Mentoring other engineers in container orchestration and Kubernetes best practices
  • Acting as a subject matter expert for critical self-hosted customer issues
What we offer
  • Competitive salary
  • Startup equity & extended exercise window
  • Matching retirement plans
  • Home office setup
  • Private healthcare plans
  • 25 days annual leave
  • Extra company holidays
  • Generous parental leave programs
  • Flexibility in how and where you work
  • Phone and home Internet allowance
Employment Type:
Full-time

Senior Software Engineer, Infrastructure

You’ll help shape the future of infrastructure automation for law enforcement sy...
Location
United States , Seattle; Boston
Salary:
141000.00 - 225600.00 USD / Year
Axon
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 8+ years of professional software development experience
  • Strong background building cloud-native, distributed solutions
  • Experience designing tooling and automation to simplify the operational management of SaaS/PaaS systems
  • Proficiency in backend services with multiple managed languages (e.g., Java, Scala, Go, C#, or similar)
  • Expertise with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation) and building modular, reusable, testable components
  • Familiarity with Kubernetes platforms (e.g., AKS, EKS, or similar)
  • Hands-on experience with CI/CD platforms for automating infrastructure, builds, testing, and releases
  • Strong collaboration and communication skills, with empathy for the needs of engineering teams
Job Responsibility
  • Lead engineering architecture design reviews
  • Set a high technical bar for the team through code and architecture design reviews
  • Mentoring engineers
  • Working across teams with Product, Design, and Engineering to create integrated solutions that delight our customers
  • Improve our Engineering process, including long-term thinking, sprint planning and stand-ups
  • Building services that adhere to our high bar on availability and latency in this mission-critical space
  • Working with the latest open source technologies
What we offer
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
Employment Type:
Full-time

Vice President - Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team which is responsible for Private a...
Location
Singapore , Singapore
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree or equivalent work experience
  • 6+ years of relevant work experience
  • Highly motivated self-starter with excellent interpersonal and communication skills. Able to communicate efficiently at multiple levels of seniority
  • Certification or formal training in site reliability engineering concepts and practices
  • Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
  • 4+ years of experience in Python (preferred) or Java on large-scale systems, alongside Linux-based scripting languages
  • Experience working on observability, logging and metrics toolsets
  • Experience with Kubernetes and container technologies such as Docker, OpenShift and EKS
  • Experience with public cloud technologies such as AWS, GCP or Azure
  • Experience with Secrets products such as HashiCorp Vault or CyberArk
Job Responsibility
  • Working across Container products and Secrets products, across Public and Private Cloud, as well as Cloud native specific products
  • Architecting and building tools and platforms that provide capabilities for SRE
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
  • Actively owning production-level incidents through to resolution
Employment Type:
Full-time

Senior DevOps Infra Engineer

BioCatch is the leader in Behavioral Biometrics, a technology that leverages mac...
Location
Israel , TLV
Salary:
Not provided
BioCatch
Expiration Date
Until further notice
Requirements
  • 5+ years of experience as an SRE/DevOps engineer
  • 5+ years of experience with cloud environments such as Azure, GCP or AWS
  • Proficiency in programming languages such as Python
  • Proven experience working with Kubernetes in production environments
  • Familiarity with Linux distributions: Ubuntu / CentOS
  • Solid understanding of computer networking fundamentals and cloud security principles
  • CDN and DNS systems – AWS CloudFront / Azure Front Door
  • Deep understanding of web servers (Nginx preferred)
  • Experience with logging and monitoring services (such as Prometheus, Grafana, Datadog and Coralogix)
  • Experience working with Infrastructure as Code (IaC) tools (Terraform preferred)
Job Responsibility
  • Build and maintain a scalable cloud infrastructure in Production (Azure, GCP)
  • Troubleshoot and fix Production infrastructure/platform issues
  • Proactively and continuously improve Production stability and robustness to assure our infrastructure components are always up and available
  • Develop end-to-end projects to improve our infrastructure at scale, including design, implementation, and ongoing maintenance
  • Monitor and optimize cloud costs associated with data infrastructure and processes
Employment Type:
Full-time