Site Reliability Engineer - Core

United Kingdom, London · Job Posted December 06, 2025

Job Description

We are looking for a Site Reliability Engineer to join our Core team and promote infrastructure best practices across our organization, allowing us to securely scale a distributed financial platform that touches millions of people a day. Our platform tackles some of the most interesting problems in crypto for millions of our customers and continues to grow rapidly. The SRE team at blockchain combines software and systems engineering to provide a platform that abstracts complexity for increased security, reliability and rapid product delivery. As a member of the Core team, you will be tasked with developing an in-depth understanding of the infrastructure needs of our products. You will establish and maintain creative engineering solutions that improve our customers’ experience by building the necessary tooling. Crucially, you will also guide and educate developer teams so that they can deliver new features in a rapid, secure and scalable manner.

Job Responsibility

  • Play a critical role in evolving our infrastructure as we develop solutions to complex technical problems involving reliability, latency, bandwidth and most importantly security
  • Be an integral part of improving observability, monitoring and alerting throughout the platform
  • Help co-ordinate work across different areas of the company to ensure the most efficient path of execution
  • Centralize wherever possible common streams of work that are currently duplicated across developer teams
  • Focus heavily on writing tooling to replace manual, repetitive work in a scalable way
  • Work in a fast-paced, dynamic environment, complementing our existing high-calibre team

Requirements

  • Experience with containerization and service orchestration, including best practices and security
  • Strong knowledge of at least one programming language
  • Linux knowledge, including an understanding of resource allocation, networking and/or internals
  • Experience working with cloud solutions (GCP or AWS)
  • Deep understanding and demonstrable experience with modern monitoring tools such as Prometheus, Datadog, Grafana, Telegraf
  • Experience with infrastructure as code tools
  • Solid background with configuration management tools
  • Experience using GitOps and CI to make changes, preferably with GitHub Actions
  • Experience with messaging systems such as Kafka
  • Experience with database management

Nice to have

  • Experience with HashiCorp Nomad, Consul and Vault is a plus
  • Experience with Golang, Python, and Bash is a plus
  • Experience with complex Terraform deployments is a plus
  • Experience with SaltStack is a plus
  • Experience working in Data Centers is a plus
  • Knowledge of routing and switching protocols is a plus

What we offer

  • Full-time salary based on experience and meaningful equity in an industry-leading company
  • Hybrid model working from home & awesome office location in the heart of London
  • Unlimited vacation policy: work hard and take time when you need it
  • Work from Anywhere Policy: You can work remotely from anywhere in the world for up to 20 days per year
  • Apple equipment
  • The opportunity to be a key player and build your career at a rapidly expanding, global technology company in an emerging field
  • Flexible work culture

Similar Jobs for Site Reliability Engineer - Core

8 matching positions

Cloud Engineer / Site Reliability Engineer (SRE)

Location: United States, Orlando
Salary: 75.00 USD / Hour
Company: Beacon Hill (bhsg.com)
Expiration Date: Until further notice
Requirements
  • Strong hands-on AWS experience with solid understanding of core AWS services
  • Experience supporting and troubleshooting AWS and Azure cloud environments
  • Terraform experience for Infrastructure as Code
  • Docker/containerization experience
  • Strong troubleshooting and problem-solving skills
  • Ability to translate requirements into technical execution
  • Experience performing cloud architecture and diagramming
  • Experience supporting deployments, environments, and site standups
  • Strong communication and collaboration skills
Job Responsibility
  • Support cloud infrastructure and deployments across AWS and Azure
  • Troubleshoot infrastructure and application-related cloud issues
  • Build and maintain Terraform-based infrastructure
  • Support Docker/containerized environments
  • Create architecture diagrams and technical documentation
  • Work closely with engineering and project teams to execute cloud initiatives
  • Assist with automation and operational improvement efforts
Employment Type: Full-time

Site Reliability Engineer

NetApp is looking for a Senior TechOps Engineer - Cassandra to join our growing ...
Location: India, Bengaluru
Salary: Not provided
Company: NetApp (netapp.com)
Expiration Date: Until further notice
Requirements
  • Strong experience in Apache Cassandra administration and architecture, with a desire to continuously learn and develop to an expert level
  • Experience in diagnosing and recommending mitigation strategies for Cassandra-related issues, including performance degradation due to resource bottlenecks, suboptimal data modeling leading to hot partitions, excessive tombstones, and inefficiencies caused by range slices and poorly constructed queries
  • Hands-on experience with Cassandra architecture and core administrative tasks, including compactions, repairs, backup and recovery, schema disagreement resolution, and configuration management
  • Experience handling Cassandra maintenance activities, including upgrades and migrations
  • Ability to investigate and research Cassandra issues by reviewing the Apache Cassandra codebase
  • Strong knowledge and experience with Linux, with the ability to work comfortably from the command line
  • Exceptional ability to communicate clearly and professionally in written and verbal English
  • Experience working with at least one public cloud platform, preferably AWS
  • Prior IT customer service or support experience within an ITIL-based environment
  • Strong fundamental computer science and software engineering skills, particularly in operating system internals, memory management, and networking
Job Responsibility
  • Your work will ensure the security, reliability, and performance of world-class systems and databases
  • You will collaborate with the technical teams of our customers, who are globally recognized companies in the gaming, banking, and logistics industries, ranging from large multinationals to emerging start-ups
What we offer
  • Volunteer time off
  • Well-being
  • Time away
Employment Type: Full-time

Site Reliability Engineer

We are looking for a Site Reliability Engineer to support the stability, perform...
Location: United States, New York
Salary: Not provided
Company: Robert Half (https://www.roberthalf.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related discipline, or equivalent practical experience in infrastructure or operations
  • Working knowledge of Linux and/or Windows server administration fundamentals
  • Understanding of core networking principles such as TCP/IP, DNS, VLANs, routing, and firewall concepts
  • Experience with at least one scripting or automation language such as Python, Bash, or PowerShell
  • Familiarity with cloud infrastructure concepts in at least one major platform, such as Azure or AWS
  • Exposure to automation and configuration tools such as Terraform or Ansible
  • Strong analytical thinking, troubleshooting ability, and a willingness to learn in a fast-moving technical environment
  • Clear written and verbal communication skills with the ability to document operational procedures effectively
Job Responsibility
  • Oversee the health of production platforms through monitoring tools, assist with incident response, and help refine alerts, dashboards, and issue tracking processes
  • Support day-to-day operations for infrastructure spanning on-premises facilities and cloud environments, including servers, storage, network components, and middleware services
  • Contribute to the administration of multi-cloud resources across platforms such as Azure and Amazon EC2, with involvement in compute, networking, storage, and identity-related tasks
  • Build and enhance automation solutions using Infrastructure as Code practices to streamline repeatable work and improve platform consistency
  • Participate in DevSecOps and GitOps processes by assisting with CI/CD workflows, configuration management, and policy adherence
  • Help strengthen cloud security by identifying configuration gaps, assisting with remediation efforts, and supporting vulnerability reduction initiatives
  • Join the on-call rotation, respond to operational events, and contribute to post-incident reviews focused on continuous improvement
  • Create and maintain runbooks, technical procedures, and system documentation to improve operational readiness and knowledge sharing
  • Assist with containerized and orchestrated environments, including platforms that use Kubernetes, to support scalable application operations
What we offer
  • medical
  • vision
  • dental
  • life and disability insurance
  • company 401(k) plan
Employment Type: Full-time

Senior Site Reliability Engineer

Doctolib is looking for a Senior Site Reliability Engineer to keep Doctolib prod...
Location: France, Paris
Salary: Not provided
Company: Doctolib (doctolib.fr)
Expiration Date: Until further notice
Requirements
  • Have strong hands-on experience (6+ years) on a production platform, ideally at scale
  • Have proven experience with cloud platforms such as AWS, Azure or Google Cloud
  • Have proven experience with datastores such as PostgreSQL and/or Kafka and/or Couchbase
  • Have solid understanding of containerization and orchestration technologies (Docker and Kubernetes)
  • Have proficiency in at least one programming language (Ruby, Python, Go, Java, etc.) and understanding of infrastructure as code principles
  • Are fluent in English
Job Responsibility
  • Design, build and maintain core infrastructure databases that allow Doctolib to scale to support hundreds of thousands of concurrent users
  • Automate deployment, scaling, and maintenance of databases to enhance system reliability and operational efficiency
  • Implement and improve monitoring, alerting, and incident response processes to identify and address potential issues before they impact both practitioners and patients
  • Provide documentation and tooling to empower the feature teams in their use of their databases, while ensuring their reliability
  • Mitigate production database issues during working hours when the issue cannot be fixed by the responsible feature team
  • Research and evaluate new technologies, tools, and best practices to continuously improve the reliability and availability of our systems and processes
What we offer
  • Free Health Insurance for you
  • Up to 14 days of RTT
  • A flexible workplace policy offering both hybrid and office-based modes
  • Flexibility days allowing you to work in EU countries and the UK 10 days per year
  • Wellbeing program with free mental health and coaching through moka.care
  • Special support package for caregivers and workers with disabilities
  • Lunch voucher with Swile card
  • Work Council subsidy for sport club membership or creative activities
  • Bicycle subsidy
  • Public transportation reimbursement
Employment Type: Full-time

Site Reliability Engineer

Mercor is at the intersection of labor markets and AI research. We partner with ...
Location: United States, San Francisco
Salary: 130000.00 USD / Year
Company: Mercor (mercor.com)
Expiration Date: Until further notice
Requirements
  • Experience doing true SRE work (not just operations) across multiple roles or companies
  • Deep familiarity with SRE practices as popularized by Google (e.g., error budgets, reliability vs. risk trade-offs, large-scale distributed systems)
  • 5+ years of SRE experience
  • 15+ years of overall experience is ideal for this first SRE hire
  • Proven success operating systems at scale, with a strong understanding of the challenges of large, distributed production environments
  • Strong collaboration skills; able to work efficiently with cross-functional engineering teams
  • Ability to drive cultural change around reliability while remaining hands-on in building and fixing systems
  • Comfort working in high-intensity, high-availability environments where uptime and production quality are critical
Job Responsibility
  • Own reliability and production safety for core shared services and customer-facing systems
  • Partner directly with infrastructure leadership to define SRE priorities, reliability standards, and production safety roadmap
  • Repair and improve how our production systems are structured so they are stable, resource-efficient, isolated, and well-observed
  • Introduce and champion modern SRE practices (e.g., incident response, postmortems, SLIs/SLOs) across engineering teams
  • Collaborate with leverage engineering and applied AI teams to ensure sustainable growth
  • Represent SRE best practices internally and help teams onboard onto production in a way that is safe, scalable, and consistent with SRE principles
What we offer
  • Offers Equity
Employment Type: Full-time

Site Reliability Engineer

We are seeking a talented Site Reliability Engineer (SRE) to join our team and s...
Location: United States, Alpharetta
Salary: Not provided
Company: Robert Half (https://www.roberthalf.com)
Expiration Date: Until further notice
Requirements
  • 3+ years of experience in a Site Reliability Engineering, DevOps, or cloud infrastructure role
  • Strong experience with Azure cloud services and infrastructure
  • Hands‑on experience with Java, Terraform, and Terragrunt for infrastructure‑as‑code
  • Proficiency with Kubernetes and container orchestration (AKS experience preferred)
  • Experience building and maintaining CI/CD workflows using GitHub Actions/Workflows and ArgoCD
  • Solid understanding of observability tools such as Grafana (Prometheus, Loki, Tempo experience is beneficial)
  • Bachelor’s degree required
  • Master’s degree preferred
Job Responsibility
  • Design, implement, and manage cloud infrastructure on Azure using Terraform and Terragrunt
  • Maintain, optimize, and scale Kubernetes environments, particularly Azure Kubernetes Service (AKS)
  • Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD to support GitOps deployment practices
  • Improve system reliability by implementing monitoring, alerting, and observability solutions using Grafana
  • Automate operational tasks to reduce manual effort and enhance team efficiency
  • Participate in on‑call rotations, assist with incident response, and contribute to post‑incident reviews
  • Partner with development teams to optimize application performance, scalability, and resilience
  • Implement and promote core SRE principles including SLIs, SLOs, and error budgets
  • Continuously enhance system performance, security, and cost efficiency
What we offer
  • Benefits are available to contract/temporary professionals, including medical, vision, dental, and life and disability insurance
  • Hired contract/temporary professionals are also eligible to enroll in our company 401(k) plan

Site Reliability Engineer II - FedRAMP

Trimble is seeking a Site Reliability Engineer to join their world class and glo...
Location: India, Chennai
Salary: Not provided
Company: Trimble Inc. (trimble.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s Degree or equivalent in Computer Science, Engineering or related field or equivalent experience
  • Recent college graduate or one year of experience in IT operations, including knowledge of networking, computing and storage
  • Experience with AWS and/or Azure public cloud
  • Windows system administration familiarity and scripting skills, such as Python and PowerShell
  • Linux system administration familiarity and scripting skills, including Bash and Perl
  • Familiarity with application operations, including Incident Management, Change Management, and Capacity Management
  • Excellent written and verbal communication
  • Troubleshooting and problem-solving skills
  • Strong desire to learn new things
Job Responsibility
  • Responsible for configuration, optimization, documentation and support of the infrastructure components of software products which are hosted primarily in cloud services (AWS and Azure)
  • Perform day-to-day server application management, monitoring, incident response/resolution and working with the customer application development and technical support teams to establish effective application monitoring and to identify application changes to improve operations
  • Develop new and enhance current shared public cloud services with consideration for Availability, Operations, Performance, Capacity, Security, and User Experience
  • Responsible for management of security posture and adherence to corporate security best practices
  • Develop and maintain documentation including but not limited to architecture diagrams, service descriptions, build and deploy documentation and operations run book documentation
  • Provide design and deployment assistance for divisions needing help on a project basis
  • Manage AWS & FedRAMP best practice expectations (incorporating Trimble Cloud Core Platform standards)
  • Work with a global team and be able to occasionally meet or perform tasks off-hours

Site Reliability Engineer III

We're looking for a senior Site Reliability Engineer to join our small, high-own...
Location: United States
Salary: 148320.00 - 185400.00 USD / Year
Company: AbsenceSoft (absencesoft.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in SRE, DevOps, or a related engineering role
  • Advanced hands-on expertise in AWS production environments and core services including Lambda, ECS, S3, ALB, and GuardDuty
  • Strong proficiency in infrastructure-as-code tooling such as Terraform, CloudFormation, or CDK
  • Experience building and operating CI/CD pipelines using Jenkins and GitHub
  • Proficiency in Python, Go, or Bash for automation
  • Hands-on experience with Datadog or a comparable observability platform for monitoring, alerting, and log management
  • Demonstrated experience leading incident response in complex, distributed systems
  • Working knowledge of SLO/SLI frameworks, error budgets, and disaster recovery planning against defined RTO/RPO objectives
  • Familiarity with SOC 2 compliance frameworks and experience contributing to audit readiness, access controls, and security control evidence collection
  • A collaborative, ownership-driven mindset with strong communication skills
Job Responsibility
  • Architect, implement, and operate scalable, resilient, and secure AWS infrastructure
  • Lead infrastructure-as-code initiatives to ensure all environments are reproducible, auditable, and consistently configured
  • Design, maintain, and improve CI/CD pipelines using Jenkins and GitHub
  • Own the Datadog observability platform, including dashboards, monitors, alerting thresholds, and log management
  • Define and maintain SLOs, SLIs, and error budgets
  • Serve as a senior technical responder across the full incident lifecycle within a shared on-call rotation
  • Lead blameless postmortems
  • Refine, implement, and test disaster recovery plans to meet RTO/RPO objectives
  • Contribute to SOC 2 audit readiness with a focus on access controls, incident response, and risk mitigation
  • Mentor junior SREs through code reviews, incident pairing, and documentation
What we offer
  • Impact that matters
  • Flexibility and trust
  • Remote-first and results driven
  • Growth and development
  • Access to learning resources, leadership programs, and real opportunities to take on new challenges
  • Competitive rewards
  • Comprehensive benefits
  • Performance-based bonus program
  • Equity opportunities
  • Time for life
Employment Type: Full-time