
Junior Site Reliability Engineer

United Kingdom · Job Posted December 05, 2025

Job Description

As a Jr. Site Reliability Engineer, you will 'make things scale', which includes supporting the delivery and operation of the managed accesso Horizon product in customers’ cloud environments (AWS/Azure/GCP). You will work under mentor guidance to deploy, operate and support customer environments, automate tasks, and learn site reliability and cloud best practices.

Job Responsibility

  • Assist with provisioning and deploying accesso Horizon components to customer cloud accounts using Infrastructure as Code (Terraform)
  • Help maintain CI/CD pipelines (GitHub Actions) for application and infrastructure deployments
  • Support monitoring, logging and alerting (Prometheus, Grafana & Coralogix) and respond to basic alerts with supervision
  • Implement and improve basic automation and scripting
  • Participate in incident triage, root cause investigation and follow-up tasks
  • Follow security and compliance requirements for customer cloud environments (identity, secrets, network controls)
  • Produce and maintain operational runbooks, deployment guides and change notes
  • Participate in the on-call rotation as an L1 responder
  • Occasional work outside normal working hours may be required
  • Learn and apply accesso Horizon product architecture and configuration

Requirements

  • Some practical exposure to cloud platforms (AWS/Azure/GCP) through coursework, internships, or self-led projects
  • Ability to self-learn with assistance from Senior Engineers
  • Basic scripting ability using Python or Bash
  • Familiarity with basic Linux systems and the command line
  • Understanding of Git and basic CI/CD concepts
  • Good written and verbal communication
  • Customer-focused approach
  • Ability to work with minimal direction
  • Willingness to learn, take direction and work within a team

Nice to have

  • Experience with Terraform, Docker, Kubernetes (EKS/AKS/GKE) or monitoring tools
  • Familiarity with security fundamentals (IAM, network ACLs, secrets management)
  • Experience supporting a SaaS or managed service

What we offer

  • Competitive compensation package including an annual bonus opportunity
  • 8 days of paid bank holiday leave and 26 days of paid annual leave (paid leave increases with tenure)
  • 8 hours of paid Volunteer Time Off
  • Inclusive Family Benefits, including a $7,500 benefit for surrogacy, adoption, and fertility
  • Robust health insurance scheme with the opportunity to participate in a private medical scheme after satisfactory performance
  • Matching pension scheme (up to 8%)
  • Unlimited access to Udemy for Business
  • Flexible work schedule

Similar Jobs for Junior Site Reliability Engineer

8 matching positions

Junior Site Reliability Engineer

We are looking for an early-career Site Reliability Engineer to join our global ...
Location: Finland, Helsinki
Salary: Not provided
Company: Aiven Deutschland GmbH
Expiration Date: Until further notice
Requirements
  • Ability to Code: basic programming skills, with a preference for Python
  • Linux Fundamentals: comfortable working in a terminal, with a grasp of Linux systems administration and networking
  • Analytical Problem Solving: enjoy the detective work of debugging
  • AI Curiosity: interested in how AI is changing the infrastructure landscape
  • Operational Mindset: ready to contribute to a rotation
Job Responsibility
  • Handle essential operational duties, including stakeholder-driven tasks like managing account lifecycles and service adjustments
  • Improve our observability framework and automate manual toil to create a self-healing, highly visible production environment
  • Participate in our on-call rotation to maintain platform health
What we offer
  • Participate in Aiven’s equity plan
  • Balance work and life with our hybrid work policy
  • Choose the equipment you need to set yourself up for success
  • Use your Professional Development Plan budget for learning opportunities
  • Receive holistic wellbeing support through our global Employee Assistance Program
  • Inquire about our Global Time Off Commitment (Parental and Sick Leave, as well as Personal Time)
  • Enjoy country-specific benefits for our global cast
  • Full-time

Site Reliability Engineer

We are looking for a Lead Site Reliability Engineer (SRE) with strong experience...
Location: India, Bangalore
Salary: Not provided
Company: Karix
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in SRE / DevOps / Production Engineering roles
  • Strong expertise in troubleshooting distributed systems and microservices architecture
  • Hands-on experience with Kafka, RabbitMQ, and Redis
  • Strong knowledge of Kubernetes and container orchestration
  • Experience with CI/CD pipelines and deployment automation
  • Solid understanding of Linux, networking, and cloud platforms (AWS / Azure / GCP)
  • Experience with Infrastructure as Code (Terraform, Ansible)
  • Strong scripting skills (Python, Bash, or similar)
  • Database experience: MySQL / Oracle / MongoDB
  • Strong problem-solving, ownership mindset, and ability to lead initiatives
Job Responsibility
  • Lead troubleshooting and resolution of complex production issues in distributed systems
  • Drive reliability engineering practices, ensuring high availability and performance of systems
  • Manage and optimize messaging systems like Apache Kafka, RabbitMQ, and Redis
  • Architect, manage, and optimize Kubernetes clusters for scalability and resilience
  • Manage CI/CD pipelines and drive deployment automation
  • Implement and maintain monitoring, alerting, and observability using Prometheus, Grafana, and ELK stack
  • Lead incident management, root cause analysis (RCA), and post-mortem reviews
  • Mentor junior engineers and collaborate with cross-functional teams to improve system design and reliability
What we offer
  • Impactful Work: Play a key role in ensuring reliability and scalability of platforms that handle large-scale, real-time communication systems
  • Tremendous Growth Opportunities: Accelerate your career by leading critical reliability initiatives and working on high-scale distributed systems
  • Innovative Environment: Work in a fast-paced ecosystem that embraces automation, cloud-native technologies, and continuous improvement
  • Full-time

Site Reliability Engineer

Engineering to make a system more resilient and efficient frees up time and mone...
Location: United States, Annapolis Junction
Salary: 86900.00 - 198000.00 USD / Year
Company: Booz Allen Hamilton
Expiration Date: Until further notice
Requirements
  • 5+ years of experience creating and maintaining highly reliable and scalable systems to reduce issues and downtime, including design and implementation of physical servers, storage systems, and network infrastructures
  • 5+ years of experience providing technical support for system upgrades, rollouts, and enhancements
  • 3+ years of experience developing and deploying infrastructure solutions
  • 3+ years of experience employing and sustaining VMware for v6.x and later, including the design and implementation of virtual data centers
  • 3+ years of experience designing and deploying highly available storage solutions for technologies, including SAN storage and high-capacity storage solutions
  • Experience with data center design and buildout
  • Experience transforming large-scale software, data center, or on-premises infrastructure programs to a virtualized architecture
  • Ability to interact with clients and lead, train, and mentor junior system administrators
  • Top Secret clearance
  • Bachelor's degree
Job Responsibility
  • Lead the development of more robust systems for Booz Allen by building a resilient infrastructure
  • Build in redundancy, implement monitoring tools, and automate wherever possible
  • Reduce toil by scripting routine tasks and automating self-repair
  • Support your team of engineers and act as a subject matter expert for our clients
What we offer
  • Health benefits
  • Life benefits
  • Disability benefits
  • Financial benefits
  • Retirement benefits
  • Paid leave
  • Professional development
  • Tuition assistance
  • Work-life programs
  • Dependent care
  • Full-time

Site Reliability Engineer III

We're looking for a senior Site Reliability Engineer to join our small, high-own...
Location: United States
Salary: 148320.00 - 185400.00 USD / Year
Company: AbsenceSoft
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in SRE, DevOps, or a related engineering role
  • Advanced hands-on expertise in AWS production environments and core services including Lambda, ECS, S3, ALB, and GuardDuty
  • Strong proficiency in infrastructure-as-code tooling such as Terraform, CloudFormation, or CDK
  • Experience building and operating CI/CD pipelines using Jenkins and GitHub
  • Proficiency in Python, Go, or Bash for automation
  • Hands-on experience with Datadog or a comparable observability platform for monitoring, alerting, and log management
  • Demonstrated experience leading incident response in complex, distributed systems
  • Working knowledge of SLO/SLI frameworks, error budgets, and disaster recovery planning against defined RTO/RPO objectives
  • Familiarity with SOC 2 compliance frameworks and experience contributing to audit readiness, access controls, and security control evidence collection
  • A collaborative, ownership-driven mindset with strong communication skills
Job Responsibility
  • Architect, implement, and operate scalable, resilient, and secure AWS infrastructure
  • Lead infrastructure-as-code initiatives to ensure all environments are reproducible, auditable, and consistently configured
  • Design, maintain, and improve CI/CD pipelines using Jenkins and GitHub
  • Own the Datadog observability platform, including dashboards, monitors, alerting thresholds, and log management
  • Define and maintain SLOs, SLIs, and error budgets
  • Serve as a senior technical responder across the full incident lifecycle within a shared on-call rotation
  • Lead blameless postmortems
  • Refine, implement, and test disaster recovery plans to meet RTO/RPO objectives
  • Contribute to SOC 2 audit readiness with a focus on access controls, incident response, and risk mitigation
  • Mentor junior SREs through code reviews, incident pairing, and documentation
What we offer
  • Impact that matters
  • Flexibility and trust
  • Remote-first and results driven
  • Growth and development
  • Access to learning resources, leadership programs, and real opportunities to take on new challenges
  • Competitive rewards
  • Comprehensive benefits
  • Performance-based bonus program
  • Equity opportunities
  • Time for life
  • Full-time

Senior Site Reliability Engineer

We are seeking a highly skilled and passionate Senior Site Reliability Engineer ...
Location: Spain; Portugal; United Kingdom
Salary: Not provided
Company: Parser Limited
Expiration Date: Until further notice
Requirements
  • Deep SRE Expertise: Proven experience as a Senior Site Reliability Engineer or a similar role, with a strong understanding of SRE principles (error budgets, SLOs/SLIs, toil reduction)
  • Azure Cloud Proficiency: Extensive hands-on experience designing, deploying, and operating highly available and scalable applications on Microsoft Azure
  • Azure Kubernetes Service (AKS) Expertise: Mandatory extensive hands-on experience with AKS for container orchestration, including deployment, scaling, monitoring, and troubleshooting
  • Java Ecosystem Mastery: Expert-level proficiency with Java, including experience with modern frameworks (ideally Micronaut, Spring Boot, or similar) and JVM performance tuning
  • Distributed Systems Knowledge: Solid understanding and practical experience with distributed systems, microservices architecture, and associated challenges (e.g., consistency, fault tolerance)
  • Messaging & Database Expertise: Hands-on experience with an event streaming platform (ideally Kafka) and NoSQL data storage (ideally Couchbase), including operational best practices
  • Automation First Mindset: Strong scripting skills (e.g., Python, Bash) and experience with Infrastructure as Code tools (e.g., Terraform, ARM templates) and CI/CD pipelines (e.g., Azure DevOps, Jenkins)
  • Observability Tools: Experience with monitoring, logging, and alerting tools (e.g., Azure Monitor, Prometheus, Grafana, ELK Stack, Splunk)
  • Problem-Solving Acumen: Exceptional analytical and troubleshooting skills, with a methodical approach to diagnosing and resolving complex production issues
  • Communication & Collaboration: Excellent communication skills, with the ability to articulate complex technical concepts to diverse audiences and collaborate effectively with cross-functional teams
Job Responsibility
  • Architect and Implement Reliability: Design, build, and maintain highly scalable, resilient, and performant systems on Azure, focusing on our Java, Kafka, and Couchbase stack
  • Drive Modernisation: Work hands-on as part of the team spearheading the adoption of Micronaut, standardising application templates, and transitioning to managed cloud services
  • Enhance Operational Excellence: Develop and implement strategies for improving system observability (standardised logging, metrics, tracing), alerting, and on-call practices
  • Automate Everything: Champion automation across the software development lifecycle (SDLC), from CI/CD pipelines to infrastructure provisioning, focusing on accelerating delivery and de-risking deployments
  • Incident Management & Learning: Contribute to our mature, blameless post-incident review process, identifying root causes and implementing preventative measures to reduce incident hours
  • Tooling & Standards: Develop, maintain, and drive the adoption of shared, standardised SRE tooling and best practices across engineering teams, including containerisation (e.g., Docker, Kubernetes on Azure), infrastructure as code (e.g., Terraform), and configuration management
  • Mentorship & Collaboration: Provide technical leadership and mentorship to junior engineers, fostering a culture of SRE principles and operational excellence across the wider engineering organisation
  • Strategic Input: Contribute to the overall technical strategy and roadmap for our SRE and platform initiatives, ensuring alignment with business objectives
What we offer
  • The chance to join an organization with triple-digit growth that is changing the paradigm on how software products are built
  • The opportunity to form part of an amazing, multicultural community of tech experts
  • A highly competitive compensation package
  • Medical insurance
  • English lessons
  • Full-time

Senior Site Reliability Engineer

We are looking for a Senior Site Reliability Engineer (SRE) with deep experience...
Location: United States, Chicago
Salary: 131000.00 USD / Year
Company: Realign
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in SRE, DevOps, or Cloud Engineering
  • Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
  • Hands-on experience with Terraform, Ansible, or other IaC tools
  • Strong scripting/coding skills (Python, Go, Shell, etc.)
  • Experience with Kubernetes, containerization, and orchestration
  • Deep knowledge of Linux systems and networking
  • Experience with Service Meshes (e.g., Istio, App Mesh)
  • Familiarity with AWS Well-Architected Framework
  • Experience building self-healing systems and automated remediation
  • Background in security, compliance, or multi-account/multi-region AWS architectures
Job Responsibility
  • Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
  • Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
  • Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace/Datadog
  • Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
  • Optimize systems for cost, performance, and reliability
  • Drive chaos engineering and resilience testing
  • Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
  • Mentor junior SREs and promote DevOps/SRE culture across the org
  • Full-time

Site Reliability Engineer

As a member of Kalshi’s engineering team, you’ll help build the next-generation ...
Location: United States, New York
Salary: 100000.00 - 250000.00 USD / Year
Company: Kalshi
Expiration Date: Until further notice
Requirements
  • 4+ years of software engineering experience
  • Experience designing, building, scaling, and maintaining production services and service-oriented architectures
  • Strong system design, coding, debugging, performance-tuning, and observability skills
  • High-quality coding practices with strong testing discipline
  • Excellent written and verbal communication
  • Comfort working transparently across teams
  • Strong interpersonal skills across junior-to-principal engineering levels
  • Ability to think clearly under pressure and dive into any layer of the stack
  • Passion for building an open financial system that connects the world
  • Willingness to participate in on-call rotations and swiftly resolve issues
Job Responsibility
  • Improve observability, reliability, and service availability by defining and measuring key metrics
  • Build automation and systems that eliminate toil and reduce operational burden
  • Collaborate with core infrastructure engineers to performance-tune and optimize cloud deployments (Docker, Terraform, Kubernetes, EC2, etc.)
  • Partner with product teams to minimize service disruptions and automate incident response
  • Identify and analyze reliability problems across the stack, designing and implementing software for significant, long-term improvements
  • Mentor engineers and drive a culture where reliability is a core engineering value
  • Write high-quality, well-tested code that supports internal and external customer needs
  • Debug complex technical issues and improve system usability, operability, and diagnosability
  • Review feature designs across the company and ensure security, safety, scalability, and architectural clarity
  • Build and maintain integrations with third-party vendors
What we offer
  • Equity and benefits
  • Full-time

Principal Site Reliability Engineer

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of ...
Location: Portugal
Salary: Not provided
Company: OutSystems
Expiration Date: Until further notice
Requirements
  • STEM degree (BSc, MSc, in Software Engineering/Computer Science or related fields)
  • 8+ years of experience in Software Engineering or SRE, ideally within high-growth, cloud-native environments
  • Expertise in Observability: Proven ability to implement SLIs/SLOs and telemetry systems that provide actionable insights into complex distributed systems
  • Cloud Mastery: Deep architectural knowledge of AWS/GCP/Azure, specifically regarding networking, security, and cost-optimization
  • Strategic Impact: Demonstrated success in leading cross-functional initiatives that improved system reliability or developer velocity at an organizational scale
  • System Design & Architecture: Expertise in designing highly available, fault-tolerant distributed systems (Microservices, Event-driven architecture)
  • Development: Professional-level proficiency in Go, Python, or Rust, with the ability to contribute to core product codebases and build custom internal tooling
  • Cloud Ecosystems: Deep-tier mastery of AWS, GCP, or Azure (specifically IAM, VPC networking, Transit Gateways, and Cross-region redundancy)
  • Orchestration at Scale: Extensive experience managing Kubernetes (K8s) in production, including Custom Resource Definitions (CRDs), Service Mesh (Istio/Linkerd), and Admission Controllers
  • Infrastructure as Code (IaC): Advanced usage of Terraform, CloudFormation, or Spacelift, focusing on modularity, state management, and CI/CD integration for infrastructure.
Job Responsibility
  • Help define and execute the strategic vision and roadmap for the Site Reliability Engineering function
  • Provide leadership and mentorship to more junior SREs, fostering a culture of innovation, collaboration, and operational excellence
  • Collaborate with leadership and other stakeholders to ensure cross-functional alignment
  • Collaborate actively with development teams and influence the design of highly reliable and scalable infrastructure, leveraging cloud technologies and industry best practices
  • Collaborate with development teams at all stages of the product development lifecycle to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable) and performant
  • Drive the adoption, definition, and improvement of Service Level Objectives (SLOs)
  • Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents
  • Oversee incident response efforts, ensuring quick resolution, minimal downtime, and effective RCA/post-mortems
  • Automate every operational task, with a special focus on fast incident detection & recovery
  • Foster a culture of continuous improvement and knowledge sharing
What we offer
  • A company that is always growing, changing, and innovating
  • Real career opportunities
  • Colleagues who are as smart, hard-working, and driven as you
  • Disrupting the status quo is in our DNA
  • We ask “why” a lot
  • OutSystems nurtures an inclusive culture of diversity, where everyone feels empowered to be their authentic self and perform at their best.
  • Full-time