
Junior Site Reliability Engineer

Finland, Helsinki · Job Posted April 24, 2026

Job Description

We are looking for an early-career Site Reliability Engineer to join our global team. In this role, you will be the engine that keeps our cloud operations platform running smoothly, turning complex open-source technologies into reliable services for our customers. You’ll be part of a team that champions platform reliability. This is a hands-on operational role where you’ll dive into the day-to-day mechanics of a massive cloud infrastructure, from handling stakeholder requests to building the tools that monitor our systems. We value automation over manual repetition, and we’ll give you the space to grow your skills in both software development and systems administration.

Job Responsibility

  • Handle essential operational duties, including stakeholder-driven tasks like managing account lifecycles and service adjustments
  • Improve our observability framework and automate manual toil to create a self-healing, highly visible production environment
  • Participate in our on-call rotation to maintain platform health

Requirements

  • Ability to Code: basic programming skills, with a preference for Python
  • Linux Fundamentals: comfortable working in a terminal, with a grasp of Linux systems administration and networking
  • Analytical Problem Solving: enjoy the detective work of debugging
  • AI Curiosity: interested in how AI is changing the infrastructure landscape
  • Operational Mindset: ready to contribute to a rotation

Nice to have

  • Hands-on Database/Streaming Experience: worked with open-source tools like PostgreSQL, Kafka, ClickHouse, or OpenSearch

What we offer

  • Participate in Aiven’s equity plan
  • Balance work and life with our hybrid work policy
  • Choose the equipment you need to set yourself up for success
  • Use your Professional Development Plan budget for learning opportunities
  • Receive holistic wellbeing support through our global Employee Assistance Program
  • Inquire about our Global Time Off Commitment (Parental and Sick Leave, as well as Personal Time)
  • Enjoy country-specific benefits for our global cast


Similar Jobs for Junior Site Reliability Engineer

8 matching positions

Junior Site Reliability Engineer

As a Jr. Site Reliability Engineer, you will 'make things scale' which includes ...
Location: United Kingdom
Salary: Not provided
Company: accesso
Expiration Date: Until further notice
Requirements
  • Some practical exposure to cloud platforms (AWS/Azure/GCP)—coursework, internships, or self-led projects
  • Ability to self-learn with assistance from Senior Engineers
  • Basic scripting ability using Python or Bash
  • Familiarity with basic Linux systems and the command line
  • Understanding of Git and basic CI/CD concepts
  • Good written and verbal communication
  • Customer-focused approach
  • Ability to work with minimal direction
  • Willingness to learn, take direction and work within a team
Job Responsibility
  • Assisting with provisioning and deploying accesso Horizon components to customer cloud accounts using Infrastructure as Code (Terraform)
  • Help maintain CI/CD pipelines (GitHub Actions) for application and infrastructure deployments
  • Support monitoring, logging and alerting (Prometheus, Grafana & Coralogix) and respond to basic alerts with supervision
  • Implement and improve basic automation and scripting
  • Participate in incident triage, root cause investigation and follow-up tasks
  • Follow security and compliance requirements for customer cloud environments (identity, secrets, network controls)
  • Produce and maintain operational runbooks, deployment guides and change notes
  • Participate in on-call rotation as an L1 responder
  • The role may require work outside normal working hours
  • Learn and apply accesso Horizon product architecture and configuration
What we offer
  • Competitive compensation package including an annual bonus opportunity
  • 8 days of paid bank holiday leave and 26 days of paid annual leave (paid leave increases with tenure)
  • 8 hours of paid Volunteer Time Off
  • Inclusive Family Benefits, including a $7,500 benefit for surrogacy, adoption, and fertility
  • Robust health insurance scheme with the opportunity to participate in a private medical scheme after satisfactory performance
  • Matching pension scheme (up to 8%)
  • Unlimited access to Udemy for Business
  • Flexible work schedule
  • Full-time

Site Reliability Engineer

Engineering to make a system more resilient and efficient frees up time and mone...
Location: United States, Annapolis Junction
Salary: 86900.00 - 198000.00 USD / Year
Company: Booz Allen Hamilton
Expiration Date: Until further notice
Requirements
  • 5+ years of experience creating and maintaining highly reliable and scalable systems to reduce issues and downtime, including design and implementation of physical servers, storage systems, and network infrastructures
  • 5+ years of experience providing technical support for system upgrades, rollouts, and enhancements
  • 3+ years of experience developing and deploying infrastructure solutions
  • 3+ years of experience employing and sustaining VMware for v6.x and later, including the design and implementation of virtual data centers
  • 3+ years of experience designing and deploying highly available storage solutions for technologies, including SAN storage and high-capacity storage solutions
  • Experience with data center design and buildout
  • Experience transforming large-scale software, data center, or on-premises infrastructure programs to a virtualized architecture
  • Ability to interact with clients and lead, train, and mentor junior system administrators
  • Top Secret clearance
  • Bachelor's degree
Job Responsibility
  • Lead the development of more robust systems for Booz Allen by building a resilient infrastructure
  • Build in redundancy, implement monitoring tools, and automate wherever possible
  • Reduce toil by scripting routine tasks and automating self-repair
  • Support your team of engineers and act as a subject matter expert for our clients
What we offer
  • Health benefits
  • Life benefits
  • Disability benefits
  • Financial benefits
  • Retirement benefits
  • Paid leave
  • Professional development
  • Tuition assistance
  • Work-life programs
  • Dependent care
  • Full-time

Site Reliability Engineer III

We're looking for a senior Site Reliability Engineer to join our small, high-own...
Location: United States
Salary: 148320.00 - 185400.00 USD / Year
Company: AbsenceSoft
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in SRE, DevOps, or a related engineering role
  • Advanced hands-on expertise in AWS production environments and core services including Lambda, ECS, S3, ALB, and GuardDuty
  • Strong proficiency in infrastructure-as-code tooling such as Terraform, CloudFormation, or CDK
  • Experience building and operating CI/CD pipelines using Jenkins and GitHub
  • Proficiency in Python, Go, or Bash for automation
  • Hands-on experience with Datadog or a comparable observability platform for monitoring, alerting, and log management
  • Demonstrated experience leading incident response in complex, distributed systems
  • Working knowledge of SLO/SLI frameworks, error budgets, and disaster recovery planning against defined RTO/RPO objectives
  • Familiarity with SOC 2 compliance frameworks and experience contributing to audit readiness, access controls, and security control evidence collection
  • A collaborative, ownership-driven mindset with strong communication skills
Job Responsibility
  • Architect, implement, and operate scalable, resilient, and secure AWS infrastructure
  • Lead infrastructure-as-code initiatives to ensure all environments are reproducible, auditable, and consistently configured
  • Design, maintain, and improve CI/CD pipelines using Jenkins and GitHub
  • Own the Datadog observability platform, including dashboards, monitors, alerting thresholds, and log management
  • Define and maintain SLOs, SLIs, and error budgets
  • Serve as a senior technical responder across the full incident lifecycle within a shared on-call rotation
  • Lead blameless postmortems
  • Refine, implement, and test disaster recovery plans to meet RTO/RPO objectives
  • Contribute to SOC 2 audit readiness with a focus on access controls, incident response, and risk mitigation
  • Mentor junior SREs through code reviews, incident pairing, and documentation
What we offer
  • Impact that matters
  • Flexibility and trust
  • Remote-first and results driven
  • Growth and development
  • Access to learning resources, leadership programs, and real opportunities to take on new challenges
  • Competitive rewards
  • Comprehensive benefits
  • Performance-based bonus program
  • Equity opportunities
  • Time for life
  • Full-time

Senior Site Reliability Engineer

We are seeking a highly skilled and passionate Senior Site Reliability Engineer ...
Location: Spain; Portugal; United Kingdom
Salary: Not provided
Company: Parser Limited
Expiration Date: Until further notice
Requirements
  • Deep SRE Expertise: Proven experience as a Senior Site Reliability Engineer or a similar role, with a strong understanding of SRE principles (error budgets, SLOs/SLIs, toil reduction)
  • Azure Cloud Proficiency: Extensive hands-on experience designing, deploying, and operating highly available and scalable applications on Microsoft Azure
  • Azure Kubernetes Service (AKS) Expertise: Mandatory extensive hands-on experience with AKS for container orchestration, including deployment, scaling, monitoring, and troubleshooting
  • Java Ecosystem Mastery: Expert-level proficiency with Java, including experience with modern frameworks (ideally Micronaut, Spring Boot, or similar) and JVM performance tuning
  • Distributed Systems Knowledge: Solid understanding and practical experience with distributed systems, microservices architecture, and associated challenges (e.g., consistency, fault tolerance)
  • Messaging & Database Expertise: Hands-on experience with an event streaming platform (ideally Kafka) and NoSQL data storage (ideally Couchbase), including operational best practices
  • Automation First Mindset: Strong scripting skills (e.g., Python, Bash) and experience with Infrastructure as Code tools (e.g., Terraform, ARM templates) and CI/CD pipelines (e.g., Azure DevOps, Jenkins)
  • Observability Tools: Experience with monitoring, logging, and alerting tools (e.g., Azure Monitor, Prometheus, Grafana, ELK Stack, Splunk)
  • Problem-Solving Acumen: Exceptional analytical and troubleshooting skills, with a methodical approach to diagnosing and resolving complex production issues
  • Communication & Collaboration: Excellent communication skills, with the ability to articulate complex technical concepts to diverse audiences and collaborate effectively with cross-functional teams
Job Responsibility
  • Architect and Implement Reliability: Design, build, and maintain highly scalable, resilient, and performant systems on Azure, focusing on our Java, Kafka, and Couchbase stack
  • Drive Modernisation: Work hands-on as part of the team spearheading the adoption of Micronaut, standardising application templates, and transitioning to managed cloud services
  • Enhance Operational Excellence: Develop and implement strategies for improving system observability (standardised logging, metrics, tracing), alerting, and on-call practices
  • Automate Everything: Champion automation across the software development lifecycle (SDLC), from CI/CD pipelines to infrastructure provisioning, focusing on accelerating delivery and de-risking deployments
  • Incident Management & Learning: Contribute to our mature, blameless post-incident review process, identifying root causes and implementing preventative measures to reduce incident hours
  • Tooling & Standards: Develop, maintain, and drive the adoption of shared, standardised SRE tooling and best practices across engineering teams, including containerisation (e.g., Docker, Kubernetes on Azure), infrastructure as code (e.g., Terraform), and configuration management
  • Mentorship & Collaboration: Provide technical leadership and mentorship to junior engineers, fostering a culture of SRE principles and operational excellence across the wider engineering organisation
  • Strategic Input: Contribute to the overall technical strategy and roadmap for our SRE and platform initiatives, ensuring alignment with business objectives
What we offer
  • The chance to join an organization with triple-digit growth that is changing the paradigm on how software products are built
  • The opportunity to form part of an amazing, multicultural community of tech experts
  • A highly competitive compensation package
  • Medical insurance
  • English lessons
  • Full-time

Senior Site Reliability Engineer

We are looking for a Senior Site Reliability Engineer (SRE) with deep experience...
Location: United States, Chicago
Salary: 131000.00 USD / Year
Company: Realign
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in SRE, DevOps, or Cloud Engineering
  • Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
  • Hands-on experience with Terraform, Ansible, or other IaC tools
  • Strong scripting/coding skills (Python, Go, Shell, etc.)
  • Experience with Kubernetes, containerization, and orchestration
  • Deep knowledge of Linux systems and networking
  • Experience with Service Meshes (e.g., Istio, App Mesh)
  • Familiarity with AWS Well-Architected Framework
  • Experience building self-healing systems and automated remediation
  • Background in security, compliance, or multi-account/multi-region AWS architectures
Job Responsibility
  • Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
  • Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
  • Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace/Datadog
  • Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
  • Optimize systems for cost, performance, and reliability
  • Drive chaos engineering and resilience testing
  • Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
  • Mentor junior SREs and promote DevOps/SRE culture across the org
  • Full-time

Site Reliability Engineer

As a member of Kalshi’s engineering team, you’ll help build the next-generation ...
Location: United States, New York
Salary: 100000.00 - 250000.00 USD / Year
Company: Kalshi
Expiration Date: Until further notice
Requirements
  • 4+ years of software engineering experience
  • Experience designing, building, scaling, and maintaining production services and service-oriented architectures
  • Strong system design, coding, debugging, performance-tuning, and observability skills
  • High-quality coding practices with strong testing discipline
  • Excellent written and verbal communication
  • Comfort working transparently across teams
  • Strong interpersonal skills across junior-to-principal engineering levels
  • Ability to think clearly under pressure and dive into any layer of the stack
  • Passion for building an open financial system that connects the world
  • Willingness to participate in on-call rotations and swiftly resolve issues
Job Responsibility
  • Improve observability, reliability, and service availability by defining and measuring key metrics
  • Build automation and systems that eliminate toil and reduce operational burden
  • Collaborate with core infrastructure engineers to performance-tune and optimize cloud deployments (Docker, Terraform, Kubernetes, EC2, etc.)
  • Partner with product teams to minimize service disruptions and automate incident response
  • Identify and analyze reliability problems across the stack, designing and implementing software for significant, long-term improvements
  • Mentor engineers and drive a culture where reliability is a core engineering value
  • Write high-quality, well-tested code that supports internal and external customer needs
  • Debug complex technical issues and improve system usability, operability, and diagnosability
  • Review feature designs across the company and ensure security, safety, scalability, and architectural clarity
  • Build and maintain integrations with third-party vendors
What we offer
  • Equity and benefits
  • Full-time

Principal Site Reliability Engineer

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of ...
Location: Portugal
Salary: Not provided
Company: OutSystems
Expiration Date: Until further notice
Requirements
  • STEM degree (BSc, MSc, in Software Engineering/Computer Science or related fields)
  • 8+ years of experience in Software Engineering or SRE, ideally within high-growth, cloud-native environments
  • Expertise in Observability: Proven ability to implement SLIs/SLOs and telemetry systems that provide actionable insights into complex distributed systems
  • Cloud Mastery: Deep architectural knowledge of AWS/GCP/Azure, specifically regarding networking, security, and cost-optimization
  • Strategic Impact: Demonstrated success in leading cross-functional initiatives that improved system reliability or developer velocity at an organizational scale
  • System Design & Architecture: Expertise in designing highly available, fault-tolerant distributed systems (Microservices, Event-driven architecture)
  • Development: Professional-level proficiency in Go, Python, or Rust, with the ability to contribute to core product codebases and build custom internal tooling
  • Cloud Ecosystems: Deep-tier mastery of AWS, GCP, or Azure (specifically IAM, VPC networking, Transit Gateways, and Cross-region redundancy)
  • Orchestration at Scale: Extensive experience managing Kubernetes (K8s) in production, including Custom Resource Definitions (CRDs), Service Mesh (Istio/Linkerd), and Admission Controllers
  • Infrastructure as Code (IaC): Advanced usage of Terraform, CloudFormation, or Spacelift, focusing on modularity, state management, and CI/CD integration for infrastructure.
Job Responsibility
  • Help define and execute the strategic vision and roadmap for the Site Reliability Engineering function
  • Provide leadership and mentorship to more junior SREs, fostering a culture of innovation, collaboration, and operational excellence
  • Collaborate with leadership and other stakeholders to ensure cross-functional alignment
  • Actively participate in and influence the design of a highly reliable and scalable infrastructure, collaborating effectively with development teams and leveraging cloud technologies and industry best practices
  • Collaborate with development teams at all stages of the product development lifecycle to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable) and performant
  • Drive the adoption, definition, and improvement of Service Level Objectives (SLOs)
  • Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents
  • Oversee incident response efforts, ensuring quick resolution and minimal downtime, and effective RCA/post-mortems
  • Automate every operational task, with a special focus on fast incident detection & recovery
  • Foster a culture of continuous improvement and knowledge sharing
What we offer
  • A company that is always growing, changing, and innovating
  • Real career opportunities
  • Work colleagues that are as smart, hard-working, and driven as you
  • Disrupting the status quo is in our DNA
  • We ask “why” a lot
  • OutSystems nurtures an inclusive culture of diversity, where everyone feels empowered to be their authentic self and perform at their best.
  • Full-time

Site Reliability Engineer II

As an SRE Engineer II, you will be responsible for managing our multi-cloud infr...
Location: United States, Sunnyvale
Salary: 138000.00 - 159000.00 USD / Year
Company: Illumio
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience
  • 2+ years of experience working as an SRE, DevOps Engineer, or similar role, with hands-on experience with the Azure cloud platform in a production environment
  • Proficiency in scripting and programming languages such as PowerShell, Python, or Go for automation and infrastructure management tasks
  • Experience with CI/CD tools and methodologies, containerization technologies, and microservices architecture in cloud environments
  • Strong analytical, problem-solving, and communication skills, with the ability to collaborate effectively with cross-functional teams
Job Responsibility
  • Design, deploy, and maintain cloud infrastructure solutions on Azure, AWS, and/or GCP to support our applications and services
  • Implement infrastructure as code (IaC) principles using tools such as Terraform, ARM templates, or CloudFormation to automate provisioning and configuration management
  • Develop and maintain CI/CD pipelines for automated software delivery and deployment, leveraging tools such as Azure DevOps, AWS CodePipeline, or Jenkins
  • Monitor system performance, application health, and infrastructure metrics using cloud monitoring and logging services, and implement proactive measures to optimize performance and availability
  • Support incident response and resolution efforts, conduct root cause analysis, implement corrective actions, and document post-incident reviews
  • Collaborate with Engineering teams to design and implement scalable and reliable architectures, providing guidance on best practices for cloud-native application development
  • Implement security best practices and controls in cloud environments to protect data, applications, and infrastructure, and ensure compliance with regulatory requirements
  • Drive automation initiatives to streamline operational tasks, reduce manual effort, and improve overall efficiency in cloud operations
  • Stay current with cloud platform updates, trends, and best practices, and evaluate emerging technologies for potential adoption to drive innovation and efficiency
  • Provide support and guidance to junior team members, fostering a culture of learning, collaboration, and continuous improvement within the SRE/DevOps team
What we offer
  • Medical, Dental, Vision Coverage
  • Health and Dependent Savings Accounts
  • Life and Disability Programs
  • Paid Parental Leave
  • Voluntary Benefit Programs
  • Company Sponsored Wellness Program
  • Wellness Reimbursement Program
  • Retirement Savings
  • Equity Opportunities
  • Paid time off and Paid Holidays
  • Full-time