Junior Site Reliability Engineer

Aiven Deutschland GmbH

Location:
Finland, Helsinki

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are looking for an early-career Site Reliability Engineer to join our global team. In this role, you will be the engine that keeps our cloud operations platform running smoothly, turning complex open-source technologies into reliable services for our customers. You’ll be part of a team that champions platform reliability. This is a hands-on operational role where you’ll dive into the day-to-day mechanics of a massive cloud infrastructure, from handling stakeholder requests to building the tools that monitor our systems. We value automation over manual repetition, and we’ll give you the space to grow your skills in both software development and systems administration.

Job Responsibility:

  • Handle essential operational duties, including stakeholder-driven tasks like managing account lifecycles and service adjustments
  • Improve our observability framework and automate manual toil to create a self-healing, highly visible production environment
  • Participate in our on-call rotation to maintain platform health

Requirements:

  • Ability to Code: basic programming skills, with a preference for Python
  • Linux Fundamentals: comfortable working in a terminal, with a grasp of Linux systems administration and networking
  • Analytical Problem Solving: enjoy the detective work of debugging
  • AI Curiosity: interested in how AI is changing the infrastructure landscape
  • Operational Mindset: ready to contribute to an on-call rotation

Nice to have:

  • Hands-on Database/Streaming Experience: have worked with open-source tools like PostgreSQL, Kafka, ClickHouse, or OpenSearch

What we offer:
  • Participate in Aiven’s equity plan
  • Balance work and life with our hybrid work policy
  • Choose the equipment you need to set yourself up for success
  • Use your Professional Development Plan budget for learning opportunities
  • Receive holistic wellbeing support through our global Employee Assistance Program
  • Inquire about our Global Time Off Commitment (Parental and Sick Leave, as well as Personal Time)
  • Enjoy country-specific benefits for our global cast

Additional Information:

Job Posted:
April 24, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Junior Site Reliability Engineer

Junior Site Reliability Engineer

As a Jr. Site Reliability Engineer, you will 'make things scale' which includes ...
Location
United Kingdom
Salary:
Not provided
accesso
Expiration Date
Until further notice
Requirements
  • Some practical exposure to cloud platforms (AWS/Azure/GCP)—coursework, internships, or self-led projects
  • Ability to self-learn with assistance from Senior Engineers
  • Basic scripting ability using Python or Bash
  • Familiarity with basic Linux systems and the command line
  • Understanding of Git and basic CI/CD concepts
  • Good written and verbal communication
  • Customer-focused approach
  • Ability to work with minimal direction
  • Willingness to learn, take direction and work within a team
Job Responsibility
  • Assisting with provisioning and deploying accesso Horizon components to customer cloud accounts using Infrastructure as Code (Terraform)
  • Help maintain CI/CD pipelines (GitHub Actions) for application and infrastructure deployments
  • Support monitoring, logging and alerting (Prometheus, Grafana & Coralogix) and respond to basic alerts with supervision
  • Implement and improve basic automation and scripting
  • Participate in incident triage, root cause investigation and follow-up tasks
  • Follow security and compliance requirements for customer cloud environments (identity, secrets, network controls)
  • Produce and maintain operational runbooks, deployment guides and change notes
  • Participate in on-call rotation as a L1 responder
  • The role may occasionally require work outside normal working hours
  • Learn and apply accesso Horizon product architecture and configuration
What we offer
  • Competitive compensation package including an annual bonus opportunity
  • 8 days of paid bank holiday leave and 26 days of paid annual leave (paid leave increases with tenure)
  • 8 hours of paid Volunteer Time Off
  • Inclusive Family Benefits, including a $7,500 benefit for surrogacy, adoption, and fertility
  • Robust health insurance scheme with the opportunity to participate in private medical scheme after satisfactory performance
  • Matching pension scheme (up to 8%)
  • Unlimited access to Udemy for Business
  • Flexible work schedule
  • Fulltime

Lead Site Reliability Engineer

Groupon is a marketplace where customers discover new experiences and services e...
Location
India, Bangalore
Salary:
Not provided
Groupon
Expiration Date
Until further notice
Requirements
  • 10+ years in systems engineering
  • at least 5 years in SRE or DevOps roles
  • expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker)
  • proficiency in programming and scripting languages like Python, Go, and Bash
  • advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible
  • deep understanding of networking, DNS, load balancing, and security principles
  • proven track record of managing high-availability systems in demanding environments
  • exceptional analytical and problem-solving skills
Job Responsibility
  • Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher
  • drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools
  • create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery
  • build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack
  • collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs
  • lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues
  • design and execute performance testing, capacity planning, and scalability strategies for evolving workloads
  • proactively identify and resolve bottlenecks, increasing system performance and developer efficiency
  • mentor junior engineers, fostering a collaborative and growth-oriented team environment
  • guide architectural decisions that drive innovation and enhance system reliability
What we offer
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • a collaborative and innovative work environment that values your expertise and contributions
  • professional growth and leadership development pathways tailored to your aspirations
  • a chance to leave a lasting impact by shaping the future of reliable and scalable systems

Staff Engineer, Site Reliability

LearnUpon is looking for a Staff Site Reliability Engineer to join our team in I...
Location
Ireland, Dublin
Salary:
Not provided
LearnUpon
Expiration Date
Until further notice
Requirements
  • 7+ years of experience in a software or Ops role
  • 5+ years of cloud engineering experience, with at least 2 years experience with AWS
  • Experience deploying Microservice environments, using containerisation technologies such as Kubernetes and Docker
  • Experience in designing and implementing Observability tech stacks
  • Have championed the benefits of Observability to Engineering teams
  • Can architect the design of SLO/SLI implementation that balances the needs of different teams
  • Familiar with cost analysis of Observability metrics gathering, Engineering effort, and tooling
  • Experience building and supporting large-scale distributed systems that back a consumer app or website with associated requirements of performance, security and disaster recovery
  • Experience with implementing IaC (e.g. CloudFormation, Terraform etc.), automation tooling (e.g. Puppet, Ansible etc.), CI/CD (e.g. Jenkins, Travis CI, GitLab etc.)
  • Able to effectively communicate technical ideas to and collaborate with both technical and non-technical peers
Job Responsibility
  • Identifying opportunities to improve and scale our infrastructure for performance, observability, maintainability, and cost, by creating innovative solutions
  • Leading our efforts to build an observability function that incorporates application metrics, application transaction tracking, and event log management
  • Driving the processes to maintain resilient, scalable and cost-effective infrastructure
  • Working with other Engineering teams to provide infrastructure solutions that meet their ongoing requirements
  • Building tools focused on measuring, monitoring and alerting, with an eye towards self-service in order to promote Engineers’ ownership of observability
  • Reacting quickly to changing customer and business needs
  • Participating in the on-call rota
  • Mentoring junior talent
What we offer
  • Work in a fun and supportive environment with regular team events
  • Excellent career progression
  • Structured learning environment
  • Competitive salary and company ESOP
  • Private health insurance
  • 26 days annual leave
  • Fulltime

Senior Site Reliability Engineer

We're looking for a Senior Site Reliability Engineer for our Currents team, resp...
Location
United States, Austin
Salary:
129600.00 - 232200.00 USD / Year
Braze
Expiration Date
Until further notice
Requirements
  • Bachelor’s in Computer Science, Software Engineering, or a related STEM field
  • Five (5) years of experience in any role/occupation/position involving software engineering or site reliability engineering
  • Experience using distributed systems to deploy and monitor live applications such as Kubernetes or Docker Swarm
  • Experience working with alerting software (Sentry, Datadog, and/or PagerDuty)
  • Experience utilizing programming languages (Java, Kotlin, and/or Ruby) to understand and contribute to the codebase
  • Experience storing data in relational and non-relational databases such as Postgres and MongoDB
  • Experience with data streaming or queuing systems to build data pipelines with technologies like Kafka, Sidekiq or SQS and SNS
  • Experience leveraging continuous integration tools such as Jenkins or Buildkite
  • Experience collaborating with engineers through pull requests and code reviews in version control software such as GitHub or GitLab
Job Responsibility
  • Solve live performance and reliability issues and prevent their recurrence
  • Write and review code, educating engineers and building a culture of reliability
  • Practice sustainable incident response and blameless postmortems
  • Define and enable standards for monitoring, reliability, and performance
  • Bridge the gap between infrastructure and platform engineering teams
  • Support and improve services by planning for scale and reliability
  • Guide junior engineers in SRE best practices, software engineering, and agile project leadership
What we offer
  • Competitive compensation that may include equity
  • Retirement and Employee Stock Purchase Plans
  • Flexible paid time off
  • Comprehensive benefit plans covering medical, dental, vision, life, and disability
  • Family services that include fertility benefits and equal paid parental leave
  • Professional development supported by formal career pathing, learning platforms, and a yearly learning stipend
  • A curated in-office employee experience, designed to foster community, team connections, and innovation
  • Opportunities to give back to your community, including an annual company-wide Volunteer Week and donation matching
  • Employee Resource Groups that provide supportive communities within Braze
  • Fulltime

Senior Site Reliability Engineer

We're looking for a Senior Site Reliability Engineer for our Currents team, resp...
Location
United States, San Francisco
Salary:
129600.00 - 232200.00 USD / Year
Braze
Expiration Date
Until further notice
Requirements
  • Bachelor’s in Computer Science, Software Engineering, or a related STEM field
  • Five (5) years of experience in any role/occupation/position involving software engineering or site reliability engineering
  • Experience using distributed systems to deploy and monitor live applications such as Kubernetes or Docker Swarm
  • Experience working with alerting software (Sentry, Datadog, and/or PagerDuty)
  • Experience utilizing programming languages (Java, Kotlin, and/or Ruby) to understand and contribute to the codebase
  • Experience storing data in relational and non-relational databases such as Postgres and MongoDB
  • Experience with data streaming or queuing systems to build data pipelines with technologies like Kafka, Sidekiq or SQS and SNS
  • Experience leveraging continuous integration tools such as Jenkins or Buildkite
  • Experience collaborating with engineers through pull requests and code reviews in version control software such as GitHub or GitLab
Job Responsibility
  • Solve live performance and reliability issues and prevent their recurrence
  • Write and review code, educating engineers and building a culture of reliability
  • Practice sustainable incident response and blameless postmortems
  • Define and enable standards for monitoring, reliability, and performance
  • Bridge the gap between infrastructure and platform engineering teams
  • Support and improve services by planning for scale and reliability
  • Guide junior engineers in SRE best practices, software engineering, and agile project leadership
What we offer
  • Competitive compensation that may include equity
  • Retirement and Employee Stock Purchase Plans
  • Flexible paid time off
  • Comprehensive benefit plans covering medical, dental, vision, life, and disability
  • Family services that include fertility benefits and equal paid parental leave
  • Professional development supported by formal career pathing, learning platforms, and a yearly learning stipend
  • A curated in-office employee experience, designed to foster community, team connections, and innovation
  • Opportunities to give back to your community, including an annual company-wide Volunteer Week and donation matching
  • Employee Resource Groups that provide supportive communities within Braze
  • Fulltime

Senior Site Reliability Engineer

We're looking for a Senior Site Reliability Engineer for our Currents team, resp...
Location
United States, New York City
Salary:
129600.00 - 232200.00 USD / Year
Braze
Expiration Date
Until further notice
Requirements
  • Bachelor’s in Computer Science, Software Engineering, or a related STEM field
  • Five (5) years of experience in any role/occupation/position involving software engineering or site reliability engineering
  • Experience must include: Using distributed systems to deploy and monitor live applications such as Kubernetes or Docker Swarm
  • Working with alerting software (Sentry, Datadog, and/or PagerDuty)
  • Utilizing programming languages (Java, Kotlin, and/or Ruby) to understand and contribute to the codebase
  • Storing data in relational and non-relational databases such as Postgres and MongoDB
  • Using data streaming or queuing systems to build data pipelines with technologies like Kafka, Sidekiq or SQS and SNS
  • Leveraging continuous integration tools such as Jenkins or Buildkite
  • Collaborating with engineers through pull requests and code reviews in version control software such as GitHub or GitLab
Job Responsibility
  • Solve live performance and reliability issues and prevent their recurrence
  • Write and review code, educating engineers and building a culture of reliability
  • Practice sustainable incident response and blameless postmortems
  • Define and enable standards for monitoring, reliability, and performance
  • Bridge the gap between infrastructure and platform engineering teams
  • Support and improve services by planning for scale and reliability
  • Guide junior engineers in SRE best practices, software engineering, and agile project leadership
What we offer
  • Competitive compensation that may include equity
  • Retirement and Employee Stock Purchase Plans
  • Flexible paid time off
  • Comprehensive benefit plans covering medical, dental, vision, life, and disability
  • Family services that include fertility benefits and equal paid parental leave
  • Professional development supported by formal career pathing, learning platforms, and a yearly learning stipend
  • A curated in-office employee experience, designed to foster community, team connections, and innovation
  • Opportunities to give back to your community, including an annual company-wide Volunteer Week and donation matching
  • Employee Resource Groups that provide supportive communities within Braze
  • Fulltime

Site Reliability Engineer

As a member of Kalshi’s engineering team, you’ll help build the next-generation ...
Location
United States, New York
Salary:
100000.00 - 250000.00 USD / Year
Kalshi
Expiration Date
Until further notice
Requirements
  • 4+ years of software engineering experience
  • Experience designing, building, scaling, and maintaining production services and service-oriented architectures
  • Strong system design, coding, debugging, performance-tuning, and observability skills
  • High-quality coding practices with strong testing discipline
  • Excellent written and verbal communication
  • Comfort working transparently across teams
  • Strong interpersonal skills across junior-to-principal engineering levels
  • Ability to think clearly under pressure and dive into any layer of the stack
  • Passion for building an open financial system that connects the world
  • Willingness to participate in on-call rotations and swiftly resolve issues
Job Responsibility
  • Improve observability, reliability, and service availability by defining and measuring key metrics
  • Build automation and systems that eliminate toil and reduce operational burden
  • Collaborate with core infrastructure engineers to performance-tune and optimize cloud deployments (Docker, Terraform, Kubernetes, EC2, etc.)
  • Partner with product teams to minimize service disruptions and automate incident response
  • Identify and analyze reliability problems across the stack, designing and implementing software for significant, long-term improvements
  • Mentor engineers and drive a culture where reliability is a core engineering value
  • Write high-quality, well-tested code that supports internal and external customer needs
  • Debug complex technical issues and improve system usability, operability, and diagnosability
  • Review feature designs across the company and ensure security, safety, scalability, and architectural clarity
  • Build and maintain integrations with third-party vendors
What we offer
  • Equity and benefits
  • Fulltime
Read More
Arrow Right

Senior Site Reliability Engineer

We are seeking a highly skilled and passionate Senior Site Reliability Engineer ...
Location
Spain; Portugal; United Kingdom
Salary:
Not provided
Parser Limited
Expiration Date
Until further notice
Requirements
  • Deep SRE Expertise: Proven experience as a Senior Site Reliability Engineer or a similar role, with a strong understanding of SRE principles (error budgets, SLOs/SLIs, toil reduction)
  • Azure Cloud Proficiency: Extensive hands-on experience designing, deploying, and operating highly available and scalable applications on Microsoft Azure
  • Azure Kubernetes Service (AKS) Expertise: Mandatory extensive hands-on experience with AKS for container orchestration, including deployment, scaling, monitoring, and troubleshooting
  • Java Ecosystem Mastery: Expert-level proficiency with Java, including experience with modern frameworks (ideally Micronaut, Spring Boot, or similar) and JVM performance tuning
  • Distributed Systems Knowledge: Solid understanding and practical experience with distributed systems, microservices architecture, and associated challenges (e.g., consistency, fault tolerance)
  • Messaging & Database Expertise: Hands-on experience with an event streaming platform (ideally Kafka) and NoSQL data storage (ideally Couchbase), including operational best practices
  • Automation First Mindset: Strong scripting skills (e.g., Python, Bash) and experience with Infrastructure as Code tools (e.g., Terraform, ARM templates) and CI/CD pipelines (e.g., Azure DevOps, Jenkins)
  • Observability Tools: Experience with monitoring, logging, and alerting tools (e.g., Azure Monitor, Prometheus, Grafana, ELK Stack, Splunk)
  • Problem-Solving Acumen: Exceptional analytical and troubleshooting skills, with a methodical approach to diagnosing and resolving complex production issues
  • Communication & Collaboration: Excellent communication skills, with the ability to articulate complex technical concepts to diverse audiences and collaborate effectively with cross-functional teams
Job Responsibility
  • Architect and Implement Reliability: Design, build, and maintain highly scalable, resilient, and performant systems on Azure, focusing on our Java, Kafka, and Couchbase stack
  • Drive Modernisation: Work hands-on as part of the team spearheading the adoption of Micronaut, standardising application templates, and transitioning to managed cloud services
  • Enhance Operational Excellence: Develop and implement strategies for improving system observability (standardised logging, metrics, tracing), alerting, and on-call practices
  • Automate Everything: Champion automation across the software development lifecycle (SDLC), from CI/CD pipelines to infrastructure provisioning, focusing on accelerating delivery and de-risking deployments
  • Incident Management & Learning: Contribute to our mature, blameless post-incident review process, identifying root causes and implementing preventative measures to reduce incident hours
  • Tooling & Standards: Develop, maintain, and drive the adoption of shared, standardised SRE tooling and best practices across engineering teams, including containerisation (e.g., Docker, Kubernetes on Azure), infrastructure as code (e.g., Terraform), and configuration management
  • Mentorship & Collaboration: Provide technical leadership and mentorship to junior engineers, fostering a culture of SRE principles and operational excellence across the wider engineering organisation
  • Strategic Input: Contribute to the overall technical strategy and roadmap for our SRE and platform initiatives, ensuring alignment with business objectives
What we offer
  • The chance to join an organization with triple-digit growth that is changing the paradigm on how software products are built
  • The opportunity to form part of an amazing, multicultural community of tech experts
  • A highly competitive compensation package
  • Medical insurance
  • English lessons
  • Fulltime