Junior Site Reliability Engineer Job at Aiven Deutschland GmbH (Helsinki)

Junior Site Reliability Engineer

As a Jr. Site Reliability Engineer, you will 'make things scale' which includes ...

Location

United Kingdom

Salary:

Not provided

accesso

Expiration Date

Until further notice

Requirements

Some practical exposure to cloud platforms (AWS/Azure/GCP) through coursework, internships, or self-led projects
Ability to self-learn with assistance from Senior Engineers
Basic scripting ability using Python or Bash
Familiarity with basic Linux systems and the command line
Understanding of Git and basic CI/CD concepts
Good written and verbal communication
Customer-focused approach
Ability to work with minimal direction
Willingness to learn, take direction and work within a team

Job Responsibility

Assist with provisioning and deploying accesso Horizon components to customer cloud accounts using Infrastructure as Code (Terraform)
Help maintain CI/CD pipelines (GitHub Actions) for application and infrastructure deployments
Support monitoring, logging and alerting (Prometheus, Grafana & Coralogix) and respond to basic alerts with supervision
Implement and improve basic automation and scripting
Participate in incident triage, root cause investigation and follow-up tasks
Follow security and compliance requirements for customer cloud environments (identity, secrets, network controls)
Produce and maintain operational runbooks, deployment guides and change notes
Participate in the on-call rotation as an L1 responder
The role may require work outside normal working hours
Learn and apply accesso Horizon product architecture and configuration

What we offer

Competitive compensation package including an annual bonus opportunity
8 days of paid bank holiday leave and 26 days of paid annual leave (paid leave increases with tenure)
8 hours of paid Volunteer Time Off
Inclusive Family Benefits, including a $7,500 benefit for surrogacy, adoption, and fertility
Robust health insurance scheme with the opportunity to participate in a private medical scheme after satisfactory performance
Matching pension scheme (up to 8%)
Unlimited access to Udemy for Business
Flexible work schedule

Fulltime

Site Reliability Engineer

Engineering to make a system more resilient and efficient frees up time and mone...

Location

United States, Annapolis Junction

Salary:

86900.00 - 198000.00 USD / Year

Booz Allen Hamilton

Expiration Date

Until further notice

Requirements

5+ years of experience creating and maintaining highly reliable and scalable systems to reduce issues and downtime, including design and implementation of physical servers, storage systems, and network infrastructures
5+ years of experience providing technical support for system upgrades, rollouts, and enhancements
3+ years of experience developing and deploying infrastructure solutions
3+ years of experience employing and sustaining VMware for v6.x and later, including the design and implementation of virtual data centers
3+ years of experience designing and deploying highly available storage solutions for technologies, including SAN storage and high-capacity storage solutions
Experience with data center design and buildout
Experience transforming large-scale software, data center, or on-premises infrastructure programs to a virtualized architecture
Ability to interact with clients and lead, train, and mentor junior system administrators
Top Secret clearance
Bachelor's degree

Job Responsibility

Lead the development of more robust systems for Booz Allen by building a resilient infrastructure
Build in redundancy, implement monitoring tools, and automate wherever possible
Reduce toil by scripting routine tasks and automating self-repair
Support your team of engineers and act as a subject matter expert for our clients

What we offer

Health benefits
Life benefits
Disability benefits
Financial benefits
Retirement benefits
Paid leave
Professional development
Tuition assistance
Work-life programs
Dependent care

Fulltime

Site Reliability Engineer III

We're looking for a senior Site Reliability Engineer to join our small, high-own...

Location

United States

Salary:

148320.00 - 185400.00 USD / Year

AbsenceSoft

Expiration Date

Until further notice

Requirements

5+ years of experience in SRE, DevOps, or a related engineering role
Advanced hands-on expertise in AWS production environments and core services including Lambda, ECS, S3, ALB, and GuardDuty
Strong proficiency in infrastructure-as-code tooling such as Terraform, CloudFormation, or CDK
Experience building and operating CI/CD pipelines using Jenkins and GitHub
Proficiency in Python, Go, or Bash for automation
Hands-on experience with Datadog or a comparable observability platform for monitoring, alerting, and log management
Demonstrated experience leading incident response in complex, distributed systems
Working knowledge of SLO/SLI frameworks, error budgets, and disaster recovery planning against defined RTO/RPO objectives
Familiarity with SOC 2 compliance frameworks and experience contributing to audit readiness, access controls, and security control evidence collection
A collaborative, ownership-driven mindset with strong communication skills

Job Responsibility

Architect, implement, and operate scalable, resilient, and secure AWS infrastructure
Lead infrastructure-as-code initiatives to ensure all environments are reproducible, auditable, and consistently configured
Design, maintain, and improve CI/CD pipelines using Jenkins and GitHub
Own the Datadog observability platform, including dashboards, monitors, alerting thresholds, and log management
Define and maintain SLOs, SLIs, and error budgets
Serve as a senior technical responder across the full incident lifecycle within a shared on-call rotation
Lead blameless postmortems
Refine, implement, and test disaster recovery plans to meet RTO/RPO objectives
Contribute to SOC 2 audit readiness with a focus on access controls, incident response, and risk mitigation
Mentor junior SREs through code reviews, incident pairing, and documentation

What we offer

Impact that matters
Flexibility and trust
Remote-first and results driven
Growth and development
Access to learning resources, leadership programs, and real opportunities to take on new challenges
Competitive rewards
Comprehensive benefits
Performance-based bonus program
Equity opportunities
Time for life

Fulltime

Senior Site Reliability Engineer

We are seeking a highly skilled and passionate Senior Site Reliability Engineer ...

Location

Spain; Portugal; United Kingdom

Salary:

Not provided

Parser Limited

Expiration Date

Until further notice

Requirements

Deep SRE Expertise: Proven experience as a Senior Site Reliability Engineer or a similar role, with a strong understanding of SRE principles (error budgets, SLOs/SLIs, toil reduction)
Azure Cloud Proficiency: Extensive hands-on experience designing, deploying, and operating highly available and scalable applications on Microsoft Azure
Azure Kubernetes Service (AKS) Expertise: Mandatory extensive hands-on experience with AKS for container orchestration, including deployment, scaling, monitoring, and troubleshooting
Java Ecosystem Mastery: Expert-level proficiency with Java, including experience with modern frameworks (ideally Micronaut, Spring Boot, or similar) and JVM performance tuning
Distributed Systems Knowledge: Solid understanding and practical experience with distributed systems, microservices architecture, and associated challenges (e.g., consistency, fault tolerance)
Messaging & Database Expertise: Hands-on experience with an event streaming platform (ideally Kafka) and NoSQL data storage (ideally Couchbase), including operational best practices
Automation First Mindset: Strong scripting skills (e.g., Python, Bash) and experience with Infrastructure as Code tools (e.g., Terraform, ARM templates) and CI/CD pipelines (e.g., Azure DevOps, Jenkins)
Observability Tools: Experience with monitoring, logging, and alerting tools (e.g., Azure Monitor, Prometheus, Grafana, ELK Stack, Splunk)
Problem-Solving Acumen: Exceptional analytical and troubleshooting skills, with a methodical approach to diagnosing and resolving complex production issues
Communication & Collaboration: Excellent communication skills, with the ability to articulate complex technical concepts to diverse audiences and collaborate effectively with cross-functional teams

Job Responsibility

Architect and Implement Reliability: Design, build, and maintain highly scalable, resilient, and performant systems on Azure, focusing on our Java, Kafka, and Couchbase stack
Drive Modernisation: Work hands-on as part of the team spearheading the adoption of Micronaut, standardising application templates, and transitioning to managed cloud services
Enhance Operational Excellence: Develop and implement strategies for improving system observability (standardised logging, metrics, tracing), alerting, and on-call practices
Automate Everything: Champion automation across the software development lifecycle (SDLC), from CI/CD pipelines to infrastructure provisioning, focusing on accelerating delivery and de-risking deployments
Incident Management & Learning: Contribute to our mature, blameless post-incident review process, identifying root causes and implementing preventative measures to reduce incident hours
Tooling & Standards: Develop, maintain, and drive the adoption of shared, standardised SRE tooling and best practices across engineering teams, including containerisation (e.g., Docker, Kubernetes on Azure), infrastructure as code (e.g., Terraform), and configuration management
Mentorship & Collaboration: Provide technical leadership and mentorship to junior engineers, fostering a culture of SRE principles and operational excellence across the wider engineering organisation
Strategic Input: Contribute to the overall technical strategy and roadmap for our SRE and platform initiatives, ensuring alignment with business objectives

What we offer

The chance to join an organization with triple-digit growth that is changing the paradigm on how software products are built
The opportunity to form part of an amazing, multicultural community of tech experts
A highly competitive compensation package
Medical insurance
English lessons

Fulltime

Senior Site Reliability Engineer

We are looking for a Senior Site Reliability Engineer (SRE) with deep experience...

Location

United States, Chicago

Salary:

131000.00 USD / Year

Realign

Expiration Date

Until further notice

Requirements

5+ years of experience in SRE, DevOps, or Cloud Engineering
Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
Hands-on experience with Terraform, Ansible, or other IaC tools
Strong scripting/coding skills (Python, Go, Shell, etc.)
Experience with Kubernetes, containerization, and orchestration
Deep knowledge of Linux systems and networking
Experience with Service Meshes (e.g., Istio, App Mesh)
Familiarity with AWS Well-Architected Framework
Experience building self-healing systems and automated remediation
Background in security, compliance, or multi-account/multi-region AWS architectures

Job Responsibility

Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace/Datadog
Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
Optimize systems for cost, performance, and reliability
Drive chaos engineering and resilience testing
Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
Mentor junior SREs and promote DevOps/SRE culture across the org

Fulltime

Site Reliability Engineer

As a member of Kalshi’s engineering team, you’ll help build the next-generation ...

Location

United States, New York

Salary:

100000.00 - 250000.00 USD / Year

Kalshi

Expiration Date

Until further notice

Requirements

4+ years of software engineering experience
Experience designing, building, scaling, and maintaining production services and service-oriented architectures
Strong system design, coding, debugging, performance-tuning, and observability skills
High-quality coding practices with strong testing discipline
Excellent written and verbal communication
Comfort working transparently across teams
Strong interpersonal skills across junior-to-principal engineering levels
Ability to think clearly under pressure and dive into any layer of the stack
Passion for building an open financial system that connects the world
Willingness to participate in on-call rotations and swiftly resolve issues

Job Responsibility

Improve observability, reliability, and service availability by defining and measuring key metrics
Build automation and systems that eliminate toil and reduce operational burden
Collaborate with core infrastructure engineers to performance-tune and optimize cloud deployments (Docker, Terraform, Kubernetes, EC2, etc.)
Partner with product teams to minimize service disruptions and automate incident response
Identify and analyze reliability problems across the stack, designing and implementing software for significant, long-term improvements
Mentor engineers and drive a culture where reliability is a core engineering value
Write high-quality, well-tested code that supports internal and external customer needs
Debug complex technical issues and improve system usability, operability, and diagnosability
Review feature designs across the company and ensure security, safety, scalability, and architectural clarity
Build and maintain integrations with third-party vendors

What we offer

Equity and benefits

Fulltime

Principal Site Reliability Engineer

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of ...

Location

Portugal

Salary:

Not provided

OutSystems

Expiration Date

Until further notice

Requirements

STEM degree (BSc, MSc, in Software Engineering/Computer Science or related fields)
8+ years of experience in Software Engineering or SRE, ideally within high-growth, cloud-native environments
Expertise in Observability: Proven ability to implement SLIs/SLOs and telemetry systems that provide actionable insights into complex distributed systems
Cloud Mastery: Deep architectural knowledge of AWS/GCP/Azure, specifically regarding networking, security, and cost-optimization
Strategic Impact: Demonstrated success in leading cross-functional initiatives that improved system reliability or developer velocity at an organizational scale
System Design & Architecture: Expertise in designing highly available, fault-tolerant distributed systems (Microservices, Event-driven architecture)
Development: Professional-level proficiency in Go, Python, or Rust, with the ability to contribute to core product codebases and build custom internal tooling
Cloud Ecosystems: Deep-tier mastery of AWS, GCP, or Azure (specifically IAM, VPC networking, Transit Gateways, and Cross-region redundancy)
Orchestration at Scale: Extensive experience managing Kubernetes (K8s) in production, including Custom Resource Definitions (CRDs), Service Mesh (Istio/Linkerd), and Admission Controllers
Infrastructure as Code (IaC): Advanced usage of Terraform, CloudFormation, or Spacelift, focusing on modularity, state management, and CI/CD integration for infrastructure

Job Responsibility

Help define and execute the strategic vision and roadmap for the Site Reliability Engineering function
Provide leadership and mentorship to more junior SREs, fostering a culture of innovation, collaboration, and operational excellence
Collaborate with leadership and other stakeholders to ensure cross-functional alignment
Participate actively, collaborate effectively with development teams, and influence the design of highly reliable and scalable infrastructure, leveraging cloud technologies and industry best practices
Collaborate with development teams at all stages of the product development lifecycle to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable) and performant
Drive the adoption, definition, and improvement of Service Level Objectives (SLOs)
Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents
Oversee incident response efforts, ensuring quick resolution and minimal downtime, and effective RCA/post-mortems
Automate every operational task, with a special focus on fast incident detection & recovery
Foster a culture of continuous improvement and knowledge sharing

What we offer

A company that is always growing, changing, and innovating
Real career opportunities
Work colleagues that are as smart, hard-working, and driven as you
Disrupting the status quo is in our DNA
We ask “why” a lot
OutSystems nurtures an inclusive culture of diversity, where everyone feels empowered to be their authentic self and perform at their best.

Fulltime

Site Reliability Engineer II

As an SRE Engineer II, you will be responsible for managing our multi-cloud infr...

Location

United States, Sunnyvale

Salary:

138000.00 - 159000.00 USD / Year

Illumio

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience
2+ years of experience working as an SRE, DevOps Engineer, or similar role, with hands-on experience with the Azure cloud platform in a production environment
Proficiency in scripting and programming languages such as PowerShell, Python, or Go for automation and infrastructure management tasks
Experience with CI/CD tools and methodologies, containerization technologies, and microservices architecture in cloud environments
Strong analytical, problem-solving, and communication skills, with the ability to collaborate effectively with cross-functional teams

Job Responsibility

Design, deploy, and maintain cloud infrastructure solutions on Azure, AWS, and/or GCP to support our applications and services
Implement infrastructure as code (IaC) principles using tools such as Terraform, ARM templates, or CloudFormation to automate provisioning and configuration management
Develop and maintain CI/CD pipelines for automated software delivery and deployment, leveraging tools such as Azure DevOps, AWS CodePipeline, or Jenkins
Monitor system performance, application health, and infrastructure metrics using cloud monitoring and logging services, and implement proactive measures to optimize performance and availability
Support incident response and resolution efforts, conduct root cause analysis, implement corrective actions, and document post-incident reviews
Collaborate with Engineering teams to design and implement scalable and reliable architectures, providing guidance on best practices for cloud-native application development
Implement security best practices and controls in cloud environments to protect data, applications, and infrastructure, and ensure compliance with regulatory requirements
Drive automation initiatives to streamline operational tasks, reduce manual effort, and improve overall efficiency in cloud operations
Stay current with cloud platform updates, trends, and best practices, and evaluate emerging technologies for potential adoption to drive innovation and efficiency
Provide support and guidance to junior team members, fostering a culture of learning, collaboration, and continuous improvement within the SRE/DevOps team

What we offer

Medical, Dental, Vision Coverage
Health and Dependent Savings Accounts
Life and Disability Programs
Paid Parental Leave
Voluntary Benefit Programs
Company Sponsored Wellness Program
Wellness Reimbursement Program
Retirement Savings
Equity Opportunities
Paid time off and Paid Holidays

Fulltime
