Site Reliability Engineer

GetYourGuide

Location:
Germany, Berlin

Contract Type:
Not provided

Salary:

Not provided

Job Description:

As a Site Reliability Engineer, you will be part of an empowered, fully remote team that plays a key part in building, automating, and enhancing our cloud and container-based infrastructure. We act as 'engineers for the engineers', helping others understand and leverage the architecture and platform underlying their features. Our technology stack consists of AWS, Kubernetes, and Istio. Our aim is to create a reliable platform for running our core services while enabling teams to move fast, take risks, and experiment.

Job Responsibility:

  • Build and scale our cloud-based infrastructure including managing our Kubernetes clusters and AWS environment
  • Ensure the high availability, autoscaling and failure recovery capabilities of production and pre-production systems
  • Develop custom controllers to automate the management of clusters
  • Leverage Istio and Envoy to manage service communication and provide network observability
  • Actively drive initiatives towards better system design and implementation of new technologies
  • Participate in infrastructure on-call rotations
  • Champion our operations culture and help the engineering organization deliver highly available services for our customers

Requirements:

  • Availability from 13:00 to 17:00 Central European Time (Berlin/Zurich) every day for collaboration with the team
  • Experience with Kubernetes and running containers at scale
  • A good, low-level understanding of the Linux operating system
  • Strong coding skills in at least one programming language. Our most used language is Go
  • Good understanding of distributed systems, networking and container technology
  • Sufficient grasp of public cloud environments like AWS
  • Positive, proactive team player who is passionate about their craft and cares about helping the team deliver
  • You care about monitoring and understanding the state of systems
  • Problem solver with operations skills who can quickly diagnose and pinpoint issues in a production environment
  • Excellent written and verbal communication skills in English

Nice to have:

  • Took part in company-wide initiatives to improve operational excellence
  • Extended or contributed to open source components (mainly Kubernetes and Istio, or similar tools in the compute and networking domains)

What we offer:

  • Annual personal growth budget and mentorship programs for continuous learning and development
  • Work from anywhere in the world for 40 days per year
  • Flexible working arrangements to support work-life balance
  • Opportunities to collaborate and socialize with team members through quarterly team events and yearly company-wide events
  • Monthly transportation and fitness budget
  • Discounts for you, your friends, and family on GetYourGuide activities
  • Language reimbursement program
  • Health and wellness benefits
  • Monthly allowance for transport (Deutschland ticket)
  • Bonuses for successful employee referrals
  • Company contributions to personal pension plans
  • 30 days per year for telecommuting
  • 20% discount for friends & family on GetYourGuide activities

Additional Information:

Job Posted:
December 08, 2025

Work Type:
Remote work

Similar Jobs for Site Reliability Engineer

Principal Site Reliability Engineer

We are looking for a reliability expert who is passionate about scaling Cloud se...
Salary:
Not provided
Atlassian
Expiration Date
Until further notice
Requirements:
  • Expert-level proficiency with 10+ years experience in one or more prominent languages such as Java, Go or Python
  • Expert-level proficiency with 7+ years experience in public cloud offerings (with at least 2+ years specifically on GCP)
  • Expert-level proficiency with 7+ years experience in operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
  • Excellent communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients)
  • An ability and desire to mentor and coach engineers
Job Responsibility:
  • Analyse and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency
  • Cross team and functional boundaries to advocate for reliability methodologies
  • Work with a variety of platform, product and SRE teams to both build reliability into our platform and drive adoption of those practices into our products
  • Be the driving force for change

Principal Site Reliability Engineer

We are looking for a Principal Site Reliability Engineer to join the CVML Platfo...
Location:
United States
Salary:
166000.00 - 293000.00 USD / Year
Blue River Technology
Expiration Date
Until further notice
Requirements:
  • 8+ years of experience building infrastructure with K8S, AWS, and bare metal
  • 8+ years of experience working with Python and Go (with production experience)
  • 8+ years of experience working with infra automation tools: Terraform / Terragrunt (or Pulumi / CDK)
  • 8+ years of experience with Linux-based systems and networks, and a deep understanding of internal components, networking, and security aspects
  • Has a track record of building and maintaining scalable systems in production environments
  • Experience in building CI/CD pipelines using GitHub Actions (or GitLab / Jenkins) for application release and deployment
  • Experience in using AWS ECS, EKS, IAM, EC2, and RDS at production scale
  • Deep understanding of Kubernetes and its internals (kubelet, CRDs, etc) and experience with building and extending clusters from scratch
  • Strong problem-solving skills and ability to troubleshoot complex infrastructure and networking issues
  • Excellent communication skills to collaborate effectively with technical and non-technical stakeholders
Job Responsibility:
  • System Design: Architect and implement various cloud and on-premise applications, systems, and infrastructure
  • Hybrid system integration: Integrate extremely diverse systems, configure stable integration, uptime, and monitoring
  • Edge device integration: work with edge devices of various formats and integrate them with on-prem and cloud workflows, including networking, low-level OS, and electrical/control integration
  • Low-level performance optimization: optimize the performance and throughput of the system at the filesystem, networking, and software levels
  • High-level optimisation of cost and stability: optimize cost, operational stability, and supportability of highly diverse platforms and tech stack
  • Product Mindset: Collaborate with cross-functional teams to design, develop, and maintain robust, scalable, and user-friendly web and mobile data-intensive applications
  • System Integration: Build tools that enable users to easily move between different applications and platforms to utilize the strengths of each in a coherent ecosystem
  • Collaboration: Work closely with cross-functional teams, including data scientists, analysts, software engineers, and product managers, to understand data requirements and deliver data solutions that align with business goals
  • Documentation: Create and maintain technical documentation, including data flow diagrams, architecture designs, and standard operating procedures
  • Technology Evaluation: Stay up-to-date with industry trends and emerging technologies related to data engineering, recommending and implementing new tools and frameworks as appropriate
What we offer:
  • Eligibility for Blue River’s bonus and benefit programs
  • Full-time

Site Reliability Engineer II

Under general supervision, the Site Reliability Systems Administrator II is resp...
Location:
United States, Birmingham
Salary:
Not provided
Genuine Parts Company
Expiration Date
Until further notice
Requirements:
  • Bachelor's degree
  • Three (3) to five (5) years of related experience or an equivalent combination
  • Intermediate knowledge of appropriate networks, products, and protocols
  • Knowledge of Unix, Windows NT/2000/98, Internet Security, Oracle ERP, Distributed computing systems
  • Knowledge of job associated database/software/documentation/programming languages/monitoring and version control tools
  • Troubleshooting skills
  • Problem solving skills
  • Demonstrated knowledge and adherence to Change Management processes
  • Ability to interface well with customers, end users, partners, and associates
Job Responsibility:
  • Defines, designs, and administers network systems used for data communications and recommends improvements to problems of moderate scope
  • Responsible for making sure that the company network works
  • Manages the load configuration of a central data communication processor under limited guidance and makes some recommendations for the purchase or upgrade of data networks
  • Exercises some discretion in proposing and implementing network system enhancements (software and hardware updates)
  • Serves as a point of contact for performance analysis, scalability, and service architecture/database administration issues
  • Coordinates equipment orders including terminals and cable installation, as well as upgrading, monitoring, testing, and servicing the database/systems
  • Helps to negotiate and place orders with common carriers
  • Performs other duties as assigned
What we offer:
  • Healthcare coverage
  • 401(k)
  • Tuition reimbursement
  • Vacation
  • Sick pay
  • Holiday pay
  • Full-time

Site Reliability Engineer

Join our client, a leading financial institution at the forefront of innovation,...
Location:
United States, Austin
Salary:
57.00 - 63.33 USD / Hour
Aquent
Expiration Date
Until further notice
Requirements:
  • Proven experience leading engineering teams and delivering projects using Scrum and efficient release practices
  • Strong background in converting high-level designs into low-level designs and providing technical oversight
  • Demonstrated experience in designing, architecting, and deploying cloud-native applications, specifically on GCP
  • Proficiency with various database technologies, including MongoDB, Aerospike, SQL Server, and PostgreSQL
  • Expertise in containerization technologies such as Docker and Kubernetes, and building/managing CI/CD pipelines
  • Experience leveraging AI-driven software development tools to enhance productivity, code comprehension, and documentation
  • Proven track record of integrating and applying AI/Machine Learning models for data analytics, visualization, automation, and problem-solving
  • Ability to maintain high quality standards while delivering within tight schedules
  • Exceptional collaborative mindset with a bias for action, engaging effectively with product management, architects, and other domains
  • Strong ability to work with internal, external, and offshore stakeholders
Job Responsibility:
  • Drive Technical Leadership & Project Delivery: Lead engineering teams through the entire project lifecycle, leveraging agile methodologies like Scrum to ensure efficient delivery and robust release practices
  • Architect & Design Cloud-Native Solutions: Translate high-level architectural visions into detailed low-level designs, providing expert technical oversight for the development and deployment of cutting-edge cloud-native applications
  • Champion Reliability & Scalability: Design, architect, and deploy highly available and scalable cloud-native applications on platforms such as GCP, ensuring optimal performance and resilience
  • Optimize Data Management: Leverage your expertise with diverse database technologies, including MongoDB, Aerospike, SQL Server, and PostgreSQL, to build and maintain robust data solutions
  • Advance DevOps & Automation: Implement and optimize containerization strategies using technologies like Docker and Kubernetes, and establish sophisticated CI/CD pipelines to streamline development and deployment
  • Innovate with AI/ML: Integrate and apply AI/Machine Learning models to enhance data analytics, visualization, automation, and creatively solve complex business and technical challenges
  • Foster Collaboration & Mentorship: Work closely with diverse stakeholders across product management, architecture, and other engineering domains, while actively mentoring and coaching multiple teams to elevate technical capabilities
  • Influence & Present Solutions: Effectively engage subject matter experts, present complex architectural solutions to governance boards and stakeholders, and advocate for data-driven proposals
What we offer:
  • Subsidized health, vision, and dental plans
  • Paid sick leave
  • Retirement plans with a match

Senior Site Reliability Engineer

Affirm is reinventing credit to make it more honest and friendly, giving consume...
Location:
Spain
Salary:
85000.00 - 115000.00 EUR / Year
Affirm
Expiration Date
Until further notice
Requirements:
  • 4+ years of experience designing, developing and launching backend systems at scale using scripting and development languages like Bash, Python or Kotlin
  • A track record of developing highly available distributed systems using technologies like AWS, MySQL and Kubernetes
  • Meaningful experience contributing in or driving parts of the Incident Lifecycle process, enabling actionable insights that improve the quality culture, reliability, resilience, and system performance
  • 4+ years working in a Site Reliability or Production Engineering team
  • Experience defining a technical plan for the delivery of a significant feature or system component with an elegant, simple and extensible design
  • Experience making impactful changes in a large code base, and a suite of tools and practices that enable you and your team to do so safely
  • Strong verbal and written communication skills that support effective collaboration with our global engineering team
  • On-call rotation: participation in an on-call rotation is a requirement for this role
Job Responsibility:
  • You will be responsible for owning and delivering quarterly goals for your team, leading engineers on your team through ambiguity to solve open-ended problems, and ensuring that everyone is supported throughout delivery
  • You will support your peers and stakeholders in the product development lifecycle by collaborating with infrastructure, product management, developer experience & analytics by participating in ideation, articulating technical constraints, and partnering on decisions that properly consider risks and trade-offs
  • You will proactively identify technical solutions and operational processes that strengthen incident readiness, response, and post-incident analysis
  • You will support the operations and availability of your team’s artifacts by creating and monitoring metrics, escalating when needed, and supporting “keep the lights on” & on-call efforts
  • You will foster a culture of quality and ownership on your team by setting or improving code review and design standards for your team, and advocating for them beyond your team through your writing and tech talks
  • You will help develop talent on your team by providing feedback and guidance, and leading by example
What we offer:
  • Flexible Spending Wallets for tech, food and lifestyle
  • Away Days - wellness days to take off work and recharge
  • Learning & Development programs
  • Parental benefit
  • Employee Resource & Community Groups
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount
  • Full-time

Site Reliability Engineer

You develop cloud platform according to modern principles. You advise our custom...
Location:
Spain, Valencia
Salary:
Not provided
MaibornWolff GmbH
Expiration Date
Until further notice
Requirements:
  • Ideally, a degree in computer science or comparable training
  • Sound technical understanding
  • An idea of how to build and run a secure application in the cloud
  • Experience with container orchestration, ideally with Kubernetes
  • Experience with Infrastructure-as-Code tools such as Terraform, Helm, Ansible, or CDK
  • Experience in setting up the release management process using modern CI/CD systems
  • Knowledge of a cloud provider (AWS, Azure, Google Cloud), ideally with certification
  • Development skills in at least one object-oriented, functional, or scripting language
  • Very good English and good German skills
Job Responsibility:
  • Develop cloud platform according to modern principles
  • Advise customers on the sensible use of services in the cloud with regard to effort, costs and maintenance
  • Live a vibrant DevOps culture internally and carry it to customers
  • Help the customer introduce the correct release processes and implement them using modern CI/CD tools (Azure DevOps, GitLab, GitHub)
  • Develop and integrate monitoring and logging infrastructure to improve application maintainability
  • Design and develop scalable and fail-safe IT architectures
What we offer:
  • Home Office & Office
  • Flexible Working Hours
  • Part-Time Models
  • Working Time Account
  • Sabbatical
  • 30 days of paid vacation
  • An annual training budget of 1.5 gross monthly salaries for training, certifications, conferences, and more
  • Corporate seminars
  • Christmas parties
  • Private health and dental insurance

Site Reliability Engineer

About LogRocket: Founded in 2016, LogRocket's goal is to make every experience o...
Location:
United States, Boston
Salary:
135000.00 - 220000.00 USD / Year
LogRocket
Expiration Date
Until further notice
Requirements:
  • At least 4 years of experience as a Site Reliability Engineer or in a related role
  • Ability to read and understand product code
  • Familiarity with the state of the art in cloud technologies, including common providers, specific tools of the trade, and their strengths and weaknesses
  • Experience operating applications and databases with demanding scalability or availability requirements
  • Proven expertise in modern container orchestration practices
  • A strong understanding of the performance, architecture, tooling, and cost of cloud systems
  • A security-focused mindset with a solid understanding of incident response and risk mitigation
  • A strong collaborator who is transparent about progress on tasks, seeks feedback early and often, works effectively with the team and customers
Job Responsibility:
  • Improve quality of pager alerts while reducing noise
  • Maintain awareness of engineering initiatives across the organization and monitor their impact on stability, cost, and performance
  • Keep infrastructure up-to-date to take advantage of security patches and new features
  • Improve operational security without sacrificing engineering independence
What we offer:
  • Catered lunch and an impressive array of your favorite snacks
  • Unlimited vacation policy
  • Health, Dental, Vision benefits, 401k, commuter benefits
  • Generous stock options
  • Regular team outings and activities
  • Full-time

Site Reliability Engineer

As a highly skilled Site Reliability Engineer (SRE), you will contribute to buil...
Location:
United States, New York City; San Francisco
Salary:
160000.00 - 300000.00 USD / Year
Hebbia
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years software development experience at a venture-backed startup or top technology firm
  • Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role
  • Strong expertise in managing CI/CD pipelines and deployment automation
  • Proficiency in cloud platforms such as AWS, Azure, or Google Cloud (we are an AWS shop)
  • Solid understanding of containerization and orchestration technologies such as Docker and Kubernetes
  • Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, or similar
  • Knowledge of infrastructure-as-code (IaC) tools such as Terraform or CloudFormation
  • Familiarity with security best practices and tools for infrastructure and application security
  • Excellent problem-solving skills and the ability to troubleshoot complex issues
Job Responsibility:
  • Assist in managing deployment pipelines to facilitate smooth and efficient software releases
  • Help implement and maintain observability solutions for monitoring system performance and reliability
  • Support local development environments to optimize developer workflows
  • Work with development teams to ensure infrastructure aligns with project requirements
  • Contribute to improving the security of our infrastructure by assisting with proactive measures and audits
  • Assist in developing and maintaining automation scripts and tools to enhance operational efficiency
  • Help troubleshoot and resolve infrastructure and application issues to minimize downtime and maintain smooth operations
  • Participate in evaluating and integrating new technologies to enhance the scalability, reliability, and security of our infrastructure
What we offer:
  • PTO: Unlimited
  • Insurance: Medical + Dental + Vision + 401K
  • Eats: Catered lunch daily + DoorDash dinner credit if you ever need to stay late
  • Parental leave policy: 3 months non-birthing parent, 4 months for birthing parent
  • Fertility benefits: $15k lifetime benefit
  • New hire equity grant: competitive equity package with unmatched upside potential
  • Full-time