
Site Reliability Engineer

Netherlands, Utrecht · Job Posted May 30, 2026

Job Description

RED Global is currently supporting one of our international clients in their search for an experienced Site Reliability Engineer for a contract project based in Utrecht.

Requirements

  • Strong experience as a Site Reliability Engineer
  • Experience supporting and maintaining reliable, scalable production environments
  • Strong troubleshooting and incident management capabilities
  • Experience working within complex enterprise environments
  • Strong communication and stakeholder management skills


Similar Jobs for Site Reliability Engineer

8 matching positions

Site Reliability Engineer

We are currently seeking a Site Reliability Engineer to join our team in Westlak...
Location: United States, Westlake
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, or related disciplines (understanding reliability engineering principles, SLIs, SLOs, error budgets, and operational excellence)
  • 5+ years’ hands-on Terraform experience
  • 5+ years’ experience supporting mission-critical enterprise applications in production environments
  • 5+ years’ experience with cloud networking, security, and infrastructure architecture
  • 5+ years of hands-on experience managing hybrid cloud environments
  • 5+ years of automation skills using Python, Ansible, Shell scripting, or similar technologies
  • 5+ years’ experience building reusable infrastructure modules and automated deployment frameworks
Job Responsibility
  • Design, implement, and support highly available load balancing solutions using F5 BIG-IP, Broadcom AVI, and cloud-native load balancing services
  • Build and maintain Infrastructure-as-Code (IaC) solutions using Terraform
  • Develop automation solutions for infrastructure provisioning, configuration management, and operational workflows
  • Support and enhance CI/CD pipelines using tools such as Jenkins, Azure DevOps, GitHub Actions, or similar platforms
  • Collaborate with application, cloud, network, and platform teams to improve reliability, performance, and scalability
  • Monitor production systems and proactively identify reliability, performance, and availability risks
  • Implement Site Reliability Engineering best practices including observability, incident management, capacity planning, and resiliency engineering
  • Troubleshoot complex issues across networking, cloud infrastructure, load balancing, and application environments
  • Support hybrid infrastructure environments spanning on-premises datacenters and public cloud platforms
  • Participate in on-call rotation and provide production support for critical business applications
Employment type: Full-time

Site Reliability Engineer

Qargo is a cloud-based (SaaS) Transport Management Platform. We are a scale-up b...
Location: Belgium, Ghent
Salary: Not provided
Qargo
Expiration Date: Until further notice
Requirements
  • Experience as a Software Engineer, with an interest in infrastructure, scalability, reliability
  • Strong programming skills (preferably Python or similar backend languages)
  • Experience working with cloud platforms, container orchestrators, serverless (preferably Google Cloud)
  • Familiarity with distributed systems and scalability challenges
  • Experience with CI/CD pipelines and automation
  • Solid understanding of databases and performance tuning (SQL and/or NoSQL)
  • Familiarity with monitoring and observability tools
  • A problem-solving mindset and the ability to think in systems
  • Strong collaboration skills and a proactive approach to improving systems
Job Responsibility
  • Build and maintain systems and tooling that improve the reliability, scalability, and performance of our platform
  • Improve the software delivery cycle, focusing on automation and developer experience
  • Develop internal tools and services to reduce manual operational work
  • Improve observability by implementing monitoring, logging, and alerting across systems
  • Optimize system performance, including databases such as PostgreSQL and Firestore
  • Collaborate with backend engineers and other engineering teams to design reliable and scalable system architectures
  • Troubleshoot complex production issues and implement long-term fixes
  • Continuously improve infrastructure (Infrastructure as Code, automation, etc.)
What we offer
  • A fast-growing SaaS company with a strong mission and an impact-driven team
  • A flexible work environment with flexible hours and hybrid working
  • A green office with a great atmosphere and lots of initiatives
  • A role with a lot of responsibility, ownership, and tangible impact
  • The opportunity to grow with us and shape both your career and our platform
Employment type: Full-time

Site Reliability Engineer

We are looking for a Site Reliability Engineer (SRE) to support reliable, high-p...
Location: United States, Novi
Salary: Not provided
Robert Half
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Information Technology, Computer Science, Computer Engineering, or comparable practical experience
  • At least 5 years of experience supporting production environments in a corporate, startup, or similarly fast-paced technical setting
  • Hands-on expertise with infrastructure as code, including Terraform, along with experience in cloud platforms and related services
  • Working knowledge of container technologies such as Docker and orchestration platforms like Kubernetes
  • Experience supporting live systems, participating in on-call rotations, and contributing to incident reviews and corrective actions
  • Proficiency with automation and scripting using Bash and Python to reduce manual operational effort
  • Strong communication skills with the ability to explain technical decisions and tradeoffs to cross-functional or non-technical stakeholders
  • Willingness and ability to travel to customer or plant locations as business needs require
Job Responsibility
  • Maintain dependable and secure production environments across plant-edge and cloud-based systems, with a focus on uptime, responsiveness, and operational stability
  • Design, refine, and support monitoring dashboards, alerting frameworks, and operational runbooks using tools such as Prometheus, Grafana, and modern telemetry solutions
  • Build and manage infrastructure through code using Terraform, applying version control standards, peer reviews, and controlled deployment processes
  • Create automation scripts and lightweight tools in Bash and Python to streamline routine operations, recovery procedures, backup workflows, and environment setup
  • Take part in incident response and on-call coverage, troubleshoot service disruptions, coordinate initial communication, and document follow-up actions through blameless reviews
  • Establish and measure service reliability indicators and objectives, helping stakeholders balance system dependability with release speed and operational risk
  • Support secure connectivity between factory networks and cloud resources by configuring and maintaining VPNs, routing, private networking, and access controls
  • Administer and optimize relational or time-series databases, including backup planning, replication, performance tuning, and long-term storage health
  • Contribute to CI/CD delivery practices by improving deployment pipelines, supporting controlled release strategies, and preparing rollback procedures when needed
  • Partner with controls, software, and data teams to enable reliable data flow from industrial systems and ensure safe deployment to edge infrastructure
What we offer
  • medical, vision, dental, and life and disability insurance
  • 401(k) plan

Site Reliability Engineer

As a Site Reliability Engineer, you are passionate about experience innovation a...
Location: India, Bengaluru
Salary: Not provided
Valtech
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
  • 2+ years in DevOps, SRE, or Support Engineering roles
  • Experience with incident management in high-traffic, public-facing platforms
  • Strong scripting skills (Python, Bash, or PowerShell)
  • Familiarity with CI/CD tools: GitHub Actions, Azure DevOps, GitLab, Jenkins
  • Experience with monitoring/APM tools: Datadog, New Relic, Dynatrace, Prometheus, Grafana
  • Basic knowledge of serverless services in AWS, Azure, or GCP
  • Proficiency with Docker and containerized environments
  • Excellent English communication skills (B2+ level)
  • Experience working in international, cross-cultural teams
Job Responsibility
  • Maintain and improve observability systems (monitoring, logging, alerting)
  • Define, adjust, and maintain Service Level Objectives (SLOs)
  • Participate in incident resolution and on-call rotations (max 1 week/month)
  • Drive proactive reliability improvements across platforms
  • Collaborate with teams to analyze failure scenarios and implement mitigations
  • Create and maintain runbooks for incident response and prevention
  • Eliminate non-value-adding tasks through automation and process optimization
What we offer
  • Flexibility, with hybrid work options (country-dependent)
  • Learning and development, with access to cutting-edge tools, training and industry experts
Employment type: Full-time

Site Reliability Engineer

NetApp is looking for a Senior TechOps Engineer - Cassandra to join our growing ...
Location: India, Bengaluru
Salary: Not provided
NetApp
Expiration Date: Until further notice
Requirements
  • Strong experience in Apache Cassandra administration and architecture, with a desire to continuously learn and develop to an expert level
  • Experience in diagnosing and recommending mitigation strategies for Cassandra-related issues, including performance degradation due to resource bottlenecks, suboptimal data modeling leading to hot partitions, excessive tombstones, and inefficiencies caused by range slices and poorly constructed queries
  • Hands-on experience with Cassandra architecture and core administrative tasks, including compactions, repairs, backup and recovery, schema disagreement resolution, and configuration management
  • Experience handling Cassandra maintenance activities, including upgrades and migrations
  • Ability to investigate and research Cassandra issues by reviewing the Apache Cassandra codebase
  • Strong knowledge and experience with Linux, with the ability to work comfortably from the command line
  • Exceptional ability to communicate clearly and professionally in written and verbal English
  • Experience working with at least one public cloud platform, preferably AWS
  • Prior IT customer service or support experience within an ITIL-based environment
  • Strong fundamental computer science and software engineering skills, particularly in operating system internals, memory management, and networking
Job Responsibility
  • Your work will ensure the security, reliability, and performance of world-class systems and databases
  • You will collaborate with the technical teams of our customers, who are globally recognized companies in the gaming, banking, and logistics industries, ranging from large multinationals to emerging start-ups
What we offer
  • Volunteer time off
  • Well-being
  • Time away
Employment type: Full-time

Site Reliability Engineer

As Site Reliability Engineer you will contribute to the overarching implementati...
Location: Romania, Bucuresti
Salary: Not provided
NTT DATA
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related field
  • Minimum 5 years proven work experience as a Reliability Engineer or similar role
  • Expert knowledge and hands-on experience with applications hosted on cloud platforms such as Google Cloud Platform as well as with Docker / Kubernetes in combination with Google Kubernetes Engine (GKE), Terraform or similar technology
  • Experience in resilient software development in Python/Java and the usage of modern CI/CD pipelines, e.g. GitHub, GitHub Actions, Bitbucket, Helm
  • Strong experience in the setup of observability, monitoring and self-healing solutions for instance with New Relic, Splunk, Google Cloud Operations, Lightstep and Ansible
  • Very good knowledge of security standards (e.g. TLS, OAuth2, KMS, Vault, Admission Controllers, Let's Encrypt), microservice architectures and experience with API Management with Apigee or WSO2
  • Proactive attitude and collaborative team-player mindset paired with self-confidence
  • Ability to stay calm and keep an eye for detail even in stressful, time-critical situations
  • Having a creative approach towards solving technical problems
  • Excellent communication skills in English
Job Responsibility
  • Define Service Level Objectives (SLOs), and enable an end-to-end view on customer satisfaction based on best practices for setting up Service Level Indicators (SLIs) to create effective strategies for maintaining and improving system performance and availability
  • Collaborate with Business Functional Analysts and Solution Architects to find improvements in the solution design to improve the resilience of technical solutions early on
  • Consult and guide the squad on the prioritization of reliability improvements and actively deliver them as part of the sprint
  • Implement reliability and resilience patterns such as auto-scaling, circuit breakers, bulkheads, rate limiters, retry mechanisms, etc.
  • Actively work on service request fulfilment and on incident and problem management to identify and reduce toil and the MTTR with engineering best practices
  • Align on and contribute to state-of-the-art SRE best practices, e.g. Distributed Tracing, OpenTelemetry and Chaos Engineering, with the SRE chapter function
  • Act as a knowledge and skill multiplier for your profession by leading the Site Reliability Engineer population
  • Increase the seniority of the overall Site Reliability Engineer chapter by establishing events and procedures, and foster a culture of high standards
  • Lead the engineers in your profession and help them improve each day
What we offer
  • Smooth integration and a supportive mentor
  • Pick your working style: choose from Remote, Hybrid or Office work opportunities
  • Our projects have different working hours to suit your needs
  • Sponsored certifications, trainings and top e-learning platforms
  • Private Health Insurance – custom-made for you
  • Individual coaching sessions or accredited Coaching School
  • Epic parties or themed events – lovingly designed for our people and their families
Employment type: Full-time

Site Reliability Engineer

Build the tools and systems that make M365 sovereign cloud operations faster, sm...
Location: United States, Multiple Locations
Salary: 102100.00 - 219200.00 USD / Year
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Passionate about distributed systems and working with highly scalable services
  • Enjoys new technological challenges and is motivated to solve them
  • Excited about making better software and continuously improving the development, integration, and deployment processes
  • Self-starter who thrives in a bottom-up, fast-paced, highly technical environment
  • Effective collaborator, experienced in creating technical partnerships across teams
  • Committed to ensuring exceptional customer satisfaction through technical excellence
  • Candidates must be able to meet the Microsoft, customer, and/or government security screening requirements for this role
  • The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI)
Job Responsibility
  • Creates and implements code for a product, service, or feature, reusing code as applicable with minimal supervision
  • Acts as a designated responsible individual (DRI), working on-call to monitor a system/product feature/service for degradation, downtime, or interruptions
  • Maintains operations of the live site service on a rotational, on-call basis, responding quickly to mitigate issues that arise while following security best practices and using the minimum required permissions
  • Contributes to identifying dependencies, and incorporates them into the development of design documents for a product area with little oversight
  • Contributes to the identification of requirements for, and development of automation within production and deployment of a complex product feature, targeting zero-touch deployment when possible
  • Works with appropriate internal stakeholders to understand and determine customer/user requirements for a set of features
  • Remains current in skills by investing time and effort into being informed of current developments that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale
What we offer
  • Certain roles may be eligible for benefits and other compensation
Employment type: Full-time

Site Reliability Engineer

We are looking for a Site Reliability Engineer to support the stability, perform...
Location: United States, New York
Salary: Not provided
Robert Half
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related discipline, or equivalent practical experience in infrastructure or operations
  • Working knowledge of Linux and/or Windows server administration fundamentals
  • Understanding of core networking principles such as TCP/IP, DNS, VLANs, routing, and firewall concepts
  • Experience with at least one scripting or automation language such as Python, Bash, or PowerShell
  • Familiarity with cloud infrastructure concepts in at least one major platform, such as Azure or AWS
  • Exposure to automation and configuration tools such as Terraform or Ansible
  • Strong analytical thinking, troubleshooting ability, and a willingness to learn in a fast-moving technical environment
  • Clear written and verbal communication skills with the ability to document operational procedures effectively
Job Responsibility
  • Oversee the health of production platforms through monitoring tools, assist with incident response, and help refine alerts, dashboards, and issue tracking processes
  • Support day-to-day operations for infrastructure spanning on-premises facilities and cloud environments, including servers, storage, network components, and middleware services
  • Contribute to the administration of multi-cloud resources across platforms such as Azure and Amazon EC2, with involvement in compute, networking, storage, and identity-related tasks
  • Build and enhance automation solutions using Infrastructure as Code practices to streamline repeatable work and improve platform consistency
  • Participate in DevSecOps and GitOps processes by assisting with CI/CD workflows, configuration management, and policy adherence
  • Help strengthen cloud security by identifying configuration gaps, assisting with remediation efforts, and supporting vulnerability reduction initiatives
  • Join the on-call rotation, respond to operational events, and contribute to post-incident reviews focused on continuous improvement
  • Create and maintain runbooks, technical procedures, and system documentation to improve operational readiness and knowledge sharing
  • Assist with containerized and orchestrated environments, including platforms that use Kubernetes, to support scalable application operations
What we offer
  • medical
  • vision
  • dental
  • life and disability insurance
  • company 401(k) plan
Employment type: Full-time