As a Site Reliability Engineer, you will focus on keeping the Prolific platform resilient, scalable and highly performant for our customers. You’ll maintain stability and reliability across the platform, bring our observability up to the right standard, and dive into incident remediation where needed, in collaboration with service delivery and other teams. You will also work with cross-functional teams to embed SRE principles and upskill them in key areas such as Kubernetes and observability.
Job Responsibilities:
Develop and maintain highly available infrastructure using modern infrastructure-as-code techniques, with a focus on Terragrunt and Terraform
Manage and optimise Kubernetes clusters and their workloads with a focus on reliability and performance
Participate in incident response and remediation, working with relevant product teams and stakeholders to resolve production issues efficiently, including creating and maintaining runbooks
Review and optimise other areas of our tooling stack, such as CI/CD or release strategies
Foster a culture of continuous improvement, such as enhancing documentation and upskilling teams in cloud architecture and Kubernetes
Improve observability and alerting systems across our application and infrastructure, ensuring proactive detection of system degradation
Collaborate with Engineering teams to foster an SRE culture, including contributing to the definition of SLOs, SLAs and error budgets
Design and implement automation strategies to ensure managed services remain up-to-date, secure, and performant
Lead and support initiatives that automate processes to improve system efficiency and resilience and to reduce toil
Organise, support and respond to on-call incidents
Requirements:
5+ years with Google Cloud Platform, GKE, and the Kubernetes ecosystem, plus experience with Terraform and Terragrunt
Strong programming skills in Python
Strong experience in observability principles and tooling
Experience in GitOps flows and platforms for Kubernetes, such as ArgoCD
Deep understanding of system architecture and scalability principles
Strong collaboration and communication skills to work with cross-functional teams