We are looking for an experienced, hands-on Site Reliability Engineer (SRE) with strong Linux and system-level expertise and a background in third-party system operations, who can operate autonomously in complex production environments. You must be able to troubleshoot incidents independently, lead and support post-incident service recovery, and drive improvements to overall system stability, performance, and observability. The role focuses on managing and optimizing large-scale environments (5,000+ hosts) running technologies such as Kafka, Redis, and Kubernetes. It does not involve application development, but it requires deep operational expertise and solid troubleshooting skills.
Job Responsibilities:
Operate autonomously in complex production environments
Independently troubleshoot incidents
Lead and support post-incident service recovery
Drive improvements to overall system stability, performance, and observability
Manage and optimize large-scale environments (5,000+ hosts) running technologies like Kafka, Redis, and Kubernetes
On-call rotation: one week every 4–5 weeks (24x7 coverage)
Requirements:
5+ years of experience in Linux system administration or SRE roles