Site Reliability Engineer Jobs

Job description

As our Site Reliability Engineer, you are responsible for implementing and maintaining scalable infrastructure and systems that ensure the reliability, performance, and security of our production environments. This hands-on position bridges the gap between development and operations by applying software engineering principles to infrastructure and operational challenges. You will collaborate closely with Development teams, Security teams, and other stakeholders to build and maintain robust systems, implement automation, and support operational excellence through SLOs (Service Level Objectives) and observability. You will also contribute to incident management and capacity planning, and implement infrastructure as code practices across the organization. You will report to the Platform Engineering Manager as a member of the Platform Team.

Responsibilities

Implement and maintain scalable infrastructure and systems
Ensure reliability, performance, and security of production environments
Bridge the gap between development and operations
Apply software engineering principles to infrastructure and operational challenges
Collaborate with Development teams, Security teams, and other stakeholders
Build and maintain robust systems
Implement automation
Support operational excellence through SLOs and observability
Contribute to incident management
Contribute to capacity planning
Implement infrastructure as code practices
Report to the Platform Engineering Manager as a member of the Platform Team
Technical Leadership & System Design:
Collaborate with Development teams on infrastructure architecture, deployment strategies, and operational requirements
Design and implement monitoring, alerting, and observability solutions
Contribute to infrastructure as code initiatives and maintain deployment automation pipelines
Implement security best practices suited to each context and meet compliance requirements
Design and maintain disaster recovery and backup strategies
Operational Excellence & Process Implementation:
Contribute to incident response efforts and drive resolution of technical issues
Develop and maintain runbooks and documentation for operational procedures
Ensure proper logging and monitoring across all systems
Expand automation to reduce manual operations
Maintain and improve SRE practices across the organization
Cross-team Collaboration & Knowledge Sharing:
Work with development teams to implement operational readiness requirements
Collaborate with Security teams on infrastructure security measures
Provide technical mentorship to developers on operational practices
Lead knowledge sharing sessions and documentation efforts
Partner with Engineering Managers to improve development workflows and tools

Requirements

At least 4 years of infrastructure/systems engineering experience
Strong hands-on technical focus
Comfortable building and maintaining large-scale distributed systems
Comfortable managing incident response in line with SLAs
Comfortable implementing automation and self-healing systems
Comfortable developing utility scripts and functions
Fluent in French and English
Strong problem-solving skills
Reliability-focused
Excellent communication skills
Experience with our tech stack (Ruby, Elixir, React.js) is a significant advantage

Nice to have

Experience with Ruby, Elixir, React.js

What we offer

4-day work week
Professional development plan
Leave to care for a sick child
Mental health prevention program
Employee Resource Groups (ERG)

Beamy - All job openings
