Site Reliability Engineer Job at Edpuzzle (Barcelona)

Job Description

We’re looking for a passionate Site Reliability Engineer to pioneer the SRE strategy of our Security and Infrastructure Team in Barcelona. The right person will help us create the best possible product for teachers and empower them to engage their students with videos. If you’re a self-starter who’s eager to contribute to the education sector, you’ll feel right at home with us.

As the key reference point for all things SRE, you’ll have the autonomy to shape our systems from the ground up. This role is perfect for someone ready to lead and innovate, making a significant impact on our cloud infrastructure and on our observability strategy, which is built on Datadog. You’ll be responsible for the reliability, scalability, and maintainability of our systems, from cloud infrastructure to in-depth observability and monitoring. Working closely with our DevOps and Engineering teams, you’ll drive the design and implementation of resilient systems, manage incidents effectively, and champion best practices for observability and incident response.

Job Responsibilities

Work with the Product, Infrastructure and Engineering teams to find the best technical solutions by participating in discussions and sharing your opinions
Take ownership of the problems you work on: understand why they matter to users, carry out your own research, make your own proposals, and work on the implementation, relying on your teammates for help when needed
Communicate effectively within the team to maximize productivity, ownership, and focus, helping projects reach the finish line with the best possible outcome and by the project deadline
Design a cloud infrastructure that is secure, scalable, and highly available on AWS
Engage in proactive monitoring and observability with comprehensive tools and practices that not only detect and warn, but also predict potential system issues before they affect our users
Lead the charge in root cause analysis for production and infrastructure issues, transforming challenges into learning opportunities
Provision, configure and maintain cloud infrastructure as code
Take part in a rotating on-call schedule, ensuring reliability and uptime for our users
Write technical documentation, contributing to our technical knowledge base and empowering your peers
Perform other exciting duties as opportunities and needs arise

Requirements

At least 3 years of experience in Site Reliability Engineering, DevOps Engineering, System Administration or Cloud Infrastructure Engineering for a web-based product with a focus on observability and reliability
Good knowledge of Amazon Web Services (AWS), CloudWatch and Datadog
Experience with software release management and deployment pipelines (Git, CI/CD)
Experience with Infrastructure as Code using AWS CDK
Experience writing JavaScript, TypeScript or Node.js code
Pragmatic with technologies: you understand that tech is a tool to solve a product problem, never the end goal
Excellent ability to communicate your ideas, regardless of the audience
Product-oriented: you make all your technology decisions with the end user in mind
You are naturally drawn towards understanding the bigger picture and recognize when there's a need for improvement, applying your intentional and rational thought process to address complex issues
You are able to work independently, plan and manage your time deliberately to meet deadlines, and persist toward a goal despite difficulties, all while maintaining a flexible mindset
You are based in Barcelona and have a permit to work in Spain

Nice to have

Experience with MongoDB or OpenSearch database administration
Experience deploying and maintaining complex cloud infrastructures serving high traffic web applications
Experience with complex backend architectures such as Hexagonal Architecture and Domain Driven Design (DDD)
Experience with other cloud providers such as Azure or Google Cloud Platform
... or another amazing skill you bring to the table that we haven’t thought of yet!

What we offer

On-call compensation
24 days of paid holiday plus December 24th and 31st
Flexible working hours and reduced working time on Fridays to support work-life balance
€2000 annual allowance for meals with Cobee
Private health insurance policy with AXA
Access to Wellhub to support physical and emotional well-being
Flexible remuneration for childcare
Flexible remuneration for public transport
Flexible remuneration for health insurance of immediate family members (spouse and/or children)
Training and development (CodelyTV, Cloud Academy, etc.)
Fully stocked pantry with a variety of snacks and drinks in the Barcelona office
Team-building events during working hours to connect, learn, and create lasting bonds with passionate colleagues
