Staff Engineer, Site Reliability Job at LearnUpon (Dublin)

Job Description

LearnUpon is looking for a Staff Site Reliability Engineer to join our team in Ireland. This is a flex role, working 1 day per week from LearnUpon's Dublin office. LearnUpon LMS helps organizations train their employees, partners, and customers. Businesses can manage, track, and achieve their unique learning goals — all through a single, powerful solution. As a Staff Engineer in Site Reliability Engineering you will be part of the team responsible for the scale-out of the LearnUpon infrastructure.

Job Responsibilities

Identifying opportunities to improve and scale our infrastructure for performance, observability, maintainability, and cost, by creating innovative solutions
Leading our efforts to build an observability function that incorporates application metrics, application transaction tracking, and event log management
Driving the processes to maintain resilient, scalable and cost-effective infrastructure
Working with other Engineering teams to provide infrastructure solutions that meet their ongoing requirements
Building tools focused on measuring, monitoring and alerting, with an eye towards self-service in order to promote Engineers’ ownership of observability
Reacting quickly to changing customer and business needs
Participating in the on-call rota
Mentoring junior talent

Requirements

7+ years of experience in a software or Ops role
5+ years of cloud engineering experience, including at least 2 years with AWS
Experience deploying Microservice environments, using containerisation technologies such as Kubernetes and Docker
Experience in designing and implementing Observability tech stacks
A track record of championing the benefits of Observability to Engineering teams
Ability to architect an SLO/SLI implementation that balances the needs of different teams
Familiarity with cost analysis of Observability metrics gathering, Engineering effort, and tooling
Experience building and supporting large-scale distributed systems that back a consumer app or website with associated requirements of performance, security and disaster recovery
Experience implementing IaC (e.g. CloudFormation, Terraform), automation tooling (e.g. Puppet, Ansible), and CI/CD (e.g. Jenkins, Travis CI, GitLab)
Able to communicate technical ideas effectively to, and collaborate with, both technical and non-technical peers

Nice to have

Certification in AWS, any PaaS, and/or related technologies
Experience with database scaling would be a strong plus

What we offer

Work in a fun and supportive environment with regular team events
Excellent career progression
Structured learning environment
Competitive salary and company ESOP
Private health insurance
26 days annual leave
