We’re looking for a Site Reliability Engineer (SRE) to help advance MaintainX’s reliability, observability, and developer autonomy as we scale our platform. In this role, you’ll partner closely with product and platform engineering teams to improve the stability, resilience, and operational readiness of our services. You’ll work alongside teams to design for reliability from the start, establish clear ownership and standards, and build shared tooling that enables teams to operate their services with confidence. You’ll also contribute to company-wide initiatives that define how MaintainX approaches reliability engineering, including observability standards, incident response practices, and service health metrics, helping the organization adopt proven industry practices at scale. This role is well-suited for an engineer who enjoys working across teams, influencing technical direction through strong engineering practices, and turning reliability principles into practical, scalable systems.
Job Responsibilities:
Assess service maturity and provide insights to development teams
Partner with development teams to implement observability best practices
Enable development teams to become autonomous with their service deployment, support, and infrastructure
Mentor developers on reliability practices, focusing on making them self-sufficient
Act as the bridge, and the eyes and ears, of the Platform Division teams to drive tooling and practice adoption across development teams
Requirements:
Deep understanding of observability practices in distributed systems and how they influence system design and team behaviour
Practical experience with SRE concepts (SLOs, error budgets, incident management)
3–5+ years in software engineering, SRE, DevOps, or production engineering roles with experience operating production systems
Proficient in cloud-native platforms and infrastructure-as-code concepts and tools
Working knowledge of at least one programming language (TypeScript/Node.js is a plus)
Excellent communication and collaboration abilities across technical and non-technical teams
Ability to translate complex reliability concepts into actionable guidance
Enjoyment in enabling teams to succeed independently, measuring success by their reduced dependency on you
Nice to have:
TypeScript/Node.js is a plus
What we offer:
Competitive salary and meaningful equity opportunities
Healthcare, dental, and vision coverage
401(k) / RRSP enrollment program
Take-what-you-need PTO
A work culture where: You’ll work alongside folks across the globe who reflect the MaintainX values: Smart, Humble, Optimist
We believe in meritocracy, where ideas and effort are publicly celebrated