We're looking for a senior Site Reliability Engineer to join our small, high-ownership SRE team. In this hands-on individual contributor role, you'll own the reliability, scalability, and security of AbsenceSoft's production infrastructure on AWS — supporting a B2B SaaS platform that processes sensitive employee leave data for enterprise customers. You'll work closely with infrastructure, application engineering, product leadership, and cross-functional partners in Security and Compliance, with a clear path to grow toward a Tech Lead opportunity as our team and platform continue to mature.
Job Responsibilities:
Architect, implement, and operate scalable, resilient, and secure AWS infrastructure
Lead infrastructure-as-code initiatives to ensure all environments are reproducible, auditable, and consistently configured
Design, maintain, and improve CI/CD pipelines using Jenkins and GitHub
Own the Datadog observability platform, including dashboards, monitors, alerting thresholds, and log management
Define and maintain SLOs, SLIs, and error budgets
Serve as a senior technical responder across the full incident lifecycle within a shared on-call rotation
Lead blameless postmortems
Refine, implement, and test disaster recovery plans to meet RTO/RPO objectives
Contribute to SOC 2 audit readiness with a focus on access controls, incident response, and risk mitigation
Mentor junior SREs through code reviews, incident pairing, and documentation
Requirements:
5+ years of experience in SRE, DevOps, or a related engineering role
Advanced hands-on expertise in AWS production environments and core services including Lambda, ECS, S3, ALB, and GuardDuty
Strong proficiency in infrastructure-as-code tooling such as Terraform, CloudFormation, or CDK
Experience building and operating CI/CD pipelines using Jenkins and GitHub
Proficiency in Python, Go, or Bash for automation
Hands-on experience with Datadog or a comparable observability platform for monitoring, alerting, and log management
Demonstrated experience leading incident response in complex, distributed systems
Working knowledge of SLO/SLI frameworks, error budgets, and disaster recovery planning against defined RTO/RPO objectives
Familiarity with SOC 2 compliance frameworks and experience contributing to audit readiness, access controls, and security control evidence collection
A collaborative, ownership-driven mindset with strong communication skills
A passion for mentoring junior engineers
A commitment to reducing toil through automation and AI-assisted tooling
What we offer:
Impact that matters
Flexibility and trust: remote-first and results-driven
Growth and development: access to learning resources, leadership programs, and real opportunities to take on new challenges