Site Reliability Engineer III Job at AbsenceSoft

Job Description

We're looking for a senior Site Reliability Engineer to join our small, high-ownership SRE team. In this hands-on individual contributor role, you'll own the reliability, scalability, and security of AbsenceSoft's production infrastructure on AWS, which supports a B2B SaaS platform that processes sensitive employee leave data for enterprise customers. You'll work closely with infrastructure, application engineering, and product leadership, as well as cross-functional partners in Security and Compliance, with a clear path to grow into a Tech Lead role as our team and platform mature.

Job Responsibilities

Architect, implement, and operate scalable, resilient, and secure AWS infrastructure
Lead infrastructure-as-code initiatives to ensure all environments are reproducible, auditable, and consistently configured
Design, maintain, and improve CI/CD pipelines using Jenkins and GitHub
Own the Datadog observability platform, including dashboards, monitors, alerting thresholds, and log management
Define and maintain SLOs, SLIs, and error budgets
Serve as a senior technical responder across the full incident lifecycle within a shared on-call rotation
Lead blameless postmortems
Refine, implement, and test disaster recovery plans to meet RTO/RPO objectives
Contribute to SOC 2 audit readiness with a focus on access controls, incident response, and risk mitigation
Mentor junior SREs through code reviews, incident pairing, and documentation

Requirements

5+ years of experience in SRE, DevOps, or a related engineering role
Advanced hands-on expertise in AWS production environments and core services including Lambda, ECS, S3, ALB, and GuardDuty
Strong proficiency in infrastructure-as-code tooling such as Terraform, CloudFormation, or CDK
Experience building and operating CI/CD pipelines using Jenkins and GitHub
Proficiency in Python, Go, or Bash for automation
Hands-on experience with Datadog or a comparable observability platform for monitoring, alerting, and log management
Demonstrated experience leading incident response in complex, distributed systems
Working knowledge of SLO/SLI frameworks, error budgets, and disaster recovery planning against defined RTO/RPO objectives
Familiarity with SOC 2 compliance frameworks and experience contributing to audit readiness, access controls, and security control evidence collection
A collaborative, ownership-driven mindset with strong communication skills
A passion for mentoring junior engineers
A commitment to reducing toil through automation and AI-assisted tooling

What we offer

Impact that matters
Flexibility and trust: remote-first and results-driven
Growth and development: access to learning resources, leadership programs, and real opportunities to take on new challenges
Competitive rewards: comprehensive benefits, performance-based bonus program, equity opportunities
Time for life: flexible time off, paid holidays, flexible leave programs
Belonging and balance: inclusive culture
