Senior Site Reliability Engineer Job at Zuora (Chennai)

Job Description

We're hiring a Senior Site Reliability Engineer to lead reliability strategy and drive AI-powered automation at scale. This role involves owning complex systems, shaping architecture, and influencing cross-functional teams.

Job Responsibilities

Define and evolve SLOs, SLIs, and resilience patterns
Build AI-driven automation for detection, remediation, and forecasting
Lead cloud infrastructure and Kubernetes platforms
Drive incident response and operational excellence
Mentor engineers and influence org-wide reliability practices

Requirements

8+ years of hands-on experience in Site Reliability Engineering, DevOps, or large-scale production operations.
Advanced expertise in AWS, including architecture design across services such as EC2, EKS, VPC, IAM, RDS, S3, and CloudWatch.
Deep experience with Infrastructure-as-Code using Terraform, including complex modules, state management, and governance.
Strong programming and automation skills in Python and Shell, with experience building production-grade automation systems.
Expert-level Linux systems knowledge, including performance tuning, security hardening, and deep troubleshooting.
Proven experience operating distributed systems and data streaming platforms such as Kafka in high-throughput environments.
Demonstrated ability to work independently on complex, ambiguous problems with broad organizational impact.
Proven technical leadership experience driving large, cross-team reliability or infrastructure initiatives, including setting technical direction, influencing design decisions, and mentoring engineers to deliver measurable outcomes at scale.
Practical experience designing or implementing AI/ML-driven automation in operations, reliability, or platform engineering.
Experience integrating AI capabilities into monitoring, alerting, incident response, or workflow automation systems.
Strong understanding of how AI can be safely and effectively applied in production environments.

Nice to have

Experience with advanced observability platforms (Prometheus, Grafana, ELK, or similar) enhanced with AI-driven insights.
Familiarity with predictive analytics, anomaly detection, or AIOps platforms.
Experience influencing architectural decisions at a platform or product level.
Prior experience operating in a 24/7, global, high-availability SaaS environment.

What we offer

Competitive compensation, variable bonus and performance-based reward opportunities, and retirement programs
Medical, dental, and vision insurance
Generous, flexible time off, plus paid holidays, wellness days, and a company-wide year-end break
Paid parental leave (including fully paid leave for eligible ZEOs, subject to local policy)
Learning & development stipend to support ongoing growth
Opportunities to volunteer and give back, including charitable donation matching where available
Mental wellbeing resources and support