This role leads globally distributed DevOps/SRE teams across the US and India, with end-to-end accountability for workforce planning, team performance, and the hiring, development, and retention of a high-performing organization. It oversees the reliability, scalability, and cost efficiency of production and non-production environments across AWS and Azure, applying expertise in capacity planning, traffic management, and cloud optimization. Leading teams of 20+ engineers and contractors, the role drives platform delivery, technical and security enhancements, and multi-functional collaboration. Success is measured by platform reliability, timely delivery of capabilities, team growth, and the overall impact on organizational performance and customer experience.
Job Responsibility:
Lead and manage globally distributed DevOps/SRE teams (US and India), ensuring effective workforce planning, shift and availability management, performance development, mentorship, and continuous skill growth aligned with organizational needs
Own the security and vulnerability management lifecycle, ensuring timely remediation, cloud posture hardening, secure configuration management, and alignment with enterprise security, governance, and risk controls
Lead implementation of observability platforms across monitoring, logging, tracing, and alerting
Develop dashboards and insights to proactively identify failures, bottlenecks, and performance deviations
Define and implement continuous improvement practices across technical fields and organizational processes
Drive SRE frameworks, including SLA/SLI/SLO definitions, reliability measurement, error-budget policies, and adoption of standards that improve operational excellence
Provide end-to-end ownership of incident management, including response coordination, root-cause analysis (RCA), post-incident reviews, and implementation of corrective actions to strengthen system resilience
Oversee technical vendor relationships to incorporate feature and function requests into product releases
Drive and maintain the current and future technical roadmap in collaboration with design and architecture teams
Collaborate with product, architecture, quality, and security organizations to align technical priorities and delivery objectives
Drive execution of a long-term platform engineering roadmap covering modernization, automation, migrations, and innovation initiatives
Recruit and hire qualified managers and team members to strengthen the platforms and the support model
Requirements:
Bachelor's Degree plus 7 years of related work experience OR a combination of education and experience deemed equivalent. Acceptable areas of study include Computer Science, Engineering, IT or equivalent experience. (Required)
7-10 years of relevant Product Management experience in an agile software product development environment. (Required)
2-4 years of experience in a leadership role. (Required)
7-10 years of technical leadership, with a strong command of cloud infrastructure (AWS & Azure), CI/CD systems, GitLab administration, IaC tools (Terraform/CloudFormation/Bicep), automation, and modern DevOps/SRE methodologies. (Preferred)
2-4 years of experience managing teams of 5 or more resources in direct reporting relationships in a Platform Management organization. (Preferred)
At least 18 years of age
Legally authorized to work in the United States
Strong understanding of Software Development Life Cycle (SDLC) and Agile methodologies
Experience delivering complex technology initiatives across engineering and operations
Expertise in vulnerability management, cloud security procedures, secure SDLC, compliance frameworks, and regulatory alignment
Knowledge of observability concepts including monitoring, logging, and alerting
Understanding of SLAs, SLOs, and service performance management
Ability to collaborate with multi-functional partners and influence technical decisions
Strong written and verbal communication skills with the ability to convey technical concepts clearly
Analytical skills to assess system performance, operational metrics, and improvement opportunities
Nice to have:
Cloud certifications (AWS or Azure)
Kubernetes or related containerization certifications