The Prisma Access team is seeking a seasoned Principal Site Reliability Engineer to serve as a Technical Lead and Catalyst for our Sovereign Cloud operations. In this role, you will design intelligent automation to manage production infrastructure at scale, ensure high uptime for our security SaaS products, and manage massive traffic across multi-cloud environments. You will be the authority on enforcing security policies to protect sensitive data while maintaining system performance. As a Senior Principal Engineer, you will set technical direction, mentor engineers, collaborate closely with product and leadership teams, and deliver robust solutions that scale with the business.
Job Responsibilities
Design, build, and operate reliable, secure cloud infrastructure across multi-cloud environments for our sovereign customers
Lead cross-functional initiatives to ensure applications are production-ready, scalable, secure, and resilient
Develop expertise in new technologies, embracing continuous learning and the adoption of AI tools
Develop tools and automation frameworks, championing Infrastructure as Code (IaC) and Monitoring as Code (MaC) principles
Automate robust deployments and orchestrate end-to-end monitoring and alerting solutions
Participate in on-call rotations to support critical business and production systems
Lead root cause analysis of critical issues, driving improvements and preventing recurrence
Champion the success of SRE and DevOps initiatives, aligning technical decisions with business goals
Requirements
10+ years of experience in Infrastructure, SRE, or DevOps roles
BS or MS in Computer Science, a related field, or equivalent professional experience
7+ years of experience with GCP, with expertise in its architecture, services, and PKI concepts for cloud security
Expert troubleshooting skills to resolve cloud infrastructure and service issues, identifying root causes and devising effective solutions
Proficiency in automation using Python and shell scripting
Expertise in Infrastructure as Code (IaC) with Terraform and Helm, leveraging AI tools for development
Solid experience with Kubernetes, container networking, and container workloads
Strong Linux administration skills
Proficiency with CI/CD pipelines, GitOps principles, and tooling like GitLab and Jenkins
Excellent written and verbal communication skills, with the ability to collaborate effectively to drive outcomes
Self-disciplined, self-managed, and highly driven with a strong sense of ownership and urgency
Ability to understand and dissect new technology stacks quickly
Ability to adapt quickly to evolving cloud technologies, security threats, and advancements through continuous learning
Ability to address customer needs effectively and provide clear Root Cause Analysis (RCA) to customers