The Cortex team builds and delivers the industry’s most advanced SecOps platform, consisting of XDR, XSIAM, XSOAR, and XPANSE. As a Principal DevOps Engineer for the Cortex platform, you will serve as a technical pillar and visionary, architecting, scaling, and optimizing a massive-scale, multi-region GCP environment. In this high-impact role, you will leverage a deep, comprehensive DevOps and Site Reliability Engineering skillset to define the future of our cloud infrastructure. You will not just operate systems; you will set engineering standards, pioneer infrastructure-as-code paradigms, and drive organizational alignment across product engineering and specialized CI/CD groups. Your mission is to champion absolute system resilience, eliminate engineering friction at scale, and guarantee that our next-generation SecOps platform remains secure, cost-optimized, and highly available.
Job Responsibilities
Architectural Leadership & IaC Strategy: Design and govern the global, multi-region cloud infrastructure strategy using Infrastructure as Code (IaC) principles (Terraform). Establish blueprints and modular architectures that scale seamlessly across the enterprise
Strategic CI/CD & Platform Engineering: Partner with engineering leadership and specialized pipeline teams to architect robust, secure, and self-healing continuous delivery workflows, radically reducing time-to-market for Cortex features
Cloud Economics & Optimization: Drive the long-term optimization strategy for our global GCP footprint. Architect for maximum performance, multi-zone reliability, and sophisticated cost-efficiency/FinOps models
Consultative Engineering & Influence: Serve as a trusted advisor to Product Engineering teams. Influence the core application architecture early in the lifecycle to ensure services are natively containerized, horizontally scalable, and highly observable
Systemic Reliability & Incident Governance: Act as a critical escalation point for complex, systemic outages. Lead post-mortems, identify systemic vulnerabilities, and design architectural mitigations to prevent recurring incidents across the entire platform
Advanced Tooling & R&D: Spearhead the creation of internal platforms, automated remediation frameworks, and intelligent auto-scaling solutions to eliminate manual operational toil
Technology Evangelism: Continually evaluate emerging technologies, paradigms, and open-source tools. Define the technical roadmap for the DevOps organization and mentor senior engineers across the team
Requirements
10+ years of experience in DevOps, Site Reliability Engineering, or Cloud Architecture, with a proven track record of owning large-scale, business-critical production environments
Cloud Infrastructure: Deep, authoritative expertise in Google Cloud Platform (GCP) or Amazon Web Services (AWS), including complex networking, IAM governance, and multi-region architectures
Container Orchestration: Expert-level mastery of Kubernetes (GKE/EKS) and the broader cloud-native ecosystem (Service Meshes, Ingress controllers, advanced scheduling)
Automation & Software Engineering: High proficiency in Python or Go, advanced Linux internals, and robust shell scripting
Expert-level experience designing maintainable, DRY, and scalable Terraform modules
Enterprise CI/CD: Deep architectural understanding of enterprise-scale software delivery pipelines (e.g., GitLab CI, GitHub Actions, Jenkins) and GitOps methodologies (e.g., ArgoCD)
Observability & Telemetry: Proven experience designing comprehensive, platform-wide observability strategies using tools like Prometheus, Grafana, OpenTelemetry, and PagerDuty
Nice to have
Technical Leadership: Demonstrated ability to lead cross-functional initiatives, influence engineering directors, and align technical roadmaps across distributed global teams
Complex Problem Solving: Exceptional capacity for troubleshooting deeply complex, distributed systems, networking bottlenecks, and cloud infrastructure anomalies
Autonomy & Decision Making: A proven track record of operating with absolute autonomy, making high-stakes architectural decisions, and owning the long-term outcomes