As a Principal Site Reliability Engineer, you will serve as the technical authority for our cloud-native infrastructure. You aren't just managing servers; you are architecting the reliability, scalability, and security of a massive Kubernetes ecosystem in the Sovereign Cloud. We are looking for a visionary who balances deep systems expertise with a modern, AI-augmented development workflow. You will lead the evolution of our GKE (Google Kubernetes Engine) environment, championing GitOps best practices and integrating advanced security protocols directly into our delivery pipelines.
Job Responsibilities
Infrastructure Leadership: Architect and oversee large-scale Kubernetes clusters in GKE, ensuring high availability, performance tuning, and cost optimization
GitOps & Orchestration: Design and refine complex CI/CD lifecycles using ArgoCD, moving toward a fully declarative infrastructure-as-code model
Security Engineering: Implement and manage security scanning tools (e.g., Prisma Cloud, Snyk, or GKE native security) to ensure container integrity and shift-left security compliance
Automation & Tooling: Develop sophisticated automation scripts and internal tools using Python to eliminate manual toil and improve system observability
AI-Driven Development: Lean into the future of engineering by utilizing Cursor and Claude to accelerate coding, debugging, and documentation tasks
Incident Management: Act as a final escalation point for complex infrastructure outages, conducting blameless post-mortems to drive systemic improvements
Participate in on-call rotations to support critical business and production systems
Requirements
7+ years of experience in Infrastructure, SRE, or DevOps roles
BS or MS in Computer Science, a related field, or equivalent professional experience
Kubernetes Mastery: Expert-level experience (6+ years) managing production K8s workloads (GKE preferred; EKS experience will also be considered)
Deep understanding of Networking, Storage, and RBAC
CI/CD & GitOps: Hands-on expertise with ArgoCD and modern pipeline runners (GitHub Actions, GitLab CI, or Jenkins)
Programming: Proficient in Python for systems programming and automation
Security Mindset: Proven experience integrating security scanning and compliance checks within a containerized environment
Modern Workflow: Experience with (or a strong desire to use) AI pair-programming tools like Cursor and Claude to multiply personal and team productivity
Excellent written and verbal communication skills, with the ability to collaborate and rally support
Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
Ready to understand and dissect new technology stacks quickly