As a DevOps Engineer, you won't just "maintain" infrastructure; you will own it. You will lead the charge in cloud security automation, using an AI-first mindset to drive extreme efficiency. Your mission is to identify and fix misconfigurations and bottlenecks, and to prevent them before they occur. You are expected to operate with high autonomy, shaping the future of secure cloud environments by being a proactive force, not a reactive one.
Job Responsibilities
Innovate & Implement: Design and implement cloud infrastructure solutions with a focus on GCP, including compute reservations, BigQuery, Pub/Sub, GCS, and networking
Release Engineering: Lead weekly production upgrade cycles across global multi-region environments, including branch-out processes, hotfix management, version gating, and rollback procedures
Service Deployment & Lifecycle: Own end-to-end service deployments on Kubernetes, from Helm chart creation and Flux/GitOps configuration to production rollout and scaling
Database Administration: Manage and optimize database infrastructure including MySQL, Redis, BigQuery, Neo4j, Scylla, MongoDB, and PostgreSQL in production environments
AI Integration: Use AI-assisted tools to optimize coding, automate repetitive tasks, and solve complex architectural puzzles. Contribute to AI-native infrastructure such as Vertex AI and AI Gateway services
Tenant & Customer Infrastructure: Manage customer-specific infrastructure including dedicated compute reservations, tenant provisioning, licensing configuration, and feature flag management across multi-tenant and single-tenant environments
Infrastructure Automation & Tooling: Develop internal CLI tools and automation scripts (Python) to streamline operations
Cost Optimization: Drive cloud cost optimization through resource right-sizing, reservation management, database disk reduction, and efficiency improvements
Service Reliability: Enhance uptime by establishing SLAs, setting up comprehensive monitoring (Prometheus, Grafana, Stackdriver), and participating in the production on-call rotation (including off-hours support)
Requirements
Experience: 4+ years in DevOps/SRE roles with a focus on multi-region production environments
AI-First Workflow: Must be proficient in using LLM-based agents (Cursor, Claude Code, etc.) for coding and architecture
Cloud & Containers: Deep expertise in GCP (GKE), including Kubernetes orchestration (HPA, node pools) and Terraform for IaC
Automation: Strong Python/Bash skills and experience with GitOps workflows (Flux/ArgoCD) and CI/CD (GitLab/Jenkins)
Data & Infrastructure: Experience managing production databases (SQL/NoSQL) and troubleshooting Linux and networking issues
Nice to have
Experience with AI-native infrastructure (Vertex AI, AI Gateways)
Experience with observability stacks (Prometheus/Grafana) and with managing multi-tenant SaaS platforms