As a Site Reliability Engineer on the SASE Platform team, you will play a critical role in building and operating highly available, secure, and globally distributed services. Your mission is to ensure our cloud-native security and networking platform is reliable, scalable, and performant from day one, protecting the users, applications, and data of the world's largest enterprises as they adopt cloud, remote work, and AI.
Job Responsibilities:
Proactively collaborate with development teams to embed reliability, scalability, and operability into services from the earliest design stages
Design, review, and evolve cloud-native architectures to improve availability, performance, cost efficiency, and fault tolerance
Build and operate automation for provisioning, deploying, and managing global infrastructure using Infrastructure as Code (IaC)
Improve CI/CD pipelines and release processes to enable safe, fast, and repeatable deployments
Drive observability best practices, including metrics, logs, traces, and SLIs/SLOs, to enable data-driven incident analysis
Participate in on-call rotations and reduce mean time to resolution (MTTR) through automation and proactive reliability improvements
Challenge existing processes by championing reliability, security, and operational maturity across the organization
Requirements:
5+ years of experience working with Unix/Linux systems, including shell, tools, networking, and kernel concepts
2+ years of hands-on experience with microservices architectures running on Kubernetes and container platforms
Proven experience operating workloads in public cloud environments (e.g., AWS, GCP, Azure) at scale
Proficiency in building automation and tools in at least one scripting or programming language (e.g., Python, Go, Java)
Strong experience with Infrastructure as Code (IaC) tools such as Terraform or Ansible
Bachelor’s degree in Engineering, Computer Science, or a related technical field, or equivalent practical experience
Nice to have:
Deep expertise in designing and operating monitoring, alerting, and observability systems (e.g., Prometheus, Grafana, ELK Stack)
Advanced networking expertise, including TCP/IP, DNS, BGP, routing, and cloud networking concepts relevant to SASE architectures
Prior experience operating or supporting SASE, SD-WAN, Zero Trust, or network security platforms
Familiarity with using AI/LLM technologies to improve operational workflows (e.g., incident analysis, automation)