SREs at Optimizely are focused on making us the most reliable, performant, and trustworthy Digital Experience Optimization platform ever. Our engineering teams have built data pipelines that process 10 billion events daily and applications that support powerful experimentation and collaboration workflows at scale.
Our platforms are built on AWS and GCP. We use technologies such as Kafka, Samza, HBase, MySQL, and Postgres. We build and manage our systems using TravisCI, Jenkins, Docker, Kubernetes, Terraform, and Chef, with a combination of managed and self-hosted approaches. This is a unique opportunity to lead the engineering organization in standardized, automated infrastructure and service provisioning and orchestration, service-oriented architectural excellence, and forward-looking planning and execution of large technical projects.
We are looking for a Senior Site Reliability Engineer to help build and scale our CloudOps capabilities. You will be responsible for designing, implementing, and operating critical infrastructure and platform services while collaborating closely with engineering, support, and product teams to improve the reliability, scalability, and performance of our systems. This is a hands-on technical role in which you will be instrumental in shaping the SRE culture, driving automation, and ensuring high availability across all services.
Job Responsibilities:
Champion a Site Reliability Engineering culture across the organization by sharing best practices, tools, documentation, and code
Identify and automate manual operational tasks using scripting, infrastructure-as-code, and CI/CD pipelines
Build and maintain observability (monitoring, logging, tracing) for all production systems to ensure reliability, availability, and performance
Proactively monitor alerts across all platforms and coordinate with SRE, Operations, Engineering, and Support teams to detect and resolve incidents quickly, minimizing MTTA/MTTR
Lead and manage on-call rotations, driving a blameless incident management and postmortem culture
Collaborate with development teams to define and implement SLOs, SLIs, and error budgets
Ensure uptime SLAs are met through robust automation, testing, monitoring, and operational best practices
Create and maintain runbooks, playbooks, and system documentation to ensure operational readiness and knowledge sharing
Requirements:
Strong experience in Linux Systems Administration in cloud or virtualized environments
Proficiency in infrastructure-as-code tools such as Terraform
Hands-on experience with configuration management tools like Ansible or SaltStack
Skilled in scripting and automation using Python and Bash
Experience deploying and maintaining services in public cloud environments (Azure, AWS, or GCP)
Solid understanding of observability tooling, especially Datadog, ELK Stack (Elasticsearch, Logstash, Kibana), or similar
Experience building and maintaining CI/CD pipelines (e.g., GitHub Actions, Azure DevOps, Octopus)
Familiarity with Kubernetes and Docker; production experience is a strong plus
Experience operating and scaling distributed systems across multiple regions
Strong communication and collaboration skills; comfortable working across time zones
Passion for learning, continuous improvement, and a strong sense of ownership