We are seeking a foundational Site Reliability Engineer to join our Device Insurance Technology team as we build a new internal engineering capability. This role is a unique opportunity to help establish our DevOps and SRE practices from the ground up within a modern, cloud-native environment. As the first dedicated SRE on the team, you will play a critical role in designing, building, and owning CI/CD pipelines, deployment processes, and production observability systems. You will work closely with development teams, architects, and external partners to transition operational ownership from legacy systems and enable scalable, reliable service delivery. This is a hands-on, high-impact engineering role where you will balance building foundational systems with supporting live production environments. You will help shape the future operating model, drive automation, and influence how reliability is implemented across the platform.
Job Responsibilities:
Develop, configure, and support CI/CD pipelines
Automate build, test, and deployment workflows to enable safe and repeatable releases
Integrate automated quality checks, code scanning, and deployment validations into pipelines
Support containerized deployments using Docker and Kubernetes
Use Infrastructure-as-Code (IaC) tools like Helm to manage cloud infrastructure
Participate in automated provisioning of environments and system configurations
Embed monitoring and alerting into delivery pipelines
Support debugging of build, deployment, and environment issues across Dev/Test/Prod systems
Automate processes to enhance system reliability and resilience
Minimize operational incidents through proactive monitoring and maintenance
Develop scripts, tools, and automation to reduce manual effort in operational tasks
Manage incident response to ensure rapid recovery and minimal disruption
Help build and maintain dashboards, alerts, and logs that provide visibility into system health and application behavior
Use tools such as Prometheus, Grafana, Splunk, or OpenTelemetry to monitor services and infrastructure
Analyze system performance data to guide optimizations and proactively detect issues
Adapt to new technologies to maintain and enhance system robustness
Contribute to documentation, runbooks, playbooks, and operational readiness reviews
Requirements:
4+ years of experience in DevOps and SRE roles
Experience in developing and maintaining CI/CD pipelines for software deployment
Experience with GitLab pipelines and Helm
4+ years of experience implementing and managing cloud-native platforms and solutions
Hands-on experience with containerization (Docker, Kubernetes)
4+ years of hands-on experience with monitoring/logging tools (such as Splunk, Grafana, and OpenTelemetry) and with incident management
4+ years of experience guiding and mentoring teams in reliability engineering practices
Understanding of web protocols, how full-stack applications operate, and how data flows through them
Basic knowledge of at least one major cloud platform (AWS preferred)
Strong communication skills and ability to work under pressure
Bachelor's Degree plus 3 years of related work experience OR advanced degree with 1 year of related work experience OR combination of education and experience deemed equivalent
Acceptable areas of study include Computer Science, Engineering or related field
At least 18 years of age
Legally authorized to work in the United States
Nice to have:
Experience integrating DevSecOps tools like code scanning, policy enforcement or container image validation
Understanding of blue/green, canary or rolling deployment strategies
Exposure to artifact management, secrets management or GitOps workflows
Exposure to incident management frameworks including alerting, escalation and postmortem practices
Understanding of Agile methodologies to improve and streamline processes
Ability to analyze system performance data to identify trends and improvement opportunities
Capability to drive innovation in system management and operations through new technologies and approaches
Ability to adapt to new technologies and changes in the digital landscape to maintain system robustness
Experience using generative AI tools (e.g., Claude, GitHub Copilot) for development support and task acceleration
AWS Certified DevOps Engineer
Certified Kubernetes Administrator
Google Cloud Certified - Professional DevOps Engineer