We are seeking a Senior Site Reliability Engineer (SRE) to help establish and scale SRE capabilities within a cloud-based environment. This role focuses on building monitoring frameworks, improving system reliability, and driving operational excellence across applications and infrastructure. The ideal candidate brings strong experience in Azure-based ecosystems, observability tools, and automation-driven reliability practices.
Job Responsibilities:
Building and scaling SRE practices from the ground up
Enhancing monitoring, alerting, and observability frameworks
Improving system reliability, performance, and incident response
Driving automation and operational efficiency across cloud environments
Requirements:
7+ years of experience in a Senior SRE or similar reliability/production engineering role
Strong expertise with Dynatrace, including dashboard configuration, alerting, and code-level integration
Hands-on experience with Azure Application Insights (AppInsights) for monitoring, alerting, and log management
Experience working with Azure Metrics API for advanced monitoring and observability
Solid experience with AKS (Azure Kubernetes Service) and containerized environments
Strong background in IT operational analytics and data visualization (dashboards, metrics, reporting)
Experience with GitHub and Azure DevOps, including CI/CD pipeline integration for monitoring and deployments
Strong understanding of SRE principles (reliability, scalability, automation, incident reduction)
Experience integrating APIs to support monitoring and observability solutions
Proven ability to collaborate across engineering and operations teams with strong communication skills
Nice to have:
Exposure to or experience with Grafana
Experience in incident management and root cause analysis
Background in performance optimization and system tuning
Experience implementing security monitoring and compliance controls