We are looking for a Mid-Level Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms. This role focuses on hands-on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers. You will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production-ready.
Job Responsibilities:
Observability Implementation:
Implement and maintain metrics, logs, and traces for applications and infrastructure
Assist with onboarding applications into observability platforms (e.g., Dynatrace, ELK, Datadog)
Configure dashboards, alerts, and basic anomaly detection
Application Support & Instrumentation:
Work with development teams to enable structured logging, basic distributed tracing, and core metrics
Validate observability requirements during Production Readiness Reviews (PRR)
Troubleshoot missing or low-quality telemetry
Monitoring & Alerting:
Configure alerts based on the golden signals (latency, errors, traffic, saturation)
Help reduce alert noise by tuning thresholds and alert logic
Support incident response by gathering logs, metrics, and traces
Operations & Reliability:
Support root cause analysis using observability tools
Maintain dashboards and documentation used by on-call and support teams
Participate in on-call rotations (as applicable)
Automation & Continuous Improvement:
Assist in automating observability onboarding and validation tasks
Create and maintain reusable dashboards and alert templates
Follow established observability standards and best practices
Requirements:
2-4 years of experience in Observability or SRE
Working knowledge of metrics, logs, and basic tracing concepts
Hands-on experience with at least one observability platform (Dynatrace, Elastic ELK, Datadog, New Relic, etc.)
Basic understanding of SLIs, SLOs, and service health indicators
Experience with cloud platforms or hybrid environments
Ability to write scripts (Python, Bash, PowerShell) for automation and troubleshooting
Nice to have:
Experience with OpenTelemetry or APM agents
Familiarity with Kubernetes or containerized workloads
Experience working with incident management tools (PagerDuty, ServiceNow)
Exposure to Dynatrace, Kibana, ELK, or similar cloud-native monitoring tools
Experience in regulated or enterprise environments