As a Platform Engineer focusing on Observability, you are responsible for the architecture, implementation, and management of our telemetry pipelines and monitoring ecosystem. You will operate as a shared service provider, ensuring that development teams have the self-service tools they need to maintain high visibility into their applications. Your goal is to improve the Developer Experience (DevX) by reducing friction in the monitoring lifecycle, automating telemetry ingestion, and building resilient systems that mitigate tech debt.
Job Responsibilities:
Observability as a Service: Build and manage centralized observability platforms (Elastic, Prometheus, Grafana) that serve as a shared resource for all internal development teams
Telemetry Pipeline Management: Design and optimize telemetry pipelines to ensure high-fidelity data collection, transformation, and routing using OpenTelemetry (OTel)
DevX Advocacy: Partner with software engineering teams to understand their pain points, providing consultation and tooling that makes "monitoring by default" easy and intuitive
Automation & Tooling: Write and edit code (Python, Go) to automate manual processes, reducing the operational burden on feature teams
Dashboarding & Alerting: Build sophisticated Grafana dashboards and alerting logic that provide actionable insights rather than noise
Documentation & Enablement: Create clear, concise technical documentation and "golden paths" to help developers self-serve their observability needs
CI/CD Integration: Integrate monitoring and reporting into deployment pipelines to ensure system health is validated during every release
Triage & Mitigation: Assist teams in triaging complex production issues by providing deep-link visibility and data-driven insights to prevent future regressions
Requirements:
Bachelor’s degree in Engineering, Computer Science, or a related field (relevant work experience also considered)
3–4 years of experience in SRE, DevOps, or Platform Engineering roles
Deep expertise in OpenTelemetry, Elastic (ELK/ECK), Prometheus, and Grafana
Proficiency in Ansible and Terraform for managing cloud resources and configuration
Strong proficiency in Python or Go for building internal tools and automation
Experience deploying and operating applications in public cloud environments (AWS, Azure, or GCP)
Experience with Concourse, GitHub, and Artifactory
Strong UNIX/Linux background with a firm grasp of CLI utilities, network protocols (HTTP/TLS), and asynchronous messaging
A commitment to supporting developers as internal customers
Firm understanding of Agile, Scrum, and Kanban methodologies
A "win as a team" mentality
Nice to have:
Familiarity with Maven, Jenkins, or Nexus