We are seeking an experienced Platform Engineer who specialises in Observability, with a primary focus on the open-source Grafana observability stack. In this role, you will be instrumental in managing the lifecycle of our observability platform, ensuring robust monitoring, logging, tracing, and profiling for our applications running on Kubernetes. You will contribute to the architecture, implementation, and continuous improvement of our observability pipeline, enabling teams to monitor and optimise system performance efficiently.
Job Responsibilities:
Implement OpenTelemetry within application codebases and manage OTel tooling and services
Architect, implement, and manage an observability stack based on Grafana, Prometheus, Loki, Mimir, Tempo, and other related technologies within a Kubernetes environment
Ensure comprehensive monitoring, logging, and tracing coverage for microservices and Kubernetes clusters
Collaborate with development and platform teams to create meaningful dashboards, alerts, and automated incident responses
Continuously improve the observability platform for scalability, multi-tenancy, and reliability
Support and mentor teams in adopting best practices for instrumentation and monitoring
Implement automation and infrastructure-as-code practices for managing observability infrastructure using Terraform, Helm, and CI/CD pipelines
Integrate observability tooling with other cloud services and on-premises infrastructure as required
Ensure security and compliance standards are met, focusing on auditability and data integrity within the observability stack
Requirements:
Experience working with Kubernetes, particularly in managing observability for containerised applications
Deep knowledge of the open-source Grafana stack, including Mimir, Loki, Tempo, and Beyla
Experience building and managing observability pipelines in a cloud environment (AWS, GCP, or Azure)
Experience utilising SaaS-based observability platforms such as New Relic
Strong automation skills and experience with IaC tools such as Terraform and Helm
Proficiency in scripting and programming languages such as Node.js, Python, Go, or Shell
A customer-first mentality, with strong problem-solving and troubleshooting skills
Experience supporting development teams with production monitoring and root cause analysis
Nice to have:
AWS, Azure, or GCP certifications are highly regarded
What we offer:
18 weeks paid parental leave with no distinction between primary and secondary carers
Access to the 'Employee Exclusives' program, a way of getting closer to our incredible brands through unique experiences, behind-the-scenes access, and awesome perks