We are seeking a Lead Observability Engineer to join our team. The role requires working within Eastern Time (ET) hours or 14:00–22:00 / 15:00–23:00 CET. The ideal candidate has proven expertise in designing, operating, and scaling analytical data systems, particularly ClickHouse or similar distributed databases. In this hands-on leadership role, you will architect a robust, high-performance ClickHouse-based solution and lead the migration of our existing custom Cosmos DB telemetry storage system to it. You will also play a key role in building the foundation for alerting, notification, and telemetry workflows, providing full visibility into production systems and improving observability at scale.
Job Responsibilities:
Lead the migration and transformation of telemetry storage from custom Cosmos DB solutions to ClickHouse, building a scalable and reliable end-to-end observability platform
Architect, implement, and maintain alerting and notification systems integrated with ClickHouse for critical services and applications
Develop, deploy, and operate high-throughput telemetry pipelines, ensuring accurate and actionable monitoring across cloud environments
Collaborate with engineering and product teams to define and champion observability best practices
Work with DevOps and development teams to automate the collection and ingestion of logs, metrics, and traces, along with the enforcement of their retention policies
Drive continuous improvement in system performance, stability, and reliability through effective observability
Participate in on-call rotations, incident response, and root cause analysis to enhance monitoring and alerting capabilities
Requirements:
5+ years of engineering experience in cloud observability platforms, infrastructure, and telemetry systems
Deep experience in alerting, notifications, and monitoring at scale
Advanced expertise with ClickHouse, or similar high-performance analytical databases, for telemetry storage and querying
Hands-on experience migrating telemetry/storage solutions (preferably from Cosmos DB to ClickHouse or equivalent)
Solid understanding of telemetry pipelines, cloud-native monitoring, and observability best practices
Experience with dashboarding and visualization tools (Grafana, Kibana, or similar)
Strong scripting and automation skills (Python, Bash, Terraform, or equivalent)
Proven collaboration and communication skills with cross-functional teams
What we offer:
Flexible working format: remote, office-based, or a mix of both
Competitive salary and compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
Active tech communities with regular knowledge sharing