Are you a highly skilled Senior Data Infrastructure Engineer passionate about building high-performance data layers for the next generation of AI infrastructure? We are looking for an expert to join our AI-native team and own the observability data layer for a high-impact project, ensuring optimal GPU utilization and real-time monitoring of complex SLOs. The position is full-time and based in Greece, with a remote-first / hybrid work model.
Job Responsibilities:
Design, deploy, and maintain production-grade time-series databases (e.g., InfluxDB, TimescaleDB, VictoriaMetrics, or Thanos) to handle massive data ingestion
Own the Prometheus ecosystem end-to-end, architecting cross-region consolidation and automating provisioning via Infrastructure-as-Code
Lead Grafana dashboard engineering, creating real-time feeds for health scores, SLO burn-rate queries, and alerting pipelines
Architect and manage automated backups, restores, failover, and disaster-recovery runbooks using Terraform and Ansible
Perform deep query-performance analysis, including cardinality management, indexing optimization, and down-sampling strategies
Work at the intersection of infrastructure and product to ensure seamless metrics collection and storage across all regions
Requirements:
Degree in Computer Science or other relevant disciplines
At least 6 years of professional experience administering time-series databases in production at scale
Expert-level proficiency in PromQL and a solid understanding of system design, concurrency, and data consistency
Hands-on mastery of at least two of the following: Prometheus, VictoriaMetrics, InfluxDB, TimescaleDB, or ClickHouse
Proven experience scaling Prometheus and tuning Thanos, Cortex, or Mimir with cloud object-store backends
Advanced optimization skills, including flame-graph profiling, read-path parallelism tuning, and series-churn analysis
Deep experience with Kubernetes, Consul, and Ansible
Familiarity with the full observability stack (OpenTelemetry, Loki, Tempo/Jaeger)
Knowledge of Infrastructure-as-Code for managing complex deployment pipelines
Experience with cross-tenant query isolation and block compaction scheduling