We are looking for an experienced System Engineer with strong hands-on expertise in building and managing real-time and batch data processing systems. The ideal candidate has deep specialization in at least one distributed data technology (Apache Flink, Apache Kafka, Spark Structured Streaming, or Dremio) and solid working knowledge of the others. The role requires strong experience with distributed systems, event-driven architectures, streaming data pipelines, and observability of production-grade data platforms.
Job Responsibilities:
Design, build, and optimize real-time streaming pipelines and batch workloads
Ensure data correctness, reliability, and processing guarantees (at-least-once / exactly-once)
Develop stateful stream processing solutions including joins, aggregations, windowing, and CDC pipelines (see the windowed-aggregation sketch after this list)
Build and operate scalable, low-latency event-driven architectures using technologies like Flink, Kafka, Pulsar, or Spark
Design and support semantic layers, distributed SQL engines, and query acceleration using Dremio
System integration with:
Data Lakes (Iceberg, Delta Lake)
OLAP / Analytical Query Engines
Downstream applications, APIs, and data consumers
Manage schema evolution and compatibility across producers and consumers (Schema Registry, formats, versions)
Monitor, troubleshoot, and performance-tune distributed data processing jobs in production environments
Implement observability standards across streaming platforms and distributed components (a metrics-instrumentation sketch also follows this list)
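To make the stateful windowing responsibility concrete, here is a minimal PySpark Structured Streaming sketch: it reads JSON events from a Kafka topic, applies a watermark for late data, and computes tumbling-window counts per event type. The broker address, topic name, event schema, and checkpoint path are all illustrative assumptions, not details of this role's actual platform.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("windowed-counts").getOrCreate()

# Hypothetical event schema; a real topic's layout would differ.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Requires the spark-sql-kafka-0-10 package on the classpath.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed address
    .option("subscribe", "events")                     # assumed topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Accept events up to 10 minutes late, then expire the window state.
counts = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "5 minutes"), col("event_type"))
    .count()
)

# Checkpointing makes the windowed state recoverable across restarts;
# end-to-end exactly-once additionally needs an idempotent or
# transactional sink (the console sink here is for demonstration only).
query = (
    counts.writeStream.outputMode("update")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/windowed-counts")
    .start()
)
query.awaitTermination()
```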
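Likewise, one possible reading of the observability item: a small sketch that instruments a consumer-style processing loop with the prometheus_client library. The metric names, port, and loop body are assumptions for illustration; a real platform would standardize its own conventions.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; not a prescribed standard.
RECORDS = Counter("pipeline_records_total", "Records processed", ["topic"])
FAILURES = Counter("pipeline_failures_total", "Processing failures", ["topic"])
LATENCY = Histogram("pipeline_process_seconds", "Per-record processing time")

def process(record):
    # Stand-in for real per-record work.
    time.sleep(random.uniform(0.001, 0.01))

def run(topic="events"):
    start_http_server(8000)  # metrics scrapeable at :8000/metrics
    while True:  # sketch only; runs until interrupted
        record = {"payload": "..."}  # stand-in for a consumed message
        with LATENCY.time():
            try:
                process(record)
                RECORDS.labels(topic=topic).inc()
            except Exception:
                FAILURES.labels(topic=topic).inc()

if __name__ == "__main__":
    run()
```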
Requirements:
Expert-level knowledge in at least one of the following: Apache Flink, Apache Kafka, Spark Structured Streaming, or Dremio (or a comparable distributed SQL engine such as Trino, Presto, or Apache Drill)
Strong hands-on experience in real-time and batch data processing systems
Solid understanding of distributed systems, scalability, and fault-tolerant architectures
Experience building and operating event-driven architectures
Proficiency in managing schema evolution and compatibility across streaming ecosystems (see the schema-compatibility sketch after this list)
Hands-on experience integrating data pipelines with data lakes, OLAP engines, and downstream applications
Strong skills in troubleshooting, performance tuning, and optimizing distributed data processing jobs
Knowledge of observability practices across streaming and distributed systems
Strong analytical, problem-solving, and communication skills
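To make the schema-evolution requirement concrete, here is a minimal sketch using the Schema Registry API of the confluent-kafka Python client (test_compatibility is available in recent client versions): it checks a proposed Avro schema against the latest registered version before registering it. The registry URL, subject name, and schema are illustrative assumptions.

```python
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

client = SchemaRegistryClient({"url": "http://localhost:8081"})  # assumed URL

# A proposed new version of the value schema for a hypothetical topic.
# The added field is optional with a default, which keeps the change
# backward compatible under the registry's default compatibility mode.
proposed = Schema(
    schema_str="""
    {
      "type": "record",
      "name": "UserEvent",
      "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "event_type", "type": "string"},
        {"name": "session_id", "type": ["null", "string"], "default": null}
      ]
    }
    """,
    schema_type="AVRO",
)

subject = "events-value"  # assumed subject name

# Refuse to register a change that would break existing consumers.
if client.test_compatibility(subject, proposed):
    schema_id = client.register_schema(subject, proposed)
    print(f"registered schema id {schema_id}")
else:
    raise RuntimeError(f"proposed schema is incompatible with {subject}")
```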
Nice to have:
Experience with Pulsar, Redpanda, or other distributed messaging systems
Knowledge of Kubernetes, containerized deployments, and orchestration
Exposure to cloud platforms (AWS, Azure, GCP) and their managed streaming services
Familiarity with CI/CD pipelines and DevOps practices
Hands-on experience with data governance, lineage, and cataloging tools
Knowledge of SQL performance tuning and BI acceleration mechanisms
Understanding of Data Mesh or distributed data ownership models