The Senior Data Engineer will design, build, and optimize the batch data pipelines and algorithms that power this matching engine. This is a hands-on role focused on scale, performance, and accuracy, working with Spark, SQL, and modern orchestration tools. This is not a traditional analytics role; it is deep data platform and algorithmic engineering at massive scale.
Job Responsibilities:
Build Data Pipelines: Design and maintain robust, scalable ETL/ELT pipelines to ingest and process third-party and first-party datasets
Data Quality & Enrichment: Apply transformation, normalization, and enrichment rules to ensure data consistency and usability
Collaborate Across Teams: Work with product managers, data architects, and content experts from Coro and Helix to align data structure with business needs
Operationalize Matching & Merging Logic: Support the implementation of data matching and entity resolution processes using AI/ML tools and proprietary frameworks
Monitor & Troubleshoot Pipelines: Build alerts, logs, and metrics to ensure data flows remain healthy and issues are identified and resolved quickly
Documentation & Standards: Contribute to documentation, code quality standards, and internal best practices to ensure maintainability
Requirements:
5+ years of experience in Big Data, Data Platform, or ETL engineering roles
Proficiency in SQL and Python, plus experience with Spark (PySpark or Scala), Airflow, Snowflake, and Azure Data Lake or similar technologies
Familiarity with Azure (preferred) or other major cloud platforms
Proven experience designing and operating large-scale batch data pipelines
Solid understanding of distributed systems and algorithms (partitioning, shuffles, joins, scalability trade-offs)
Proactive, detail-oriented, and eager to take ownership of projects and continuously improve systems
Comfortable working in a cross-functional environment and open to learning from and supporting teammates
Nice to have:
Experience with, or strong interest in, fuzzy and semantic matching techniques
Exposure to ML-assisted data pipelines
Familiarity with search or retrieval systems (e.g., Elasticsearch, OpenSearch, vector databases)
What we offer:
Join a fast-growing, high-impact team
Contribute to an ambitious effort to create the highest-quality, most comprehensive business directory in the world
Be part of a startup-style group within the company that’s redefining how it delivers consulting through productization and data innovation
Work with cutting-edge data tools, including AI/ML enrichment, semantic matching, and modern cloud-based infrastructure