Our client is building a unified Master Data Management (MDM) platform that consolidates data from many applications into a single source of truth. An AI layer is being built on top of this platform, either as a separate layer over the unified data or embedded directly into the ETL processes to deliver clean, connected, and enriched data. The project domain covers occupational health and safety, incident management, risk management, and global regulatory frameworks. The platform must provide trustworthy, connected, and explainable data for incident management, risk assessment, and compliance with global regulatory requirements. This is a hands-on architecture role: the person will own the platform design, build reference pipelines and data models, and define AI/RAG patterns together with the engineering team.
Job Responsibilities
Own the end-to-end architecture of a Databricks-based MDM platform for occupational health, safety, incident, risk, and regulatory data
Design ingestion and transformation patterns using Databricks, Spark, PySpark, SQL, Delta Lake, Unity Catalog, and Lakeflow where appropriate
Define canonical data models, golden-record logic, entity-resolution rules, and survivorship strategies across heterogeneous source systems
Build a semantic layer that provides consistent definitions for incidents, organizations, locations, hazards, controls, risks, regulations, corrective actions, and compliance metrics
Design graph-based relationship models for linking entities across systems and enriching downstream analytics and AI use cases
Architect AI/RAG capabilities for semantic search, regulatory lookup, incident enrichment, data validation, and source-grounded answers over governed enterprise data
Embed data quality, lineage, governance, access control, auditability, and monitoring into the platform from the start
Partner with product, engineering, compliance, and analytics teams to convert domain requirements into scalable architecture and implementation patterns
Requirements
Strong production experience with Databricks Lakehouse architecture, including Spark, PySpark, SQL, Delta Lake, Unity Catalog, and workflow orchestration
Hands-on experience designing and building ETL/ELT pipelines for batch and incremental ingestion, cleansing, normalization, deduplication, and enrichment
Practical experience with MDM: golden records, survivorship/merge rules, trust ranking, identity resolution, duplicate detection, SCD, and exception workflows
Strong data modeling skills for analytical, operational, and semantic consumption patterns
Experience designing a semantic layer with shared business definitions, governed metrics, reusable dimensions, and consistent entity definitions
Experience with data quality and observability: pipeline SLAs, schema drift, CDC, data contracts, dead-letter handling, and source-to-master reconciliation
Experience implementing data governance and security: Unity Catalog lineage, RBAC/ABAC, row/column-level security, PII handling, and regulatory traceability
Ability to translate business requirements from product, compliance, and engineering stakeholders into scalable data architecture
Nice to have
Experience with Databricks Lakeflow Connect, Lakeflow Spark Declarative Pipelines, and Lakeflow Jobs
Experience with Unity Catalog metric views or comparable semantic-layer technologies
Experience with knowledge graphs, graph analytics (e.g., GraphFrames), or graph-based entity resolution for linking people, organizations, locations, incidents, hazards, controls, regulations, assets, and corrective actions
Experience building AI/RAG solutions over enterprise data using AI Search / Vector Search, embeddings, metadata filtering, retrieval evaluation, and source-grounded generation with citations
Experience with ML-based data enrichment, classification, anomaly detection, or entity matching
Experience in regulated domains such as occupational health and safety, incident management, risk, compliance, ESG, insurance, healthcare, or industrial operations
What we offer
Projects for clients such as PayPal, Wargaming, Xerox, Philips, Adidas, and Toyota
Competitive compensation that depends on your qualifications and skills
Career development system with clear skill qualifications
Flexible working hours aligned to your schedule
Options to work remotely
Corporate medical insurance covering services at private and public medical centers
Online English courses
Corporate parties and events for employees and their children
Internal conferences, workshops and meetups for learning and experience sharing
Gym membership compensation
5 days of paid sick leave per year with no obligation to submit a sick-leave certificate