We’re seeking a hands-on Data Engineer to partner with Business Operations in building a reliable, scalable data foundation. You’ll centralize operational data across core systems, develop automated pipelines and models that power self-serve analytics, and replace manual reporting with scheduled, high-trust data products. You’ll also lay the groundwork for advanced use cases such as forecasting, decision support, and natural-language access to trusted data—accelerating better, faster decisions across the organization.
Job Responsibilities:
Stand up a scalable Databricks lakehouse to ingest, model, and serve business operations data (finance, resourcing, project delivery, CRM, marketing, and time tracking)
Design and maintain automated ELT/ETL pipelines that move data from SaaS tools, databases, and files into bronze/silver/gold layers (see the pipeline sketch after this list)
Build the core semantic layer (cleaned, conformed, documented tables) that powers self-serve BI and executive dashboards
Replace legacy/manual engagement and utilization reports with scheduled, monitored jobs and SLAs
Partner with Business Operations, Finance, and People Operations leaders to define source-of-truth metrics (e.g., revenue, margin, utilization, velocity, pipeline, engagement health); a sample metric definition follows this list
Lay groundwork for AI use cases (RAG over operational data, agentic processes, querying company data) by implementing robust lineage, metadata, and access controls
Architecture & Modeling: Design lakehouse architecture, dimensional/medallion models, and data contracts across systems
Pipeline Automation: Implement CI/CD for data (branching, PRs, jobs, environments), with observability and reproducibility
Data Governance: Enforce PII/PHI handling, role-based access, auditability, and retention aligned to healthcare-adjacent standards (see the access-control sketch after this list)
Enablement: Document datasets, publish a data catalog, and enable self-serve usage via BI and SQL
Reporting Modernization: Decommission manual spreadsheets and one-off extracts; consolidate to certified, scheduled outputs
AI Readiness: Capture lineage and metadata, and stand up vector-friendly document stores to support future ML and RAG initiatives
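To make the bronze/silver/gold flow above concrete, here is a minimal PySpark sketch of the kind of pipeline this role would own. Every path, table, and column name (the landing path, bronze.time_entries, entry_id, and so on) is a hypothetical placeholder, not a description of our actual systems:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw time-tracking exports as-is, tagged with ingestion time.
raw = (
    spark.read.format("json")
    .load("/mnt/landing/time_tracking/")  # hypothetical landing path
    .withColumn("_ingested_at", F.current_timestamp())
)
raw.write.format("delta").mode("append").saveAsTable("bronze.time_entries")

# Silver: deduplicate, enforce types, and keep only plausible rows.
silver = (
    spark.table("bronze.time_entries")
    .dropDuplicates(["entry_id"])
    .withColumn("entry_date", F.to_date("entry_date"))
    .withColumn("hours", F.col("hours").cast("double"))
    .filter(F.col("hours") > 0)
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.time_entries")

# Gold: an analytics-ready rollup that BI dashboards query directly.
gold = (
    silver.groupBy("project_id", "entry_date")
    .agg(F.sum("hours").alias("total_hours"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("gold.daily_project_hours")
```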
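Similarly, a source-of-truth metric is ultimately a certified definition maintained in one place. Here is a sketch of what a utilization definition could look like, again with hypothetical tables and the simplifying assumption that utilization = billable hours / total recorded hours:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One certified definition of utilization, maintained in the semantic layer
# instead of being re-derived in every spreadsheet. Schema is hypothetical.
spark.sql("""
    CREATE OR REPLACE VIEW gold.utilization_monthly AS
    SELECT
        t.employee_id,
        date_trunc('month', t.entry_date)           AS month,
        SUM(CASE WHEN t.billable THEN t.hours END)  AS billable_hours,
        SUM(t.hours)                                AS total_hours,
        SUM(CASE WHEN t.billable THEN t.hours END)
            / NULLIF(SUM(t.hours), 0)               AS utilization
    FROM silver.time_entries AS t
    GROUP BY t.employee_id, date_trunc('month', t.entry_date)
""")
```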
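On the governance side, role-based grants and dynamic views are one common Databricks/Unity Catalog pattern for the PII handling and access control listed above. Group, schema, and column names here are illustrative only:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Grant read access to the certified layer by role, not per user.
spark.sql("GRANT SELECT ON SCHEMA gold TO `ops_analysts`")

# Expose PII columns only to an authorized group via a dynamic view.
# is_account_group_member() is a Unity Catalog built-in; names are illustrative.
spark.sql("""
    CREATE OR REPLACE VIEW gold.employees_masked AS
    SELECT
        employee_id,
        CASE WHEN is_account_group_member('hr_admins')
             THEN full_name ELSE 'REDACTED' END AS full_name,
        department
    FROM silver.employees
""")
```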
Requirements:
2+ years in data engineering or analytics engineering, including building production data pipelines at scale
Expert with Databricks (Delta Lake, SQL, PySpark) and cloud data platforms (AWS or Azure)
Proficient with dbt and/or Delta Live Tables; strong SQL and data modeling fundamentals
Experience orchestrating jobs (Airflow, Databricks Workflows, or equivalent); see the orchestration sketch after this list
Comfortable with Power BI and semantic modeling for self-serve analytics
Strong stakeholder skills; able to translate business needs into reliable data products and clear SLAs
Tech stack: Databricks, Delta Lake, PySpark, SQL, dbt, REST/GraphQL APIs, Git/GitHub, Power BI/Tableau/Looker
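As an illustration of the orchestration experience we look for, a minimal Airflow sketch of a scheduled job with a follow-up freshness check might look like this (DAG id, schedule, and task bodies are placeholders; the Airflow 2.4+ schedule argument is assumed):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def refresh_gold_tables():
    """Placeholder: trigger the job that rebuilds the gold layer."""
    print("refreshing gold tables")


def check_freshness():
    """Placeholder SLA check: raise (and alert) if gold data is stale."""
    print("verifying gold tables were updated within the agreed window")


with DAG(
    dag_id="daily_ops_reporting",
    start_date=datetime(2024, 1, 1),
    schedule="0 6 * * *",  # daily at 06:00; replaces the manual morning report
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
):
    refresh = PythonOperator(task_id="refresh_gold", python_callable=refresh_gold_tables)
    freshness = PythonOperator(task_id="check_freshness", python_callable=check_freshness)
    refresh >> freshness
```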
Nice to have:
Familiarity with data governance (RBAC/ABAC, secrets management, token-based auth) and healthcare-adjacent compliance (e.g., HIPAA concepts)