Build and operate large-scale healthcare data pipelines spanning batch workflows, metadata-driven ingestion, and data service publishing. Own engineering end to end, from source ingestion to conformed data products, with a strong focus on reliability, data quality, and operational observability. Partner with analytics, business, and platform teams to deliver trusted datasets for sales, claims, activity, patient, and rare disease use cases.
Job Responsibilities:
Design and maintain PySpark/SQL pipelines in Databricks for landing, unified, unstitched, and published data layers (a minimal pipeline sketch follows this list)
Build and support Airflow DAGs for scheduling, dependencies, retries, and production operations (a sample DAG is sketched after this list)
Implement metadata/config-driven frameworks for ingestion, transformation, and rule-based processing (illustrated after this list)
Develop robust data quality controls, DQ summaries, failure handling, and alerting workflows
Manage batch/process audit logs, run status tracking, release flags, and operational reporting (both items are illustrated in the DQ-and-audit sketch after this list)
Integrate multi-source data (files, APIs, cloud storage, and relational systems) into governed Delta/Spark tables
Optimize pipeline performance using partitioning, parallelization, and query tuning
Collaborate on schema evolution, business-rule onboarding, and production support (tuning and schema-evolution write options are sketched after this list)
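The sketches below are minimal, illustrative examples of the techniques named above, not this team's actual code. First, a single landing-to-published PySpark step, assuming Delta Lake on Databricks; the paths, table names, and the claim_id key are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_publish").getOrCreate()

# Read raw files from the landing layer (path is an assumption).
landing = spark.read.format("parquet").load("/mnt/landing/claims/")

# Light conformance before publishing: stamp the load date and
# de-duplicate on a hypothetical business key.
published = (
    landing
    .withColumn("load_date", F.current_date())
    .dropDuplicates(["claim_id"])
)

# Write to a governed Delta table in the published layer.
(published.write
    .format("delta")
    .mode("append")
    .saveAsTable("published.claims"))
```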
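Next, a minimal Airflow DAG showing scheduling, task dependencies, and retries, assuming Airflow 2.4+; the DAG id, cron expression, and echo commands are placeholders.

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "retries": 3,                          # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=10),  # wait between attempts
}

with DAG(
    dag_id="claims_batch",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",                  # nightly at 02:00
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = BashOperator(task_id="ingest", bash_command="echo ingest")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    publish = BashOperator(task_id="publish", bash_command="echo publish")

    ingest >> transform >> publish         # explicit task dependencies
```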
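For the metadata/config-driven item, a sketch in which each source is a config entry and one generic routine ingests all of them; the entries and target tables are hypothetical, and in practice the metadata would likely live in a config table or file rather than in code.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("config_driven_ingest").getOrCreate()

# Hypothetical metadata entries describing each source system.
SOURCES = [
    {"format": "csv",     "path": "/mnt/landing/sales/",  "target": "unified.sales"},
    {"format": "parquet", "path": "/mnt/landing/claims/", "target": "unified.claims"},
]

def ingest(cfg):
    """Load one source per its metadata entry and write it to a Delta table."""
    df = (spark.read.format(cfg["format"])
                .option("header", "true")   # ignored by non-CSV readers
                .load(cfg["path"]))
    df.write.format("delta").mode("overwrite").saveAsTable(cfg["target"])

# One generic loop handles every configured source.
for cfg in SOURCES:
    ingest(cfg)
```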
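For the DQ and audit items, a combined sketch: a row-count gate plus one audit row per run; the ops.batch_audit table, the threshold, and the fail-fast behavior are all assumptions for illustration.

```python
from datetime import datetime, timezone
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("dq_audit").getOrCreate()

def dq_check_and_audit(df, table_name, min_rows=1):
    """Fail the run if df has fewer than min_rows; always log an audit row."""
    row_count = df.count()
    status = "PASS" if row_count >= min_rows else "FAIL"

    # One audit row per batch for run-status tracking and reporting.
    audit = spark.createDataFrame([Row(
        table_name=table_name,
        run_ts=datetime.now(timezone.utc).isoformat(),
        row_count=row_count,
        status=status,
    )])
    audit.write.format("delta").mode("append").saveAsTable("ops.batch_audit")

    if status == "FAIL":
        raise ValueError(f"DQ failure on {table_name}: {row_count} rows")

# Example: gate a published table before downstream release.
dq_check_and_audit(spark.table("published.claims"), "published.claims", min_rows=100)
```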
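Finally, for the tuning and schema-evolution items, a sketch of common Delta write options: repartitioning for write parallelism, partitioning for read-time pruning, and mergeSchema for additive column changes. Table and column names are illustrative.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuned_publish").getOrCreate()

df = spark.table("unified.claims").withColumn("load_date", F.current_date())

(df
    .repartition("load_date")              # spread the write across executors
    .write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")         # tolerate additive schema changes
    .partitionBy("load_date")              # enable partition pruning on reads
    .saveAsTable("published.claims"))
```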
Requirements:
Bachelor’s degree in Computer Science, Information Technology, or a related field, and 5-9 years of experience