Responsibilities:
Develop and maintain scalable ETL/ELT pipelines using PySpark, Spark SQL, and Delta Lake on Databricks
Support the implementation and management of data platforms on AWS, Azure, or GCP
Work with big data technologies such as Databricks, Snowflake, and Apache Iceberg to process large datasets
Optimize Spark workloads through performance tuning, partitioning strategies, and cost-efficient processing
Contribute to Lakehouse architecture implementation, ensuring data quality, consistency, and reliability
Assist in implementing data federation solutions using Starburst/Trino across multiple data sources
Support data governance initiatives, including data quality checks, lineage tracking, metadata management, and adherence to enterprise standards
Ensure compliance with data security and governance policies, including role-based access control (RBAC) and auditability
Perform troubleshooting and performance optimization for data pipelines and queries
Support data modeling for analytics, reporting, and downstream consumption
Collaborate with data scientists and ML teams to enable pipelines for AI/ML and RAG-based use cases
Participate in Agile processes, including sprint planning, stand-ups, and retrospectives
Work with stakeholders to gather requirements and deliver data solutions aligned with business needs
Requirements:
4–7 years of experience in Data Engineering or related roles
Strong hands-on experience with Python, PySpark, and Spark SQL
Experience with Databricks (Delta Lake, performance tuning)
Working knowledge of Ab Initio (GDE, Co>Operating System, Conduct>It)
Experience with Snowflake for data warehousing and analytics
Familiarity with Starburst/Trino and data federation concepts
Exposure to Apache Iceberg or similar open table formats
Hands-on experience with at least one cloud platform (AWS, Azure, or GCP) and related data services (e.g., Glue, ADF, Dataflow, Redshift, Synapse, BigQuery)
Understanding of data governance frameworks, including data quality, lineage, cataloging, and security principles
Experience supporting data pipelines for analytics and machine learning workloads
Familiarity with Agile/Scrum methodologies
Strong problem-solving, collaboration, and communication skills
Nice to have:
Master’s degree
What we offer:
Medical, dental & vision coverage
401(k)
Life, accident, and disability insurance
Wellness programs
Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays