Develop, test, and maintain data pipelines using Databricks, PySpark, and Python (a brief illustrative sketch follows this list)
Ingest, transform, and process structured and semi-structured data from multiple sources
Support the development of scalable ETL/ELT workflows for analytics, reporting, and machine learning use cases
Work with data engineers, analysts, and data scientists to understand data requirements and deliver reliable datasets
Perform data cleansing, validation, and quality checks to ensure accuracy and consistency
Optimize Spark jobs and Databricks notebooks for performance, reliability, and cost efficiency
Create and maintain documentation for data pipelines, workflows, data definitions, and processes
Assist in troubleshooting pipeline failures, data issues, and performance bottlenecks
Follow best practices for version control, code quality, testing, and deployment
Support basic AI/ML data preparation activities, including feature engineering, dataset creation, and model input preparation
Monitor scheduled jobs and workflows to ensure timely and successful data delivery
Collaborate with cross-functional teams in an Agile or iterative development environment
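For illustration only, here is a minimal sketch of the kind of PySpark pipeline described above: ingesting semi-structured JSON, applying basic cleansing and validation, and writing the result to a Delta table. All paths, column names, and table names are hypothetical.

```python
# Minimal illustrative PySpark ETL sketch (hypothetical paths, columns, and
# table names): ingest semi-structured JSON, apply basic cleansing and
# validation checks, and append the result to a Delta table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_ingest_example").getOrCreate()

# Ingest: read raw semi-structured JSON from a hypothetical landing path.
raw = spark.read.json("/mnt/landing/orders/*.json")

# Transform: normalize types and drop records failing basic quality checks.
clean = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))
    .dropDuplicates(["order_id"])
)

# Load: append to a Delta table for downstream analytics and ML use.
clean.write.format("delta").mode("append").saveAsTable("analytics.orders_clean")
```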
Requirements:
2-6 years of experience and a Bachelor’s degree in Computer Science, Data Engineering, Information Systems, Engineering, Mathematics, or a related field, or equivalent practical experience
Hands-on experience with Python for data processing, scripting, and automation
Strong working knowledge of PySpark and distributed data processing concepts
Proven hands-on experience using Databricks for data engineering, including notebooks, clusters, jobs, workflows, Delta tables, and performance optimization
Ability to build, maintain, and troubleshoot scalable ETL/ELT pipelines in Databricks
Experience working with Delta Lake and lakehouse architecture concepts
Working knowledge of SQL for querying, transforming, and validating data
Ability to work with structured and semi-structured data formats such as CSV, JSON, Parquet, and Delta
Understanding of data engineering concepts such as ETL/ELT, data pipelines, data lakes, data warehouses, batch processing, and data quality
Basic understanding of AI and machine learning concepts, including features, training datasets, model inputs/outputs, and model evaluation basics
Experience supporting data preparation or feature engineering for AI/ML use cases (see the brief sketch after this list)
Familiarity with cloud-based data platforms, preferably AWS, Azure, or GCP
Understanding of Git or other version control tools
Strong analytical, problem-solving, and troubleshooting skills
Good communication skills and ability to work collaboratively with technical and non-technical stakeholders
Willingness to learn new tools, technologies, and data engineering best practices
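As a rough illustration of the data-preparation and feature-engineering work referenced above, the following PySpark sketch derives simple per-customer features and persists them as a Delta table. The table and column names are hypothetical.

```python
# Illustrative feature-preparation sketch (hypothetical table and column
# names): derive simple per-customer features from a cleaned orders table
# and persist them as a Delta table for model training.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature_prep_example").getOrCreate()

orders = spark.table("analytics.orders_clean")

features = (
    orders
    .groupBy("customer_id")
    .agg(
        F.count("order_id").alias("order_count"),   # activity feature
        F.sum("amount").alias("total_spend"),        # monetary feature
        F.max("order_ts").alias("last_order_ts"),    # recency input
    )
    .withColumn(
        "days_since_last_order",
        F.datediff(F.current_date(), F.to_date("last_order_ts")),
    )
)

features.write.format("delta").mode("overwrite").saveAsTable("ml.customer_features")
```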
Nice to have:
Exposure to Delta Lake, Unity Catalog, or Lakehouse architecture
Experience with workflow orchestration tools or Databricks Jobs
Familiarity with CI/CD practices for data engineering projects
Exposure to machine learning workflows using MLflow, scikit-learn, or similar tools (see the brief sketch at the end of this list)
Experience with Tableau, Power BI, or similar data visualization tools to create dashboards, support reporting needs, validate datasets, and perform exploratory analysis
Understanding of data governance, security, and access control concepts
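As a rough illustration of the MLflow exposure mentioned above, the following sketch trains a simple scikit-learn model on stand-in data and logs parameters and metrics to MLflow. The experiment name and data are hypothetical; in practice the training data would come from a prepared feature table.

```python
# Illustrative MLflow tracking sketch (hypothetical experiment name and
# stand-in data): train a simple scikit-learn model and log parameters,
# metrics, and the fitted model to MLflow.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in training data; in practice this would come from a feature table.
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("customer_churn_example")  # hypothetical experiment name

with mlflow.start_run():
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
```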