We are seeking a highly skilled and experienced Senior PySpark Data Engineer to join our dynamic data engineering team. The ideal candidate will have a strong background in building and managing large-scale data processing systems and a proven track record of working with cutting-edge Big Data technologies. You will be responsible for designing, developing, and maintaining our data pipelines, ensuring they are efficient, reliable, and scalable to meet our growing business needs.
Job Responsibilities:
Design, develop, and maintain robust, scalable, and high-performance data pipelines using PySpark
Develop, schedule, and monitor complex data workflows using orchestration tools like Apache Airflow
Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver high-quality data solutions
Optimize and tune Spark jobs for performance and efficiency
Implement data quality checks and ensure data integrity across all data pipelines
Design and implement data models for optimal storage and retrieval
Mentor junior data engineers and promote best practices in data engineering
Ensure compliance with data governance and security policies
Troubleshoot and resolve data-related issues in a timely manner
Requirements:
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field
6+ years of professional experience in a data engineering role
Extensive hands-on experience with PySpark and advanced Python programming skills
Proven experience with Big Data ecosystems, including Cloudera and/or Databricks
Hands-on experience with distributed query engines like Starburst (Trino/Presto)
Proficiency in designing and managing complex workflows using scheduling tools, particularly Apache Airflow
Strong expertise in SQL and experience with relational and non-relational databases
Solid understanding of data warehousing concepts, ETL/ELT processes, and data modeling techniques