Job Description: Develop and maintain data pipelines for efficient data extraction, transformation, and loading (ETL/ELT), using PySpark for distributed big-data processing and Polars/Rust for maximum performance and memory safety at single-node bottlenecks. Design, implement, and own internal process improvements: automating manual processes, optimizing data delivery latency using high-speed language components, and re-designing infrastructure for greater scalability and cost efficiency. Operate, maintain, and evolve our decoding pipelines, proposing improvements to automate and industrialize processes and ways of working, with a focus on data quality and platform stability. Support the ramp-up and installation of data pipelines for future airline/aircraft deployments. Integrate high-performance Rust-compiled routines (e.g., UDFs) into Python and PySpark workflows to resolve critical performance issues.
Job Responsibilities:
Develop and maintain data pipelines for efficient data extraction, transformation, and loading (ETL/ELT)
Design, implement, and own internal process improvements: automating manual processes, optimizing data delivery latency, and re-designing infrastructure for greater scalability and cost efficiency
Operate, maintain, and evolve our decoding pipelines, proposing improvements to automate and industrialize processes and ways of working
Support the ramp-up and installation of data pipelines for future airline/aircraft deployments
Integrate high-performance Rust-compiled routines into Python and PySpark workflows to resolve critical performance issues (see the sketch below)
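The integration pattern below is a minimal sketch from the Python side only, assuming a hypothetical pyo3/maturin-built Rust extension module named fastdecode that exposes decode_frame(bytes) -> str; the point is that a compiled routine can be wrapped in a vectorized pandas UDF so Spark hands it data in Arrow batches.

```python
# Minimal sketch: wrapping a Rust-compiled routine as a PySpark pandas UDF.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

import fastdecode  # hypothetical pyo3/maturin extension; swap in your crate

spark = SparkSession.builder.appName("rust-udf-sketch").getOrCreate()

@pandas_udf(StringType())
def decode_frame(raw: pd.Series) -> pd.Series:
    # Arrow batches rows into pandas Series; the Rust routine runs per value.
    return raw.map(fastdecode.decode_frame)

df = spark.createDataFrame([(bytearray(b"\x7fACARS"),)], ["payload"])
df.select(decode_frame("payload").alias("decoded")).show()
```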
Requirements:
Engineering degree (or equivalent) in computer engineering or computer science
6-8 years of experience
Strong proficiency in Rust for developing memory-safe, highly concurrent, and low-latency data processing microservices or core pipeline components
Familiarity with the Cargo package manager
Mastery of Polars for high-speed, multi-threaded data manipulation on single machines
Deep understanding of the Lazy API, the Apache Arrow columnar format, and query optimization techniques (see the Polars lazy-query sketch after this list)
Deep expertise in writing production-grade, modular, and reusable Python code
Mastery of PySpark internals
Expertise in diagnosing and resolving bottlenecks using the Spark UI
Deep understanding of Adaptive Query Execution (AQE), data skew mitigation, and shuffle optimization (see the AQE and salting sketch after this list)
Experience with Delta Lake, Apache Hudi, or Apache Iceberg for building reliable, ACID-compliant Data Lakehouse architectures (see the Delta Lake MERGE sketch after this list)
Expert proficiency in analytical SQL and database optimization (see the window-function sketch after this list)
Proven experience in designing, deploying, and maintaining complex, dependency-driven DAGs in production environments (see the orchestration sketch after this list)
Hands-on experience with Terraform or CloudFormation
Hands-on experience with core cloud data services and cost-management practices
Experience implementing robust data quality checks and integrating with Data Catalog/Lineage tools (see the quality-gate sketch after this list)
Practical experience implementing least-privilege access, data encryption, and data masking/tokenization for sensitive data (see the masking sketch after this list)
Strong knowledge of building automated test, lint, and deployment pipelines
Ability to write comprehensive unit tests and data validation tests (see the pytest sketch after this list)
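Polars lazy-query sketch: a minimal example of the Lazy API, assuming an illustrative flights/*.parquet dataset and column names; scan_parquet builds a deferred plan so the optimizer can push the filter and column selection down to the scan before any data is read.

```python
import polars as pl

lazy = (
    pl.scan_parquet("flights/*.parquet")  # lazy scan: nothing is read yet
      .filter(pl.col("delay_min") > 15)   # predicate pushed down to the scan
      .group_by("airline")
      .agg(pl.col("delay_min").mean().alias("avg_delay"))
)
print(lazy.explain())    # inspect the optimized query plan
result = lazy.collect()  # execute in parallel across all cores
```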
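AQE and salting sketch: the two configs are real Spark settings that let AQE split skewed partitions at runtime; the salted join that follows is a standard manual mitigation when AQE alone is not enough. Table and column names (events, dims, key) are illustrative.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")  # AQE splits skewed partitions
    .getOrCreate()
)

N = 16  # salt fan-out; tune to the observed skew
events = spark.table("events").withColumn("salt", (F.rand() * N).cast("int"))
dims = spark.table("dims").crossJoin(
    spark.range(N).withColumnRenamed("id", "salt")  # replicate dims N times
)
joined = events.join(dims, ["key", "salt"])  # hot keys now spread over N partitions
```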
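Delta Lake MERGE sketch: an idempotent upsert in a single ACID transaction, assuming the delta-spark package is installed; paths and the join key are illustrative.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
updates = spark.read.parquet("/staging/aircraft_updates")  # illustrative path

target = DeltaTable.forPath(spark, "/lake/aircraft")
(
    target.alias("t")
    .merge(updates.alias("u"), "t.tail_number = u.tail_number")
    .whenMatchedUpdateAll()     # update existing rows...
    .whenNotMatchedInsertAll()  # ...and insert new ones, atomically
    .execute()
)
```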
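Window-function sketch: an analytical SQL pattern for picking the latest record per key, run through spark.sql; the flight_events table and its columns are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

latest = spark.sql("""
    SELECT * FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY tail_number
                   ORDER BY event_ts DESC
               ) AS rn
        FROM flight_events t
    ) sub
    WHERE rn = 1  -- keep only the latest event per aircraft
""")
```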
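Orchestration sketch: a minimal dependency-driven DAG, assuming Apache Airflow (the posting does not name a scheduler); the three tasks are placeholders for real extract/decode/load callables.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="decode_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # Airflow 2.4+ name for schedule_interval
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=lambda: None)
    decode = PythonOperator(task_id="decode", python_callable=lambda: None)
    load = PythonOperator(task_id="load", python_callable=lambda: None)

    extract >> decode >> load  # explicit, dependency-driven ordering
```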
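Quality-gate sketch: a hand-rolled PySpark check that fails fast instead of letting bad data propagate; in practice a framework such as Great Expectations or Soda could replace it (none is named in the posting), and the column names are illustrative.

```python
from pyspark.sql import DataFrame, functions as F

def quality_gate(df: DataFrame) -> DataFrame:
    # Count null business keys and duplicate message ids before loading.
    nulls = df.filter(F.col("tail_number").isNull()).count()
    dupes = df.count() - df.dropDuplicates(["message_id"]).count()
    if nulls or dupes:
        raise ValueError(f"quality gate failed: {nulls} null keys, {dupes} duplicates")
    return df
```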
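Masking sketch: salted hashing as a stand-in for a real tokenization service; the DataFrame, column names, and inline salt are illustrative (real salts belong in a secrets manager).

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("P-123", "a@example.com")], ["passenger_id", "email"])

SALT = "rotate-me"  # illustrative; fetch from a secrets manager in production

masked = (
    df.withColumn("passenger_id",
                  F.sha2(F.concat(F.col("passenger_id"), F.lit(SALT)), 256))
      .withColumn("email", F.lit("***REDACTED***"))  # hard-mask free-text PII
)
```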
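Pytest sketch: a minimal unit test for a small PySpark transformation; dedupe is an illustrative function under test, and the local[2] session keeps the test self-contained.

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").getOrCreate()

def dedupe(df):
    return df.dropDuplicates(["message_id"])

def test_dedupe_removes_duplicates(spark):
    df = spark.createDataFrame(
        [(1, "a"), (1, "a"), (2, "b")], ["message_id", "payload"]
    )
    assert dedupe(df).count() == 2  # duplicate row collapsed
```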