We are seeking a highly skilled and experienced Senior Big Data/PySpark Engineer to join our dynamic Big Data Analytics team. The ideal candidate will have a strong background in Python programming and extensive experience with Apache Spark, particularly PySpark, for large-scale data processing and analytics. This role involves designing, developing, and optimizing robust and scalable data pipelines, working with vast datasets, and contributing to the architecture of our Big Data solutions.
Job Responsibilities
Design, develop, and maintain efficient, scalable, and reliable data pipelines using PySpark
Implement complex data transformations, aggregations, and data quality checks on large datasets
Collaborate with multiple stakeholders (technology and business) to understand data requirements and translate them into technical specifications
Optimize PySpark jobs for performance, efficiency, and cost-effectiveness
Develop and maintain documentation for data pipelines, data models, and data processing logic
Participate in code reviews, ensuring code quality, best practices, and adherence to established standards
Troubleshoot and resolve issues in existing data pipelines and data processing jobs
Stay up-to-date with the latest advancements in PySpark, Apache Spark, and the broader Big Data ecosystem
Mentor junior developers and contribute to the continuous improvement of the team's technical capabilities and processes
Requirements
8-12 years of relevant experience
Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field
5+ years of professional experience in software development with a focus on Big Data technologies
5+ years of hands-on experience specifically with PySpark for large-scale data processing
Strong proficiency in Python programming, including object-oriented design and data manipulation libraries (e.g., Pandas, NumPy)
In-depth understanding of Apache Spark architecture, including Spark Core, Spark SQL, Spark Streaming, and the DataFrame API
Experience with various data storage technologies such as HDFS, S3, Azure Blob Storage, or similar distributed file systems and object stores
Solid understanding of relational databases and SQL
Experience with version control systems (e.g., Git)
Excellent problem-solving, analytical, and communication skills
Nice to have
Experience with cloud platforms (AWS, Azure, GCP) and their Big Data services (e.g., EMR, Databricks, Glue, Azure Synapse, Google Dataproc)
Familiarity with workflow orchestration tools (e.g., Apache Airflow, Luigi)
Experience with streaming data processing (e.g., Kafka, Spark Streaming)
Knowledge of data warehousing concepts and data modeling techniques
Experience with containerization technologies (e.g., Docker, Kubernetes)
Understanding of data governance, data security, and compliance best practices