We are seeking a talented and experienced Big Data Hadoop Developer to join our growing data engineering team. The ideal candidate will have 4-6 years of hands-on experience designing, developing, and optimizing big data solutions using the Hadoop ecosystem, with a strong focus on Apache Spark. You will be responsible for building and maintaining scalable data pipelines, processing large datasets, and collaborating with data scientists and analysts to deliver insights.
Job Responsibilities:
Design, develop, and maintain robust and scalable ETL processes and data pipelines using Apache Hadoop and Apache Spark (an illustrative sketch follows this list)
Write efficient, clear, and well-documented code, primarily in Scala or Python (PySpark), for big data processing
Implement data ingestion, transformation, and loading routines from various sources into Hadoop Distributed File System (HDFS) and other big data stores
Optimize existing Spark jobs and Hadoop ecosystem components for performance and scalability
Collaborate with data architects, data scientists, and other stakeholders to understand data requirements and translate them into technical solutions
Ensure data quality, integrity, and security across all big data platforms
Participate in code reviews, testing, and deployment of big data applications
Troubleshoot and resolve issues in big data environments
Stay up-to-date with the latest trends and technologies in the big data ecosystem
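For illustration only, here is a minimal PySpark sketch of the kind of batch ETL pipeline described above: it ingests raw files, applies a simple cleansing transformation, and writes partitioned Parquet to HDFS. The paths, column names, and job name are hypothetical placeholders, not details of any actual pipeline at this company.

# Hypothetical PySpark ETL sketch: ingest raw CSV data, apply a simple
# transformation, and write partitioned Parquet to HDFS. Paths, columns,
# and the job name are illustrative assumptions only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-daily-etl")      # assumed job name
    .enableHiveSupport()              # only needed when writing Hive tables
    .getOrCreate()
)

# Ingest: read raw files landed by an upstream process (assumed path and schema).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///data/raw/orders/")  # hypothetical HDFS location
)

# Transform: basic deduplication, filtering, and derivation of a partition column.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet back to HDFS for downstream consumers.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("hdfs:///data/curated/orders/")  # hypothetical target path
)

spark.stop()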
Requirements:
Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field
3-4 years of professional experience in Big Data development
Proven experience with the Hadoop ecosystem, including HDFS, YARN, Hive, and other related technologies
Hands-on experience with SQL and shell scripting
Strong expertise in Apache Spark for data processing and analysis
Proficiency in at least one of the following: Scala or Python (including PySpark)
Experience with building and optimizing large-scale data pipelines (a tuning sketch follows this list)
Familiarity with data warehousing concepts and ETL methodologies
Solid understanding of distributed computing principles
Excellent problem-solving skills and attention to detail
Ability to work independently and as part of a collaborative team
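As a rough sketch of the Spark tuning expertise this role calls for, the snippet below broadcasts a small dimension table to avoid shuffling a large fact table, then repartitions by the aggregation key before a group-by. Table names, column names, and the partition count are assumptions for illustration only.

# Illustrative Spark optimization sketch: broadcast join for a small
# reference table plus repartitioning before an expensive aggregation.
# All dataset names and sizes are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-enrichment").getOrCreate()

orders = spark.read.parquet("hdfs:///data/curated/orders/")    # large fact table (assumed)
countries = spark.read.parquet("hdfs:///data/ref/countries/")  # small dimension table (assumed)

# Broadcasting the small table avoids shuffling the large one across the cluster.
enriched = orders.join(F.broadcast(countries), on="country_code", how="left")

# Repartition by the aggregation key so work is balanced across executors.
daily_totals = (
    enriched.repartition(200, "order_date")
            .groupBy("order_date", "country_name")
            .agg(F.sum("order_amount").alias("total_amount"))
)

daily_totals.write.mode("overwrite").parquet("hdfs:///data/marts/daily_totals/")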
Nice to have:
Experience with cloud-based big data services (e.g., AWS EMR, Azure HDInsight, Google Cloud Dataproc)
Experience with Databricks platform
Knowledge of other big data tools like Kafka, HBase, Flink, or Presto (a streaming sketch follows this list)
Experience with SQL and NoSQL databases
Familiarity with CI/CD practices and tools (e.g., Git, Jenkins)
Understanding of machine learning concepts and how they apply to big data
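To show how Kafka commonly fits into this stack, here is a minimal Spark Structured Streaming sketch that reads from a Kafka topic and appends micro-batches to HDFS. The broker address, topic name, and paths are hypothetical, and running it assumes the spark-sql-kafka connector package is on the classpath.

# Minimal sketch of Kafka feeding a Spark pipeline via Structured Streaming.
# Broker, topic, and output locations are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-stream-ingest").getOrCreate()

# Subscribe to a Kafka topic (assumed broker and topic names).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka keys and values arrive as bytes; cast to strings before downstream parsing.
parsed = events.select(
    F.col("key").cast("string"),
    F.col("value").cast("string"),
    "timestamp",
)

# Continuously append micro-batches to HDFS, with checkpointing for recovery.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/streaming/events/")         # assumed sink path
    .option("checkpointLocation", "hdfs:///checkpoints/events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()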