The Senior Data Engineer will be responsible for the architecture, design, development, and maintenance of our data platforms, with a strong focus on Python and PySpark for data processing and transformation. This role calls for a technical leader who can work independently and as part of a team, contributing to the overall data strategy and helping to drive data-driven decision-making across the organization.
Job Responsibilities:
Design, develop, and optimize data architectures, pipelines, and data models to support various business needs, including analytics, reporting, and machine learning
Build, test, and deploy highly scalable and efficient ETL/ELT processes using Python and PySpark to ingest, transform, and load data from diverse sources into data warehouses and data lakes (a minimal sketch of such a pipeline follows this list)
Develop and optimize complex data transformations using PySpark
Implement best practices for data quality, data governance, and data security to ensure the integrity, reliability, and privacy of our data assets
Monitor, troubleshoot, and optimize data pipeline performance, ensuring data availability and timely delivery, particularly for PySpark jobs
Collaborate with DevOps and MLOps teams to manage and optimize data infrastructure, including cloud resources (AWS, Azure, GCP), databases, and data processing frameworks, ensuring efficient operation of PySpark clusters
Provide technical guidance, mentorship, and code reviews to junior data engineers, particularly in Python and PySpark best practices, fostering a culture of excellence and continuous improvement
Work closely with data scientists, analysts, product managers, and other stakeholders to understand data requirements and deliver solutions that meet business objectives
Research and evaluate new data technologies, tools, and methodologies to enhance our data capabilities and stay ahead of industry trends
Create and maintain comprehensive documentation for data pipelines, data models, and data infrastructure
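As an illustration of the ETL/ELT work described above, here is a minimal PySpark sketch of an ingest-transform-load step; the file paths, column names, and application name are hypothetical placeholders, not a description of our actual stack.

# Minimal ETL sketch; paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Ingest: read raw CSV files from a (hypothetical) landing zone
raw = spark.read.option("header", True).csv("s3://landing-zone/orders/")

# Transform: type casting, deduplication, and a derived partition column
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet into the data lake
orders.write.mode("overwrite").partitionBy("order_date").parquet("s3://lake/curated/orders/")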
Requirements:
Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field
5+ years of professional experience in data engineering, with a strong emphasis on building and maintaining large-scale data systems
Extensive hands-on experience with Python for data engineering tasks
Proven experience with PySpark for big data processing and transformation
Proven experience with cloud data platforms (e.g., AWS Redshift, S3, EMR, Glue; Azure Data Lake, Databricks, Synapse; Google BigQuery, Dataflow)
Strong experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra)
Extensive experience with distributed data processing frameworks, especially Apache Spark
Expert proficiency in Python is mandatory
Strong SQL mastery is essential
In-depth knowledge and hands-on experience with Apache Spark (PySpark) for data processing, including Spark SQL, Spark Streaming, and the DataFrame API (see the sketch following this list)
In-depth knowledge of data warehousing concepts, dimensional modeling, and ETL/ELT processes
Hands-on experience with at least one major cloud provider (AWS, Azure, GCP) and their data services, particularly those supporting Spark/PySpark workloads
Proficient with Git and CI/CD pipelines
Excellent problem-solving and analytical abilities
Strong communication and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders
Ability to work effectively in a fast-paced, agile environment
Proactive and self-motivated with a strong sense of ownership
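For reference, a short sketch of the kind of Spark SQL and DataFrame API fluency expected; the table, column names, and input path are hypothetical.

# The same aggregation expressed via the DataFrame API and Spark SQL;
# names and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("revenue_report").getOrCreate()
orders = spark.read.parquet("s3://lake/curated/orders/")

# DataFrame API: daily revenue per country
daily_df = (
    orders.groupBy("country", "order_date")
          .agg(F.sum("amount").alias("revenue"))
)

# Spark SQL: the equivalent query against a temporary view
orders.createOrReplaceTempView("orders")
daily_sql = spark.sql("""
    SELECT country, order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY country, order_date
""")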
Nice to have:
Familiarity with Scala or Java is a plus
Familiarity with Docker and Kubernetes is a plus
Experience with real-time data streaming and processing using PySpark Structured Streaming (a minimal sketch follows this list)
Knowledge of machine learning concepts and MLOps practices, especially integrating ML workflows with PySpark
Familiarity with data visualization tools (e.g., Tableau, Power BI)
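A minimal Structured Streaming sketch of the real-time processing mentioned above; it assumes the Spark Kafka connector package is available, and the broker address, topic, and storage paths are placeholders.

# Structured Streaming sketch; assumes the spark-sql-kafka connector is on
# the classpath. Broker, topic, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

# Read a stream of events from a (hypothetical) Kafka topic
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "orders")
         .load()
)

# Keep the raw message payload as a string column
parsed = events.select(F.col("value").cast("string").alias("json"))

# Continuously append the stream to the data lake with checkpointing
query = (
    parsed.writeStream
          .format("parquet")
          .option("path", "s3://lake/streaming/orders/")
          .option("checkpointLocation", "s3://lake/checkpoints/orders/")
          .outputMode("append")
          .start()
)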