We are looking for a Data Engineer to be responsible for building and maintaining the infrastructure that supports the organization’s data architecture. The role involves creating and managing data pipelines using Airflow for data extraction, processing, and loading, ensuring their maintenance, monitoring, and stability. The engineer will work closely with data analysts and end-users to provide accessible and reliable data.
Job Responsibilities:
Responsible for maintaining the infrastructure that supports the current data architecture
Responsible for creating data pipelines in Airflow for data extraction, processing, and loading (see the sketch after this list)
Responsible for data pipeline maintenance, monitoring, and stability
Responsible for providing data access to data analysts and end-users
Responsible for DevOps infrastructure
Responsible for deploying Airflow DAGs to the production environment using DevOps tools
Responsible for code and query optimization
Responsible for code review
Responsible for improving the current data architecture and DevOps processes
Responsible for delivering data to users in useful and appealing ways
Responsible for performing and documenting analysis, review, and study on specified regulatory topics
Responsible for understanding business changes and requirements, and assessing their impact and cost
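As an illustration of the kind of extract/process/load pipeline this role describes, below is a minimal Airflow DAG sketch. The DAG id, schedule, task names, and placeholder logic are illustrative assumptions rather than details from this posting, and the code assumes Airflow 2.4+ (for the schedule argument).

# Minimal sketch of an extract -> process -> load DAG; all names and logic are
# illustrative assumptions, not details taken from the job posting.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull raw records from a source system (placeholder data).
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]


def process(ti):
    # Transform the extracted records (placeholder logic).
    rows = ti.xcom_pull(task_ids="extract")
    return [{**row, "value": row["value"] * 2} for row in rows]


def load(ti):
    # Write processed rows to the target store (placeholder).
    rows = ti.xcom_pull(task_ids="process")
    print(f"Loading {len(rows)} rows")


with DAG(
    dag_id="example_etl",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",             # Airflow 2.4+ style schedule argument
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    process_task = PythonOperator(task_id="process", python_callable=process)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> process_task >> load_task

In practice, a DAG like this would be deployed to the production Airflow environment through the team's DevOps tooling, as described in the responsibilities above.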
Requirements:
Advanced Python (Mandatory)
Experience in creating APIs in Python - at least Flask (Mandatory); see the Flask sketch after this list
Experience in documenting and testing in Python (Mandatory)
Advanced SQL skills and relational database management (Oracle is Mandatory; SQL Server and PostgreSQL are desirable)
Experience with Data Warehouses
Hadoop ecosystem - HDFS + YARN (Mandatory)
Spark Environment Architecture (Mandatory)
Advanced PySpark (Mandatory) - see the PySpark sketch after this list
Experience in creating and maintaining distributed environments using Hadoop and Spark
Data Lakes - Experience in organizing and maintaining data lakes (Mandatory) - S3 is preferred
Experience with Parquet file format is mandatory, Avro is a plus
Apache Airflow - Experience in both pipeline development and deploying Airflow in a distributed environment (Mandatory)
Containerization - Docker is Mandatory
Kubernetes (Mandatory)
Apache Kafka (Mandatory)
Experience in automating applications deployment using DevOps tools - Jenkins is Mandatory, Ansible is a plus
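To illustrate the Python/Flask API requirement, here is a minimal Flask sketch; the endpoint name and payload are illustrative assumptions only, not part of the posting.

# Minimal Flask API sketch; the /health endpoint and its response are
# hypothetical examples, not requirements from the posting.
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/health")
def health():
    # Simple health-check endpoint, handy for monitoring deployed services.
    return jsonify(status="ok")


if __name__ == "__main__":
    app.run(debug=True)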
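And to illustrate the PySpark, data lake, and Parquet requirements together, below is a minimal PySpark sketch that reads Parquet from an S3-style data lake, aggregates it, and writes the result back. The bucket name, paths, and column names are illustrative assumptions, and the code assumes a Spark 3.x environment with S3 access already configured.

# Minimal PySpark sketch: read Parquet from a (hypothetical) S3 data lake path,
# aggregate, and write the result back to a curated zone as Parquet.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("data_lake_example").getOrCreate()

# Read raw Parquet files from the data lake (hypothetical bucket and path).
orders = spark.read.parquet("s3a://example-bucket/raw/orders/")

# A simple aggregation as a stand-in for real processing logic.
daily_totals = (
    orders
    .groupBy("order_date")
    .agg(F.sum("amount").alias("total_amount"))
)

# Write the result back to a curated zone, partitioned by date.
(
    daily_totals
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3a://example-bucket/curated/daily_totals/")
)

spark.stop()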