Design and implement data processing systems using distributed frameworks such as Hadoop, Spark, Snowflake, or Airflow
Build data pipelines to ingest data from various sources such as databases, APIs, or streaming platforms
Integrate and transform data to ensure its compatibility with the target data model or format
Design and optimize data storage architectures, including data lakes, data warehouses, or distributed file systems
Implement techniques like partitioning, compression, or indexing to optimize data storage and retrieval
Identify and resolve bottlenecks, tune queries, and implement caching strategies to enhance data retrieval speed and overall system efficiency
Design and implement data models that support efficient data storage, retrieval, and analysis
Collaborate with data scientists and analysts to understand their requirements and provide them with well-structured and optimized data for analysis and modeling purposes
Utilize frameworks such as Hadoop or Spark for distributed computing tasks, including parallel data processing and machine learning workloads
Implement security measures to protect sensitive data and ensure compliance with data privacy regulations
Establish data governance practices to maintain data integrity, quality, and consistency
Monitor system performance, identify anomalies, and conduct root cause analysis to ensure smooth and uninterrupted data operations
Communicate complex technical concepts to non-technical stakeholders in a clear and concise manner
Stay updated with emerging technologies, tools, and techniques in the field of big data engineering
Requirements:
Strong analytical thinking and problem-solving skills
Strong communication skills, with the ability to translate technical details for business and non-technical stakeholders
Extensive experience in designing and building data pipelines (ELT/ETL) for large-scale datasets
Proficiency in programming languages such as Python, R, or Scala
In-depth knowledge of and experience with distributed systems and technologies, including on-premises platforms and frameworks such as Apache Hadoop, Spark, and Hive
Solid understanding of data processing techniques such as batch processing, real-time streaming, and data integration
Experience with Azure data services, in particular Azure Databricks and Azure Data Factory
Experience with Git repository maintenance and DevOps concepts
Familiarity with build, test, and deployment processes
Additional certifications in big data technologies or cloud platforms are advantageous
Nice to have:
Familiarity with tools like Databricks, Apache NiFi, Apache Airflow, or Informatica is advantageous
Familiarity with cloud-based platforms like AWS, Azure, or Google Cloud is highly desirable
Experience with data analytics tools and frameworks like Apache Kafka, Apache Flink, or Apache Storm is a plus
What we offer:
Stable employment
“Office as an option” model
Flexibility regarding working hours and your preferred form of contract
Comprehensive online onboarding program with a “Buddy” from day 1
Cooperation with top-tier engineers and experts
Unlimited access to the Udemy learning platform from day 1
Certificate training programs
Upskilling support
Internal Gallup Certified Strengths Coach to support your growth
Opportunities to grow as the company grows
A diverse, inclusive, and values-driven community
Autonomy to choose the way you work
The chance to help shape our community together
Activities to support your well-being and health
Plenty of opportunities to donate to charities and support the environment