Strong hands-on experience with Apache Spark and Delta Lake, and strong programming skills in Python and SQL. Proven experience building batch and streaming data pipelines and production-grade data platforms, with a solid understanding of data modeling, data quality, and governance principles. Experience with one or more major cloud platforms (Microsoft Azure / Fabric preferred; AWS or GCP also relevant). Familiarity with modern data platforms such as Databricks and Snowflake is expected. Experience with lakehouse architectures and distributed data systems, and a strong understanding of scalability, reliability, and performance considerations in data pipelines. Strong problem-solving skills focused on scalability and reliability, with a collaborative approach to working in cross-functional teams.
Experience with GenAI and AI data systems (e.g., RAG pipelines, vector databases, LLM data preparation), as well as CI/CD for data pipelines and infrastructure-as-code tools such as Terraform, ARM, or CloudFormation, is a plus. Additional exposure to streaming technologies (e.g., Kafka), Spark optimization, or advanced analytics and ML workloads (including causal inference or experimentation platforms) is valuable. Experience building data products or large-scale analytics platforms is also beneficial.