The Kafka Data Engineer role requires deep Kafka expertise and hands-on experience operating event-streaming platforms. Candidates should be proficient in AWS and Python, with responsibilities spanning data ingestion from APIs and building S3-based data lakes. Strong skills in Glue, Airflow, and Snowflake are essential for success in this position.
Requirements:
Strong expertise in Kafka (4-5 years), with hands-on experience designing and operating large-scale, highly available event-streaming platforms, including partitioning strategies, consumer group optimization, schema management, and performance tuning (see the keyed producer/consumer sketch after this list)
Strong hands-on experience pulling data from REST/GraphQL APIs with authentication (OAuth2, API keys), pagination, rate limits, retries/backoff, and webhooks (a pagination/backoff sketch follows the list)
Strong Python skills to normalize/enrich data and land it cleanly into S3 (schema enforcement, partitioning, Parquet; see the landing sketch below)
Comfortable building/operating S3-based lakes with layered zones (raw → harmonized → conformed → modeled), the Glue Data Catalog, IAM/Secrets Manager, VPC endpoints, encryption, lifecycle/versioning, and cost/performance best practices such as file sizing and compaction (a lifecycle-policy sketch follows the list)
Designs and optimizes Glue jobs using PySpark/DynamicFrames, bookmarks for incremental loads, dependency packaging, robust error handling, logging/metrics, and unit tests, and knows how to tune jobs for scale and cost (a bookmark skeleton follows the list)
Writes clean, parameterized, idempotent Airflow DAGs (sensors, SLAs, retries, alerts), manages dependencies across pipelines, and uses Git-based CI/CD to promote changes safely (a DAG sketch follows the list)
Builds ELT models in Snowflake (staging/ODS/marts), tunes performance (warehouse sizing, clustering, micro-partitions, caching), uses Streams/Tasks/Snowpipe for CDC, and follows solid RBAC and data governance practices (a stream/task sketch closes the list of examples below)
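To make the Kafka requirement concrete, here is a minimal sketch using the confluent-kafka Python client: keyed publishing so per-entity ordering is preserved, and a consumer group so partitions are balanced across workers. The broker address, the "orders" topic, and the "order-processors" group are illustrative assumptions, not part of the posting.

```python
# Minimal sketch, assuming a local broker and an "orders" topic (both hypothetical).
from confluent_kafka import Producer, Consumer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def publish(order_id: str, payload: bytes) -> None:
    # The key drives the default partitioner: same order_id -> same partition,
    # which preserves per-order event ordering.
    producer.produce("orders", key=order_id.encode(), value=payload)

publish("order-42", b'{"status": "created"}')
producer.flush()

# A consumer joining the "order-processors" group gets a share of the
# topic's partitions; Kafka rebalances as members come and go.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processors",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])
msg = consumer.poll(5.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```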
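For the API-ingestion requirement, a sketch of cursor pagination with retry/backoff using requests. The token endpoint, the `items` payload field, and the `next` cursor field are assumptions; real APIs shape pagination differently.

```python
# Sketch of an OAuth2 + paginated pull; endpoint and field names are hypothetical.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry transient failures and 429s with exponential backoff (GETs only,
# so non-idempotent calls are never replayed).
retries = Retry(total=5, backoff_factor=2,
                status_forcelist=[429, 500, 502, 503, 504],
                allowed_methods=["GET"])
session.mount("https://", HTTPAdapter(max_retries=retries))

# OAuth2 client-credentials grant (token URL and credentials are illustrative).
token = session.post("https://auth.example.com/oauth/token", data={
    "grant_type": "client_credentials",
    "client_id": "my-client",
    "client_secret": "my-secret",
}, timeout=30).json()["access_token"]
session.headers["Authorization"] = f"Bearer {token}"

def fetch_all(url: str):
    # Follow the cursor until the API stops returning one.
    while url:
        page = session.get(url, timeout=30)
        page.raise_for_status()
        body = page.json()
        yield from body["items"]       # assumed payload field
        url = body.get("next")         # assumed cursor/link field

records = list(fetch_all("https://api.example.com/v1/events"))
```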
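For the Python-to-S3 landing requirement, a sketch of schema enforcement and Hive-style partitioning with pandas/pyarrow. The bucket, prefix, and columns are illustrative, and writing directly to `s3://` paths assumes s3fs is installed alongside pandas.

```python
# Sketch: normalize raw records and land them as partitioned Parquet in S3.
import pandas as pd

raw = [
    {"id": "1", "ts": "2024-05-01T10:00:00Z", "amount": "19.99"},
    {"id": "2", "ts": "2024-05-02T11:30:00Z", "amount": "5.00"},
]

df = pd.DataFrame(raw)
# Enforce an explicit schema instead of trusting source strings.
df["ts"] = pd.to_datetime(df["ts"], utc=True)
df["amount"] = df["amount"].astype("float64")
df["dt"] = df["ts"].dt.date.astype(str)  # partition column

# Hive-style dt= partitions keep downstream Athena/Glue scans cheap.
df.to_parquet(
    "s3://my-data-lake/raw/orders/",  # hypothetical bucket/prefix
    engine="pyarrow",
    compression="snappy",
    partition_cols=["dt"],
    index=False,
)
```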
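On the lake-operations side, one common way the lifecycle/versioning point is expressed with boto3; the bucket name and day thresholds are assumptions chosen for illustration.

```python
# Sketch: age raw-zone objects to cheaper storage classes over time.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "age-raw-zone",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # With versioning enabled, expire stale object versions too.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
            }
        ]
    },
)
```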
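For the Glue requirement, a skeleton showing how `transformation_ctx` plus `job.init`/`job.commit` drive bookmark-based incremental loads. Database, table, and path names are illustrative, and bookmarks must also be enabled on the job definition itself.

```python
# Sketch of a Glue job with bookmarks; catalog and S3 names are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # starts bookmark tracking for this run

# The bookmark keys on transformation_ctx: on the next run, only data not
# yet processed under "source_orders" is read.
source = glue_context.create_dynamic_frame.from_catalog(
    database="lake_raw",
    table_name="orders",
    transformation_ctx="source_orders",
)

cleaned = source.drop_fields(["_ingest_debug"])  # example cleanup step

glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://my-data-lake/harmonized/orders/"},
    format="parquet",
    transformation_ctx="sink_orders",
)

job.commit()  # persists the bookmark state
```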
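For the Airflow requirement, a sketch of a parameterized, idempotent daily DAG with retries and an S3 sensor gate. The DAG id, schedule, and bucket are assumptions, and the `schedule` argument reflects Airflow 2.4+ syntax.

```python
# Sketch: retries come from default_args; idempotency comes from keying each
# run to its logical date (ds), so reruns rewrite exactly one partition.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

def load_partition(ds: str, **_):
    # Hypothetical load step: (re)writes only the ds partition, so retries
    # and backfills never double-count.
    print(f"loading s3://my-data-lake/raw/orders/dt={ds}/")

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
) as dag:
    # Gate the load on the upstream export's success marker.
    wait_for_export = S3KeySensor(
        task_id="wait_for_export",
        bucket_name="my-data-lake",
        bucket_key="incoming/orders/{{ ds }}/_SUCCESS",
        poke_interval=300,
        timeout=6 * 60 * 60,
    )
    load = PythonOperator(task_id="load_partition", python_callable=load_partition)
    wait_for_export >> load
```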
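Finally, for the Snowflake CDC requirement, a sketch that creates a stream and a conditional task from Python via snowflake-connector-python. Connection parameters and all object names are illustrative assumptions.

```python
# Sketch: a stream captures changes on the raw table; a task merges the
# delta forward only when the stream actually has rows.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="etl_password",  # hypothetical
    warehouse="LOAD_WH", database="ANALYTICS", schema="STAGING",
)
cur = conn.cursor()

# The stream records inserts/updates/deletes on RAW.ORDERS.
cur.execute("CREATE STREAM IF NOT EXISTS orders_stream ON TABLE RAW.ORDERS")

# The task wakes every 5 minutes but runs only when the stream has data.
cur.execute("""
CREATE TASK IF NOT EXISTS merge_orders
  WAREHOUSE = LOAD_WH
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  MERGE INTO MARTS.ORDERS t
  USING orders_stream s ON t.ID = s.ID
  WHEN MATCHED THEN UPDATE SET t.STATUS = s.STATUS
  WHEN NOT MATCHED THEN INSERT (ID, STATUS) VALUES (s.ID, s.STATUS)
""")
cur.execute("ALTER TASK merge_orders RESUME")
conn.close()
```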