The Data Engineer - Security role focuses on designing and operating large-scale event-streaming platforms using Kafka. The ideal candidate will have strong expertise in data ingestion and AWS data lakes, along with proficiency in Python and PySpark. This position offers the opportunity to work in a collaborative environment with a focus on innovation and client success.
Job Responsibilities:
Designing and operating large-scale event-streaming platforms using Kafka
Implementing API-first data ingestion
Building/operating S3-based lakes
Designing and optimizing Glue jobs using PySpark/DynamicFrames
Writing clean, parameterized, idempotent DAGs
Building ELT models in Snowflake
Requirements:
Kafka. Strong expertise in Kafka (4-5 years), with hands-on experience designing and operating large-scale, highly available event-streaming platforms, including partitioning strategies, consumer group optimization, schema management, and performance tuning (a minimal consumer sketch follows this list)
API-first data ingestion. Strong hands-on experience pulling data from REST/GraphQL APIs with auth (OAuth2, API keys), pagination, rate limits, retries/backoff, and webhooks (see the ingestion sketch after this list)
Strong Python skills to normalize/enrich data and land it cleanly in S3 (schema, partitioning, Parquet); a landing sketch follows this list
AWS data lake, end to end. Comfortable building/operating S3-based lakes with layered zones (raw → harmonized → conformed → modeled), Glue Data Catalog, IAM/Secrets Manager, VPC endpoints, encryption, lifecycle/versioning, and cost/performance best practices such as file sizing and compaction (a lifecycle configuration sketch follows this list)
AWS Glue + PySpark expert. Designs and optimizes Glue jobs using PySpark/DynamicFrames, bookmarks for incremental loads, dependency packaging, robust error handling, logging/metrics, and unit tests; knows how to tune jobs for scale and cost (a Glue job skeleton follows this list)
Airflow orchestration. Writes clean, parameterized, idempotent DAGs (sensors, SLAs, retries, alerts), manages dependencies across pipelines, and uses Git-based CI/CD to promote changes safely (see the DAG sketch after this list)
Snowflake proficiency. Builds ELT models (staging/ODS/marts), tunes performance (warehouse sizing, clustering, micro-partitions, caching), uses Streams/Tasks/Snowpipe for CDC, and follows solid RBAC and data governance practices (a stream-driven merge sketch follows this list)
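The sketches below are minimal, illustrative examples of the patterns named in the requirements above; every broker address, bucket, table, and credential in them is an assumption, not a detail from this posting.

For the Kafka requirement, a minimal consumer sketch with a consumer group and manual offset commits, assuming confluent-kafka and an illustrative topic and broker:

```python
# Minimal Kafka consumer sketch; topic, group id, and broker are assumed names
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker1:9092",   # assumed broker address
    "group.id": "events-harmonizer",       # consumer group for parallel consumption
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,           # commit manually after successful processing
})
consumer.subscribe(["events.raw"])         # assumed topic name

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Process the record, then commit so failed batches are re-delivered
        payload = msg.value().decode("utf-8")
        print(f"partition={msg.partition()} offset={msg.offset()} value={payload}")
        consumer.commit(asynchronous=False)
finally:
    consumer.close()
```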
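For API-first ingestion, a sketch of paginated REST extraction with bearer-token auth and retry/backoff; the endpoint and response field names ("items", "next") are assumptions:

```python
# Paginated REST ingestion sketch with retry/backoff; the response shape is assumed
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(token: str) -> requests.Session:
    session = requests.Session()
    retries = Retry(total=5, backoff_factor=1.0,
                    status_forcelist=[429, 500, 502, 503, 504])  # rate limits / transient errors
    session.mount("https://", HTTPAdapter(max_retries=retries))
    session.headers.update({"Authorization": f"Bearer {token}"})  # OAuth2 bearer token
    return session

def fetch_all(session: requests.Session, url: str) -> list[dict]:
    records = []
    while url:
        resp = session.get(url, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        records.extend(body["items"])   # assumed response field
        url = body.get("next")          # cursor-style pagination: follow the next link
    return records
```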
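For landing normalized data in S3, a sketch using pandas and AWS SDK for pandas (awswrangler) to write partitioned Parquet into an assumed raw-zone prefix:

```python
# Sketch: flatten API records and land them in S3 as partitioned Parquet
# (bucket, prefix, and column names are assumed)
import awswrangler as wr
import pandas as pd

def land_to_raw_zone(records: list[dict]) -> None:
    df = pd.json_normalize(records)                               # flatten nested payloads
    df["event_date"] = pd.to_datetime(df["created_at"]).dt.date   # derive partition column
    wr.s3.to_parquet(
        df=df,
        path="s3://example-data-lake/raw/events/",   # assumed raw-zone prefix
        dataset=True,                                # write as a partitioned dataset
        partition_cols=["event_date"],
        mode="append",
    )
```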
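For the lake-operations side, one way lifecycle/versioning hygiene can look, as a boto3 sketch that tiers an assumed raw/ prefix and prunes noncurrent versions; the retention windows are illustrative:

```python
# Sketch: lifecycle rule for the raw zone (bucket, prefix, and windows are assumed)
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "raw-zone-tiering",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "STANDARD_IA"},  # tier cold raw data
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},  # prune old versions
            }
        ]
    },
)
```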
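For Glue + PySpark, a job skeleton showing DynamicFrames with job bookmarks for incremental loads; the database, table, and output path are assumed:

```python
# AWS Glue job sketch: incremental load with bookmarks (catalog and S3 names are assumed)
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)   # enables job bookmarks for incremental processing

# Read only records not processed by previous runs (bookmarked via transformation_ctx)
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_zone",            # assumed Glue Data Catalog database
    table_name="events",            # assumed table
    transformation_ctx="source",
)

# Drop an unwanted field and write to the harmonized zone as Parquet
cleaned = source.drop_fields(["_corrupt_record"])
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-data-lake/harmonized/events/"},
    format="parquet",
    transformation_ctx="sink",
)

job.commit()   # persists bookmark state for the next incremental run
```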
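For Airflow, a sketch of a parameterized, idempotent daily DAG with retries, an SLA, and an S3 sensor, assuming Airflow 2.x with the Amazon provider; the bucket, key pattern, and task body are illustrative:

```python
# Airflow DAG sketch: retries, SLA, sensor, idempotent reload of one partition per run
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),
}

def load_partition(ds: str, **_) -> None:
    # Idempotent by design: each run fully rewrites the partition for its logical date
    print(f"Reloading partition for {ds}")

with DAG(
    dag_id="events_daily_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_raw_file",
        bucket_name="example-data-lake",                       # assumed bucket
        bucket_key="raw/events/event_date={{ ds }}/_SUCCESS",  # templated on the logical date
        poke_interval=300,
        timeout=6 * 60 * 60,
    )
    load = PythonOperator(task_id="load_partition", python_callable=load_partition)

    wait_for_file >> load
```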
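For Snowflake, a sketch of a stream-driven incremental merge executed through the Python connector; the account, warehouse, and object names are assumptions, and the credential would come from a secrets manager in practice:

```python
# Snowflake ELT sketch: merge new/changed rows exposed by a stream into an ODS table
# (connection parameters and object names are assumed)
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="...",              # placeholder; use a secrets manager in practice
    warehouse="TRANSFORM_WH",
    database="ANALYTICS",
    schema="ODS",
)

merge_sql = """
MERGE INTO ods.events AS tgt
USING staging.events_stream AS src   -- the stream exposes only new/changed rows (CDC)
  ON tgt.event_id = src.event_id
WHEN MATCHED THEN UPDATE SET tgt.payload = src.payload, tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (event_id, payload, updated_at)
  VALUES (src.event_id, src.payload, src.updated_at)
"""

with conn.cursor() as cur:
    cur.execute(merge_sql)
conn.close()
```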