We are looking for a Data Engineer to join our Staff Data Engineer and form a new squad. This role is for an engineer with experience building data pipelines for event tracking in the streaming layer. You will work on our data streaming platform, which captures events from different clients into our Data Warehouse. You will own the full data lifecycle for real-time event tracking, ensuring scalability, reliability, and cost-effectiveness. While the basic components are built, a large rollout of event tracking across different teams is still required, presenting significant data validation, data modelling, and scaling challenges ahead. You will work closely with analysts, engineers, and product managers to enable accurate reporting on new product features and business KPIs. These events also fuel our AI feature stores and models. Our data stack is written in Python and hosted on AWS. You will support frontend and backend engineers in instrumenting event tracking so that events can be collected across the product domains.
Job Responsibilities:
Build and operate a scalable data platform ingesting real-time events at high throughput
Collaborate with analysts, engineers, and product managers to understand user needs and take ownership of delivering new event tracking functionality
Implement automation and strong data quality controls, ensure the integrity of ingested data, and create the necessary monitoring alerts
Improve the existing data pipeline architecture and workflows in collaboration with stakeholders
Requirements:
Hands-on experience with a cloud event-streaming platform such as Kinesis, Pub/Sub, or Kafka
Experience deploying and maintaining Python services in a major cloud environment (AWS, GCP, Azure)
Specific experience with the AWS stack, including Kinesis, Lambda, Glue, Firehose, and Athena, is highly desirable
Strong skills in improving and operating schema registries and data catalogs (Glue, Databricks, etc.)
Relevant CI/CD experience (e.g., GitHub Actions, GitLab Pipelines) automating tests, schema registry updates, and Lambda releases
Solid experience with SQL and with data warehouse and data lake environments (e.g., Databricks, BigQuery, Redshift, S3)
Knowledge of and practical experience with data modelling, Protobuf or Avro schemas, and managing schema evolution
Familiarity with cloud observability tools (e.g., Datadog, New Relic, or CloudWatch)
Nice to have:
Production experience with containerization (Docker), orchestration (Kubernetes, AWS ECS/Fargate, etc.), and IaC (Terraform, Crossplane)
Large-scale data processing with PySpark and Flink