We are looking for a strong individual contributor who excels in the Python data ecosystem and enjoys building reliable, scalable data pipelines. The role sits within a data engineering group responsible for integrating large volumes of data from external partners and transforming it into usable datasets for internal teams. You'll work with modern cloud tools while helping the team gradually transition away from a legacy platform.

This position is ideal for someone who wants to stay hands-on, focus on technical execution, and remain in an individual contributor role for the next several years; we are not looking for someone aiming to move immediately into architecture or leadership. The team is fully distributed: candidates in the Boston area may work from the office, while the rest of the group is remote, and anyone local may occasionally sit with other teams when on site.
Job Responsibilities:
Build and maintain ETL pipelines that ingest, clean, and aggregate data received from external vendors and large enterprise partners
Develop Python-based data processing workflows deployed on AWS cloud services
Work with tools such as AWS Glue, Airflow, dbt, and PySpark to support data transformations and pipeline orchestration (a representative sketch follows this list)
Help modernize existing workflows and assist in the gradual migration away from a legacy data system
Collaborate with internal stakeholders to understand data needs, define requirements, and ensure reliable integration of partner data feeds
Troubleshoot pipeline issues, optimize performance, and improve overall system stability
Contribute to best practices around code quality, testing, documentation, and data governance
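To give a concrete flavor of the day-to-day work described above, here is a minimal sketch of the kind of pipeline this role owns: an Airflow DAG that ingests a partner feed, cleans it, and publishes an aggregate. It assumes the Airflow 2.x TaskFlow API, and every name in it (the DAG id, paths, and task bodies) is illustrative rather than an actual project identifier.

from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def partner_feed_etl():
    """Illustrative daily pipeline: ingest a partner drop, clean it, aggregate it."""

    @task
    def ingest() -> str:
        # Hypothetical landing location; a real feed might arrive via an
        # S3 drop zone, SFTP, or a partner API.
        return "s3://example-landing/partner_feed/latest.csv"

    @task
    def clean(raw_path: str) -> str:
        # Deduplicate, enforce the expected schema, and normalize timestamps,
        # then hand off the path to the cleaned copy.
        return raw_path.replace("example-landing", "example-cleaned")

    @task
    def aggregate(cleaned_path: str) -> None:
        # Roll the cleaned records up into the datasets internal teams query.
        print(f"aggregating {cleaned_path} into reporting tables")

    aggregate(clean(ingest()))


partner_feed_etl()

In practice the clean and aggregate steps would delegate to Glue jobs, dbt models, or PySpark transformations, per the tools listed above.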
Requirements:
5+ years of hands-on data engineering experience (flexible for exceptional mid-level talent)
Strong expertise in Python and SQL for data manipulation, automation, and backend data workflows
Experience building ETL pipelines and working with data coming from external organizations or third-party sources
Familiarity with AWS data technologies such as Glue, RDS, Lambda, or related cloud-native services
Experience with Airflow or dbt for orchestration (either is fine; both are a plus)
Background with PySpark or distributed data processing is highly desirable (see the illustrative snippet after this list)
Ability to work autonomously in a remote-first environment and communicate clearly with technical and non-technical colleagues
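For a sense of what distributed data processing means in this context, here is a short, purely illustrative PySpark aggregation of the sort this role writes regularly; the input path, column names, and output location are all hypothetical.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partner-daily-totals").getOrCreate()

# Hypothetical cleaned partner feed; real schemas and buckets will differ.
events = spark.read.parquet("s3://example-cleaned/partner_feed/")

# Roll raw events up into one row per partner per day.
daily_totals = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("partner_id", "event_date")
    .agg(
        F.count("*").alias("event_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Partitioned output keeps downstream reads cheap for date-scoped queries.
daily_totals.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated/partner_daily_totals/"
)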
Nice to have:
Exposure to supporting or migrating off legacy systems