We are looking for an experienced Data Engineer to join a team delivering modern data solutions in Poughkeepsie, New York. This long-term contract position focuses on building reliable, scalable data platforms in Databricks while supporting analytics needs across the business. The role calls for a hands-on engineer who can improve data performance, uphold governance standards, and partner effectively with cross-functional stakeholders in an agile environment.
Job Responsibilities
Create and support scalable data pipelines in Databricks using Spark technologies such as PySpark or Scala to process and deliver high-quality data
Develop lakehouse architectures on Azure Data Lake Storage Gen2 and ensure strong integration with Databricks for efficient data management
Establish and monitor data quality controls and governance practices within the platform using validation methods and Delta Lake capabilities
Investigate pipeline and application inefficiencies, then implement tuning strategies to improve Spark and Databricks performance
Work closely with analysts and other stakeholders to translate business data needs into refined, analytics-ready datasets
Automate ingestion, transformation, testing, and release processes, including integration with CI/CD workflows where appropriate
Provide guidance to less experienced engineers by sharing best practices for Databricks development, optimization, and support
Maintain clear technical documentation for notebooks, workflows, data models, configurations, and operational procedures
Protect data assets by applying security controls and compliance standards across the Databricks environment
Contribute to design sessions, solve complex data issues, and uphold change management and data integrity standards while delivering large assignments on schedule
Requirements
Hands-on experience building data engineering solutions with Databricks and Apache Spark
Strong programming ability in Python, including development of ETL and data transformation workflows
Knowledge of lakehouse and big data technologies such as Delta Lake, Apache Hadoop, and Apache Kafka
Experience working with Azure Data Lake Storage Gen2 or comparable cloud-based data storage platforms
Ability to optimize distributed data processing jobs and troubleshoot performance issues in Spark environments
Familiarity with data governance, data quality, and security practices for enterprise data platforms
Comfortable working independently and collaborating with cross-functional teams in an agile delivery model
Proven ability to analyze technical problems, break them into manageable components, and implement effective solutions
What we offer
Medical, vision, dental, and life and disability insurance