We are seeking a highly skilled and self-driven Data Testing Lead to oversee and own the design, build, and deployment of scalable ETL pipelines across hybrid environments including Cloudera Hadoop, Red Hat OpenShift, and AWS Cloud. The role focuses on developing robust PySpark-based data processing solutions, building testing frameworks for ETL jobs, and leveraging containerization and orchestration platforms such as Docker and AWS EKS for scalable workloads.
Job Responsibilities:
build data pipelines
develop automated tests to ensure data pipelines are working correctly and the data is accurate
package data pipelines into containers using Docker and manage execution using orchestration tools like AWS EKS
work with various cloud services for data storage, processing, and scheduling
oversee test data strategies and environment simulations for scalable, reliable automation
build and maintain ETL validation and testing scripts that run on Red Hat OpenShift containers (a minimal validation sketch follows this list)
design and develop PySpark-based ETL pipelines on the Cloudera Hadoop platform
develop reusable frameworks, libraries, and templates
participate in code reviews, CI/CD pipelines, and maintain best practices in Spark and cloud-native development
ensure tooling can be run in CI/CD for hands-free execution
provide solutions for regression, integration, and sanity testing
lead a team of automation professionals
develop automation strategies
research and implement new tools and techniques for automation
report progress and KPIs to leadership
collaborate with other teams to meet regulatory requirements
ensure new utilities are documented
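To illustrate the kind of ETL validation and testing scripts referenced above, here is a minimal sketch of a PySpark data-quality test that could run in CI. The clean_customers transform, its schema, and the assertions are hypothetical examples, not taken from the actual pipeline.

```python
# Minimal sketch, assuming a local SparkSession and a hypothetical clean_customers step;
# field names and expected values are illustrative only.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    # Local Spark session so the validation can run in CI without a cluster.
    return (
        SparkSession.builder
        .master("local[2]")
        .appName("etl-validation-sketch")
        .getOrCreate()
    )


def clean_customers(df):
    # Illustrative transform: drop rows with missing ids and normalise country codes.
    return (
        df.filter(F.col("customer_id").isNotNull())
          .withColumn("country", F.upper(F.col("country")))
    )


def test_clean_customers_removes_null_ids_and_uppercases_country(spark):
    raw = spark.createDataFrame(
        [(1, "sg"), (None, "hk"), (2, "us")],
        ["customer_id", "country"],
    )
    result = clean_customers(raw)

    # Row-count and data-quality assertions of the kind an ETL regression suite would run.
    assert result.count() == 2
    assert result.filter(F.col("customer_id").isNull()).count() == 0
    assert {r["country"] for r in result.collect()} == {"SG", "US"}
```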
Requirements:
12-15 years of experience in data platform testing across data lineage, especially with knowledge of regulatory compliance and risk management
detailed knowledge of data flows in relational databases and big data
familiarity with Hadoop
Selenium BDD with Cucumber using Java and Python
strong experience with Python
broad understanding of batch and stream processing, including deploying PySpark workloads to AWS EKS
proficiency in testing on Cloudera Hadoop ecosystem (HDFS, Hive) and AWS
hands-on experience with ETL
strong knowledge of Oracle SQL and HiveQL
solid understanding of AWS services like S3, Lambda, EKS, Airflow, and IAM
understanding of cloud architecture using S3, Lambda, and Airflow DAGs to orchestrate ETL jobs (see the DAG sketch after this list)
familiarity with CI/CD tools (e.g., Jenkins, GitLab CI)
scripting knowledge in Python
version control: Git, Bitbucket, GitHub
experience in BI report validation, e.g., validating Tableau dashboards and views
strong understanding of the Wealth domain and of data regulatory and governance requirements for APAC, EMEA, and NAM
strong problem-solving and debugging skills
excellent communication and collaboration abilities to lead and mentor a large techno-functional team across different geographical locations
ability to manage global teams and support multiple time zones
strong financial acumen and great presentation skills
able to work in an Agile environment and deliver results independently
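As a rough illustration of the S3 and Airflow orchestration pattern mentioned in the requirements, below is a minimal DAG sketch, assuming Airflow 2.4+ with the Amazon provider installed. The bucket, cluster, and image names are placeholders, and the exact EksPodOperator arguments depend on the provider version.

```python
# Minimal sketch: wait for a raw extract in S3, then run a containerised PySpark job on EKS.
# All resource names below are placeholders, not real infrastructure.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.eks import EksPodOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="daily_customer_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Wait for the day's raw extract to land in S3 before running the pipeline.
    wait_for_extract = S3KeySensor(
        task_id="wait_for_extract",
        bucket_name="example-raw-bucket",
        bucket_key="customers/{{ ds }}/extract.parquet",
    )

    # Run the containerised PySpark job as a pod on an EKS cluster.
    run_pyspark_etl = EksPodOperator(
        task_id="run_pyspark_etl",
        cluster_name="example-eks-cluster",
        pod_name="customer-etl-{{ ds_nodash }}",
        image="example-registry/pyspark-etl:latest",
        cmds=["spark-submit", "/app/clean_customers.py", "--date", "{{ ds }}"],
        get_logs=True,
    )

    wait_for_extract >> run_pyspark_etl
```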
Nice to have:
experience with synthetic data generation (a simple generator sketch follows this list)
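As an example of the synthetic data generation mentioned above, here is a small, self-contained sketch using only the Python standard library; the field names, null-injection rate, and value ranges are illustrative only.

```python
# Minimal sketch: generate fake customer records with deliberately injected blanks,
# so downstream ETL validation rules have something to catch. Schema is hypothetical.
import csv
import random
import uuid

COUNTRIES = ["SG", "HK", "US", "GB"]


def generate_customers(n: int) -> list[dict]:
    rows = []
    for _ in range(n):
        rows.append({
            # Roughly 5% of ids are left blank to exercise null-handling checks.
            "customer_id": str(uuid.uuid4()) if random.random() > 0.05 else "",
            "country": random.choice(COUNTRIES),
            "balance": round(random.uniform(0, 1_000_000), 2),
        })
    return rows


if __name__ == "__main__":
    with open("synthetic_customers.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["customer_id", "country", "balance"])
        writer.writeheader()
        writer.writerows(generate_customers(1000))
```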