Data Testing Architect

Citi

Location:
India, Chennai

Category:
IT - Software Development

Contract Type:
Not provided

Salary:
Not provided

Job Description:

This job posting is for a senior-level manager who specializes in automating the movement and transformation of data (ETL) within a banking environment. The role focuses on developing robust PySpark-based data processing solutions, building testing frameworks for ETL jobs, and leveraging containerization and orchestration platforms like Docker and AWS EKS for scalable workloads.

Job Responsibility:

  • Build Data Pipelines: Create testing solutions to extract data from various sources (like databases and data lakes), clean and transform it, and load it into target systems
  • Testing and Validation: Develop automated tests to ensure the data pipelines are working correctly and the data is accurate
  • Containerization and Orchestration: Package these data pipelines into containers (using Docker) and manage their execution using orchestration tools (like AWS EKS)
  • Cloud Integration: Work with various cloud services (like AWS S3, Lambda, and Airflow) for data storage, processing, and scheduling
  • Test Data Management: Oversee test data strategies and environment simulations for scalable, reliable automation
  • Build and maintain ETL validation and testing scripts that run on Red Hat OpenShift containers
  • Work with Hive, HDFS, and Oracle data sources to extract, transform, and load large-scale datasets
  • Develop Dockerfiles and create container images for PySpark jobs
  • Deploy and orchestrate ETL jobs using AWS EKS (Elastic Kubernetes Service) and integrate them into workflows
  • Leverage AWS services such as S3, Lambda, and Airflow for data ingestion, event-driven processing, and orchestration
  • Design and develop PySpark-based ETL pipelines on Cloudera Hadoop platform
  • Create reusable frameworks, libraries, and templates to accelerate automation and testing of ETL jobs
  • Participate in code reviews, CI/CD pipelines, and maintain best practices in Spark and cloud-native development
  • Ensure tooling can run in CI/CD, providing real-time, on-demand test execution that shortens the feedback loop and fully supports hands-free execution
  • Regression, integration, and sanity testing: provide solutions and ensure timely completion

Requirements:

  • 15-18 years of experience in data platform testing across data lineage, especially with knowledge of regulatory compliance and risk management
  • Detailed knowledge of data flows in relational databases and big data platforms (familiarity with Hadoop)
  • Selenium and BDD (Cucumber) using Java and Python
  • Strong experience with Python
  • Broad understanding of batch and stream processing, including deploying PySpark workloads to AWS EKS (Kubernetes)
  • Proficiency in testing on Cloudera Hadoop ecosystem (HDFS, Hive) and AWS
  • Hands-on experience with ETL
  • Strong knowledge of Oracle SQL and HiveQL
  • Solid understanding of AWS services like S3, Lambda, EKS, Airflow, and IAM
  • Understanding of cloud architecture using S3, Lambda, and Airflow DAGs to orchestrate ETL jobs
  • Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI)
  • Scripting knowledge in Python
  • Version Control: Git, Bitbucket, GitHub
  • Experience validating BI reports, e.g., Tableau dashboards and views
  • Strong understanding of the Wealth domain and of data regulation and governance for APAC, EMEA, and NAM
  • Strong problem-solving and debugging skills
  • Excellent communication and collaboration abilities to lead and mentor a large techno-functional team across different geographical locations
  • Ability to manage global teams and support multiple time zones
  • Strong financial acumen and great presentation skills
  • Able to work in an Agile environment and deliver results independently

Additional Information:

Job Posted:
June 27, 2025

Employment Type:
Full-time
Work Type:
On-site work