Test Data Architect

Citi

Location:
India, Chennai

Category:
IT - Software Development

Contract Type:
Employment contract

Salary:
Not provided

Job Description:

We are seeking a highly skilled and self-driven Data Testing Architect to oversee and own the design, build, and deployment of scalable ETL pipelines across hybrid environments including Cloudera Hadoop, Red Hat OpenShift, and AWS Cloud. This role focuses on developing robust PySpark-based data processing solutions, building testing frameworks for ETL jobs, and leveraging containerization and orchestration platforms like Docker and AWS EKS for scalable workloads.

Job Responsibilities:

  • Build Data Pipelines
  • Testing and Validation
  • Containerization and Orchestration
  • Cloud Integration
  • Test Data Management
  • Build and maintain ETL validation and testing scripts
  • Work with Hive, HDFS, and Oracle data sources to extract, transform, and load large-scale datasets
  • Develop Dockerfiles and create container images for PySpark jobs
  • Deploy and orchestrate ETL jobs using AWS EKS
  • Leverage AWS services such as S3, Lambda, and Airflow for data ingestion, event-driven processing, and orchestration
  • Design and develop PySpark-based ETL pipelines on the Cloudera Hadoop platform
  • Create reusable frameworks, libraries, and templates to accelerate automation and testing of ETL jobs
  • Participate in code reviews and CI/CD pipeline work, and maintain best practices in Spark and cloud-native development
  • Ensure tooling can run in CI/CD pipelines, providing real-time, on-demand test execution
  • Lead a team of automation professionals and guide them on projects
  • Own and maintain automation best practices
  • Define the overall strategy for automating data processes and testing
  • Research and implement new automation tools and techniques
  • Work closely with other teams and partners to ensure smooth data operations and regulatory compliance
  • Track key performance indicators (KPIs) related to automation
  • Monitor and review code check-ins from the team

Requirements:

  • 12-15 years of experience in data platform testing across data lineage, especially with knowledge of regulatory compliance and risk management
  • Detailed knowledge of data flows in relational databases and big data platforms
  • Experience with Selenium and BDD (Cucumber) using Java and Python
  • Strong experience with Python
  • Broad understanding of batch and stream processing, including deploying PySpark workloads to AWS EKS
  • Proficiency in testing on Cloudera Hadoop ecosystem
  • Hands-on experience with ETL
  • Strong knowledge of Oracle SQL and HiveQL
  • Solid understanding of AWS services like S3, Lambda, EKS, Airflow, and IAM
  • Understanding of cloud architecture using S3, Lambda, and Airflow DAGs to orchestrate ETL jobs
  • Familiarity with CI/CD tools
  • Scripting knowledge in Python
  • Version control: Git, Bitbucket, GitHub
  • Experience with BI report validation, e.g., validating Tableau dashboards and views
  • Strong understanding of the Wealth domain and of data regulation and governance for APAC, EMEA, and NAM
  • Strong problem-solving and debugging skills
  • Excellent communication and collaboration abilities to lead and mentor a large techno-functional team across different geographical locations
  • Ability to manage global teams and support multiple time zones
  • Strong financial acumen and excellent presentation skills
  • Able to work in an Agile environment and deliver results independently

What we offer:

  • Global benefits
  • Equal opportunity employer

Additional Information:

Job Posted:
June 25, 2025

Employment Type:
Full-time
Work Type:
On-site work