Data Engineer - Controls Technology

Citi (https://www.citi.com/)

Location:
India, Chennai

Category:
IT - Software Development

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking a highly skilled and hands-on Data Engineer to join Controls Technology to support the design, development, and implementation of our next-generation Data Mesh and Hybrid Cloud architecture. This role is critical in building scalable, resilient, and future-proof data pipelines and infrastructure that enable the seamless integration of Controls Technology data within a unified platform. The Data Engineer will work closely with the Data Mesh and Cloud Architect Lead to implement data products, ETL/ELT pipelines, hybrid cloud integrations, and governance frameworks that support data-driven decision-making across the enterprise.

Job Responsibility:

  • Design, build, and optimize ETL/ELT pipelines for structured and unstructured data
  • Develop real-time and batch data ingestion pipelines using distributed data processing frameworks
  • Ensure pipelines are highly performant, cost-efficient, and secure
  • Work extensively with Apache Iceberg for data lake storage optimization and schema evolution
  • Manage Iceberg Catalogs and ensure seamless integration with query engines
  • Configure and maintain Hive MetaStore (HMS) for Iceberg-backed tables and ensure proper metadata management
  • Utilize Starburst and Stargate to enable distributed SQL-based analytics and seamless data federation
  • Tune performance for large-scale querying and federated access to structured and semi-structured data
  • Implement Data Mesh principles by developing domain-specific data products that are discoverable, interoperable, and governed
  • Collaborate with data domain owners to enable self-service data access while ensuring consistency and quality
  • Develop and manage data storage, processing, and retrieval solutions across AWS and on-premise environments
  • Work with cloud-native tools such as AWS S3, RDS, Lambda, Glue, Redshift, and Athena to support scalable data architectures
  • Ensure hybrid cloud data flows are optimized, secure, and compliant with organizational standards
  • Implement data governance, lineage tracking, and metadata management solutions
  • Enforce security best practices for data encryption, role-based access control (RBAC), and compliance with policies such as GDPR and CCPA
  • Monitor and optimize data workflows, performance tuning of queries, and resource utilization
  • Implement logging, alerting, and monitoring solutions using CloudWatch, Prometheus, or Grafana to ensure system health
  • Work closely with data architects, application teams, and business units to ensure seamless integration of data solutions
  • Maintain clear documentation of data models, transformations, and architecture for internal reference and governance
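
The batch ETL/ELT work described above can be illustrated with a minimal, framework-free sketch. The record shape, field names, and quality rules below are hypothetical; a production pipeline in this role would run on a distributed engine such as Spark or Flink with Iceberg-backed storage rather than plain Python.

```python
# Minimal extract -> transform -> load sketch (hypothetical record shape).
# Shows the batch-pipeline pattern only, not any specific Citi system.

def extract(rows):
    """Simulate ingestion: yield raw records from an upstream source."""
    yield from rows

def transform(records):
    """Quality gate and normalization: drop malformed rows, coerce types."""
    for rec in records:
        if "id" not in rec or rec.get("amount") is None:
            continue  # skip records that fail the (hypothetical) quality check
        yield {"id": rec["id"], "amount": round(float(rec["amount"]), 2)}

def load(records, sink):
    """Write transformed records to an in-memory sink (stand-in for a table)."""
    for rec in records:
        sink.append(rec)
    return sink

raw = [{"id": 1, "amount": "10.503"}, {"id": 2, "amount": None}, {"amount": "5"}]
table = load(transform(extract(raw)), [])
print(table)  # only the valid, normalized record survives the quality gate
```

The same extract/transform/load boundaries map directly onto distributed frameworks; only the execution substrate changes.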

Requirements:

  • Strong proficiency in Python, SQL, and Shell scripting
  • Experience with Scala or Java
  • Hands-on experience with Apache Spark, Kafka, Flink, or similar distributed processing frameworks
  • Strong knowledge of relational (PostgreSQL, MySQL, Oracle) and NoSQL databases (DynamoDB, MongoDB)
  • Expertise in Apache Iceberg for managing large-scale data lakes, schema evolution, and ACID transactions
  • Experience working with Iceberg Catalogs, Hive MetaStore (HMS), and integrating Iceberg-backed tables with query engines
  • Familiarity with Starburst and Stargate for federated querying and cross-platform data access
  • Experience working with AWS data services (S3, Redshift, Glue, Athena, EMR, RDS)
  • Understanding of hybrid data storage and integration between on-prem and cloud environments
  • Experience with Terraform, AWS CloudFormation, or Kubernetes for provisioning infrastructure
  • CI/CD pipeline experience using GitHub Actions, Jenkins, or GitLab CI/CD
  • Familiarity with data cataloging, lineage tracking, and metadata management
  • Understanding of RBAC, IAM roles, encryption, and compliance frameworks (GDPR, SOC2, etc.)
  • Problem-Solving & Analytical Thinking
  • Collaboration & Communication
  • Ownership & Proactiveness
  • Continuous Learning
  • 4-6 years of experience in data engineering, cloud infrastructure, or distributed data processing
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Technology, or a related field
  • Hands-on experience with data pipelines, cloud services, and large-scale data platforms
  • Strong foundation in SQL, Python, Apache Iceberg, Starburst, cloud-based data solutions (AWS preferred), and Apache Airflow orchestration
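
As a rough illustration of the SQL-plus-Python foundation the requirements call for, the sketch below runs an analytic aggregation through Python's built-in sqlite3 module. The schema and data are invented; in this role the equivalent queries would run against Iceberg tables via engines such as Starburst.

```python
import sqlite3

# Hypothetical "controls events" table, queried with a GROUP BY aggregate
# as a stand-in for federated analytics over domain data products.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (domain TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("payments", "pass"), ("payments", "fail"), ("trading", "pass")],
)
rows = conn.execute(
    "SELECT domain, COUNT(*) AS n FROM events GROUP BY domain ORDER BY domain"
).fetchall()
print(rows)  # [('payments', 2), ('trading', 1)]
conn.close()
```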

Additional Information:

Job Posted:
June 13, 2025

Employment Type:
Full-time
Work Type:
On-site work