Data Engineer - Assistant Vice President Job at Citi (Pune)

Data Engineer (Big Data, Python, Databricks) - Assistant Vice President

The Applications Development Senior Programmer Analyst is an intermediate level ...

Location

India, Chennai, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

5-8 years of relevant hands-on experience in Big Data technologies such as Cloudera, Python, HQL, Java/PySpark
Knowledge of Machine Learning and AI would be an added advantage
Experience in systems analysis, data analysis and programming of software applications
Experience in managing and implementing successful projects
Working knowledge of consulting/project management techniques/methods
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
Bachelor’s degree/University degree or equivalent experience

Job Responsibility

Conduct tasks related to feasibility studies, time and cost estimates, IT planning, risk technology, applications development, model development, and establish and implement new or revised applications systems and programs to meet specific business needs or user areas
Monitor and control all phases of development process and analysis, design, construction, testing, and implementation as well as provide user and operational support on applications to business users
Utilize in-depth specialty knowledge of applications development to analyze complex problems/issues, provide evaluation of business process, system process, and industry standards, and make evaluative judgement
Recommend and develop security measures in post implementation analysis of business usage to ensure successful system design and functionality
Consult with users/clients and other technology groups on issues, recommend advanced programming solutions, and install and assist customer exposure systems
Ensure essential procedures are followed and help define operating standards and processes
Serve as advisor or coach to new or lower level analysts
Operate with a limited level of direct supervision
Exercise independence of judgement and autonomy
Act as SME to senior stakeholders and/or other team members

Fulltime

Data Engineer (Big Data, Cloud - AWS, Databricks) - Assistant Vice President

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Scala and Spark/PySpark are a must, plus Hadoop (Big Data), AWS, and Databricks
8 to 11 years’ experience implementing data-intensive solutions using agile methodologies
Experience of relational databases and using SQL for data querying, transformation and manipulation
Experience of modelling data for analytical consumers
Ability to automate and streamline the build, test and deployment of data pipelines
Experience in cloud native technologies and patterns
A passion for learning new technologies, and a desire for personal growth, through self-study, formal classes, or on-the-job training
Excellent communication and problem-solving skills
An inclination to mentor
An ability to lead and deliver medium-sized components independently

Job Responsibility

Develop and support scalable, extensible, and highly available data solutions
Deliver on critical business priorities while ensuring alignment with the wider architectural vision
Identify and help address potential risks in the data supply chain
Follow and contribute to technical standards
Design and develop analytical data models

Fulltime

Oracle PLSQL Database Data Engineer – Assistant Vice President

Oracle PLSQL Database Data Engineer – Assistant Vice President is an intermediat...

Location

India, Pune, Maharashtra, India, Chennai, Tamil Nadu, India

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

8+ years of relevant experience
Very good understanding of Oracle architecture
Strong exposure to Python, Spark, and agentic workflows (Devin, Copilot)
Extensive work with PL/SQL Packages, Procedures, Functions, Triggers, Views, MViews, External tables, partitions and Exception handling for retrieving, manipulating, checking and migrating complex data sets in Oracle
Must have experience in advanced PL/SQL, including hints, cursors for processing huge volumes of data, and bulk collect for performance improvement
Must have experience with Data Modeling and Warehousing concepts such as Star Schema, OLAP, OLTP, Snowflake schema, Fact Tables for Measurements, Dimension Tables for Descriptive Context, Periodic Snapshot Fact Tables, Junk Dimensions etc.
Hands-on experience with SCD Type 1, Type 2, Type 3, Type 4, Type 5, and Type 6
Must have a very good understanding of Snapshot tables, staging tables, History tables, Audit tables, Granularity, Single Granularity for Facts, Dimension Granularity and Hierarchies, Date Dimension, Degenerate Dimensions, Surrogate Keys, Generalization and Normalization of data, Ragged Hierarchies etc.
Highly proficient in writing complex yet efficient SQL
Knowledge of any data modeling tool

Job Responsibility

Participation in the establishment and implementation of new or revised application systems and programs in coordination with the Technology team
Contribute to applications systems analysis and programming activities

Fulltime

Python Full Stack Data Engineer - Assistant Vice President

We are assembling an A-team of highly skilled, autonomous, and AI-first engineer...

Location

Canada, Mississauga

Salary:

94300.00 - 141500.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

Experience: 4+ years of progressive, hands-on experience as a Data Engineer, with a proven track record of delivering complex, large-scale data solutions
Expert-level proficiency in Python, with deep expertise in developing highly optimized, scalable, and production-grade PySpark applications for mission-critical data processing
Deep understanding and extensive hands-on experience with the entire Apache Spark ecosystem (Spark Core, Spark SQL, Spark Streaming)
Advanced proficiency with Hive for enterprise data warehousing, including optimization techniques for large and complex queries
Expert knowledge of distributed computing fundamentals, HDFS, and other components of the Hadoop ecosystem
Proficiency in SQL, complex query optimization, and advanced data warehousing concepts (e.g., dimensional modeling, data vault, data lakes)
Extensive experience with various data storage formats (e.g., Parquet, ORC, Avro) and leading data lake solutions (e.g., Delta Lake, Iceberg)
Proven experience with enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB, HBase) and understanding of their architectural trade-offs
Expert-level experience with Apache Kafka, including design and implementation of high-throughput, low-latency real-time data pipelines and event-driven architectures
Extensive experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift/Kinesis, Azure Databricks/Data Factory/Synapse/Event Hubs, GCP Dataflow/Dataproc/BigQuery/Pub/Sub), including cloud-native architectural patterns

Job Responsibility

Operate end-to-end in the design, development, and implementation of full-stack data solutions, ensuring optimal performance, scalability, data quality, security, and compliance across the data lifecycle
Collaborate closely within small, co-located squads (4-7 person teams), fostering an environment of high communication and minimal coordination overhead, to deliver impactful data products
Develop, maintain, and optimize highly efficient and resilient data ingestion, processing, and transformation pipelines using advanced Python and PySpark techniques for large-scale datasets
Implement sophisticated data storage solutions leveraging a diverse set of big data technologies including Hive, distributed file systems (e.g., HDFS, S3), and enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB)
Design and implement scalable data models and schemas that support advanced analytics, machine learning, and critical reporting needs, ensuring data integrity, accessibility, and discoverability
Engage effectively with data consumers, data scientists, and business stakeholders to deeply understand their requirements, translating them into robust data solutions and providing expert guidance on data utilization and interpretation
Implement real-time data streaming and complex event-driven architectures using technologies like Apache Kafka, ensuring low-latency data availability for critical business functions
Adhere to and contribute to best practices in data engineering and software development, participating in rigorous code reviews, implementing comprehensive automated testing strategies, and supporting robust CI/CD pipelines within a DevOps culture
Exhibit High Autonomy and Agency, taking ownership of technical challenges, making well-reasoned architectural decisions, and proactively identifying and implementing continuous improvements across the data landscape
Innovate with AI-Powered Development, actively leveraging, integrating, and contributing to AI coding tools (e.g., internal Citi AI tools, Copilot, Claude Code, Codex, Antigravity) to significantly enhance productivity, code quality, and development velocity, and inspiring others to do the same

Fulltime

Senior Big Data Engineer - Assistant Vice President

The Senior Data Engineer (C12 – AVP) is a senior-level position responsible for ...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

9–12 years of relevant experience in data analysis and data engineering, preferably within the Financial Services or Banking industry
Proven interpersonal, diplomatic, management, and prioritization skills
Consistently demonstrates clear and concise written and verbal communication
Proven ability to manage multiple activities, build strong working relationships, and work effectively under pressure
Demonstrated strong problem-solving, analytical, and decision-making skills with a methodical attention to detail
Proven self-motivation to take initiative and master new tasks and technologies quickly
Education: Bachelor's degree / University degree in a technical or business discipline (Computer Science, Information Systems, Engineering, Finance, or equivalent experience)
Functional Skillset: Data Analysis: Extensive experience in analyzing and interpreting complex data from disparate sources to provide actionable insights
Financial/Banking Domain Expertise: Strong understanding of financial products, banking processes, and industry standards
Data Requirements Definition: Proven ability to analyze different data sources and datasets to create comprehensive data mapping documents and define data ingestion requirements

Job Responsibility

Consult with users and clients to solve complex data-related issues through in-depth evaluation of business processes, data sources, and industry standards
Analyze large and diverse datasets from various sources to identify trends, patterns, and anomalies, providing critical input for business and technology initiatives
Develop and document data mapping specifications, transformation logic, and ingestion requirements for new data pipelines and systems
Consult with business clients to determine functional specifications for data-centric systems and provide ongoing operational support
Design and implement scalable data pipelines and batch/streaming workflows using Apache Spark, Spark Streaming, Hive, and Hadoop within enterprise big data ecosystems
Develop and maintain backend services and automation scripts using Java, Spring Boot, JPA, and Shell Scripting to support data processing and operational workflows
Build and manage event-driven data architectures leveraging Apache Kafka for real-time data ingestion and streaming use cases
Automate job scheduling and dependency management using Autosys
Manage and optimize Oracle database objects and queries to support analytical workloads
Develop supporting interfaces and data visualization components using JavaScript to enhance data accessibility and reporting capabilities

Fulltime

Data Platform Engineer - Assistant Vice President

We are seeking a talented and passionate engineer to join our growing team. As a...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field
Minimum 5 years of experience developing and deploying production-ready Java applications in a data engineering context
Strong experience with core Java (version 11 or higher), SQL, and database APIs
Proven experience working with distributed stream processing frameworks like Apache Flink, Spark Streaming, or Kafka Streams
Experience with event-driven architectures and real-time data processing
Solid understanding of OOP concepts, multithreading, and thread pools
Familiarity with containerization technologies like Docker and deployment platforms like Openshift, ECS, or Kubernetes is a plus
Experience producing high quality code using agentic coding assistants
Excellent communication and collaboration skills

Job Responsibility

Design, develop, and maintain a robust and scalable data platform using Java and related technologies (e.g., Apache Flink, Kafka, Trino)
Advise data engineers on how to build and optimize real-time and batch data processing applications to support low-latency requirements
Extend the platform with data integration solutions between various data sources and targets, including databases, APIs, and streaming platforms
Contribute to the design and development of event-driven architectures
Write clean, well-documented, and testable code
Collaborate effectively with other engineers, product managers, and stakeholders throughout the software development lifecycle (SDLC), adhering to Agile methodologies
Stay up-to-date with the latest trends and technologies in the data engineering space

Fulltime

Senior Data Engineer (ETL/Big Data/Python) - Assistant Vice President

CITI Bank Enterprise Analytical Services organization is seeking a highly skille...

Location

India, Chennai

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Information Technology, or a related field
8+ years of experience in data engineering, software development, or a similar role, with at least 2 years in a lead capacity
Proven experience with ETL pipeline design and development
Expert proficiency in Python and Spark
Strong experience with advanced scripting
Deep knowledge of relational databases (Oracle), NoSQL databases (MongoDB), and Snowflake
Solid understanding of data modeling, data warehousing, and performance tuning
Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, GitHub Actions)
Experience with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK Stack, Splunk, Datadog)
Excellent problem-solving and analytical skills, with a keen eye for detail

Job Responsibility

Design, develop, and implement an ETL-based reporting framework, focusing on data collection, transformation, and presentation
Utilize a robust technology stack including Databricks, Spark, Hive, Ozone, and Hadoop to process large volumes of data efficiently
Write and optimize complex ETL jobs using PySpark and advanced Python scripting
Design and manage data storage in various databases, including Oracle, MongoDB and Snowflake for flexible data models
Develop and maintain automated job schedules using Autosys and Apache Airflow for seamless data pipeline execution
Develop visualization and reporting dashboards
Leverage enterprise-approved productivity tools such as Copilot in daily analysis and development tasks
Act as the technical subject matter expert (SME) for the reporting framework, providing guidance and mentorship to the development team
Lead requirements gathering discussions with business stakeholders and product owners to understand reporting needs and translate them into technical solutions
Coordinate closely with the QA team to ensure thorough testing and data validation, maintaining high standards of data accuracy and integrity

Fulltime

GenAI Engineer, Data Science - Assistant Vice President

Data Science, Assistant Vice President – Analytics & Information Management (AIM...

Location

India, Gurugram, Haryana

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

8+ years of experience in data analytics roles
Proficiency in analytics tools/technologies like SQL, SAS, Python, PySpark
Sound knowledge of machine learning/deep learning and statistical modeling techniques
Experience working with Machine Learning software frameworks and relevant Python libraries (e.g., scikit-learn, xgboost, Keras, NLTK, BERT, TensorFlow)
Hands-on experience in PySpark/Python/R programming along with strong experience in SQL
Experience working with large and multiple datasets, data warehouses
Strong background in Statistical Analysis
Experience working on Transformers/LLMs (OpenAI, Claude, Gemini), prompt engineering, RAG-based architectures and relevant tools/frameworks (TensorFlow, PyTorch, Hugging Face Transformers, LangChain/Graph, LlamaIndex)
Understanding of transformers/language models
Familiarity with vector databases and fine-tuning techniques

Job Responsibility

Drive the development and implementation of analytical solutions to support key business objectives for Banking Operations & Analytics
Work with large, complex and unstructured data using a variety of tools (Python, PySpark, SQL, R) to build modeling solutions
Focus primarily on model building, model validation, model implementation, and model governance responsibilities for multiple portfolios
Document data requirements, data collection/processing/cleaning, and exploratory data analysis
Work with other team members and business partners to jointly build model-driven solutions using traditional methods as well as Machine Learning-driven modeling solutions
Work with model governance & fair lending teams to ensure compliance of models in accordance with Citi standards

Fulltime
