
Data Engineering Python and Pyspark - Assistant Vice President

India, Chennai · Job Posted June 10, 2026

Job Responsibility

  • Build and maintain batch or real-time data pipelines on the data platform
  • Maintain and optimize the data infrastructure required for accurate extraction, transformation, and loading of data from a wide variety of data sources
  • Develop ETL (extract, transform, load) processes to help extract and manipulate data from multiple sources
  • Monitor and control all phases of the development process (analysis, design, construction, testing, and implementation) and provide user and operational support on applications to business users
  • Automate data workflows such as data ingestion, aggregation, and ETL processing
  • Analyze large datasets to identify trends, anomalies, and behavioral patterns
  • Apply machine learning and AI concepts (supervised / unsupervised learning) to support predictive and exploratory analysis
  • Perform feature engineering and data transformations to enable ML models
  • Support trend analysis, segmentation, clustering, and forecasting use cases
  • Interpret analytical results and translate them into business-friendly insights
  • Transform raw data in data warehouses into consumable datasets for both technical and non-technical stakeholders
  • Build, maintain, and deploy data products for analytics and data science teams on the data platform
  • Ensure data accuracy, integrity, privacy, security, and compliance through quality control procedures
  • Monitor data system performance and implement optimization solutions
  • Operate with a limited level of direct supervision
  • Exercise independence of judgement and autonomy
  • Act as SME to senior stakeholders and/or other team members
  • Serve as advisor or coach to new or lower level analysts
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency

Requirements

  • 8 to 14 years of relevant experience in a data engineering role
  • Advanced SQL/RDBMS skills and experience with relational databases and database design
  • Strong proficiency in object-oriented languages; Python and PySpark are a must
  • Experience working with big data technologies such as Hive, Impala, S3, and HDFS
  • Experience working with data ingestion tools such as Talend or Ab Initio
  • Strong experience in data analysis, data exploration, and trend identification
  • Solid understanding of machine learning fundamentals such as regression, classification, clustering, and feature engineering
  • Strong proficiency in scripting languages such as Bash and UNIX shell
  • Strong proficiency in data pipeline and workflow management tools
  • Strong project management and organizational skills
  • Excellent problem-solving, communication, and organizational skills
  • Proven ability to work independently and with a team
  • Experience in managing and implementing successful projects
  • Ability to adjust priorities quickly as circumstances dictate
  • Consistently demonstrates clear and concise written and verbal communication
  • Bachelor's degree/University degree or equivalent experience

Nice to have

  • Experience working with data lakehouse architecture and tooling such as AWS Cloud, Airflow, Starburst, and Iceberg

Similar Jobs for Data Engineering Python and Pyspark - Assistant Vice President

8 matching positions

Data Engineer (Big Data, Python, Databricks) - Assistant Vice President

The Applications Development Senior Programmer Analyst is an intermediate level ...
Location: India, Chennai, Pune
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 5-8 years of relevant hands-on experience in big data technologies such as Cloudera, Python, HQL, Java/PySpark
  • Knowledge of machine learning and AI would be an added advantage
  • Experience in systems analysis, data analysis and programming of software applications
  • Experience in managing and implementing successful projects
  • Working knowledge of consulting/project management techniques/methods
  • Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
  • Bachelor’s degree/University degree or equivalent experience
Job Responsibility
  • Conduct tasks related to feasibility studies, time and cost estimates, IT planning, risk technology, applications development, model development, and establish and implement new or revised applications systems and programs to meet specific business needs or user areas
  • Monitor and control all phases of the development process (analysis, design, construction, testing, and implementation) and provide user and operational support on applications to business users
  • Utilize in-depth specialty knowledge of applications development to analyze complex problems/issues, provide evaluation of business process, system process, and industry standards, and make evaluative judgement
  • Recommend and develop security measures in post implementation analysis of business usage to ensure successful system design and functionality
  • Consult with users/clients and other technology groups on issues, recommend advanced programming solutions, and install and assist customer exposure systems
  • Ensure essential procedures are followed and help define operating standards and processes
  • Serve as advisor or coach to new or lower level analysts
  • Operate with a limited level of direct supervision
  • Exercise independence of judgement and autonomy
  • Act as SME to senior stakeholders and/or other team members
Employment Type: Full-time

Python Full Stack Data Engineer - Assistant Vice President

We are assembling an A-team of highly skilled, autonomous, and AI-first engineer...
Location: Canada, Mississauga
Salary: 94300.00 - 141500.00 USD / Year
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • Experience: 4+ years of progressive, hands-on experience as a Data Engineer, with a proven track record of delivering complex, large-scale data solutions
  • Expert-level proficiency in Python, with deep expertise in developing highly optimized, scalable, and production-grade PySpark applications for mission-critical data processing
  • Deep understanding and extensive hands-on experience with the entire Apache Spark ecosystem (Spark Core, Spark SQL, Spark Streaming)
  • Advanced proficiency with Hive for enterprise data warehousing, including optimization techniques for large and complex queries
  • Expert knowledge of distributed computing fundamentals, HDFS, and other components of the Hadoop ecosystem
  • Proficiency in SQL, complex query optimization, and advanced data warehousing concepts (e.g., dimensional modeling, data vault, data lakes)
  • Extensive experience with various data storage formats (e.g., Parquet, ORC, Avro) and leading data lake solutions (e.g., Delta Lake, Iceberg)
  • Proven experience with enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB, HBase) and understanding of their architectural trade-offs
  • Expert-level experience with Apache Kafka, including design and implementation of high-throughput, low-latency real-time data pipelines and event-driven architectures
  • Extensive experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift/Kinesis, Azure Databricks/Data Factory/Synapse/Event Hubs, GCP Dataflow/Dataproc/BigQuery/Pub/Sub), including cloud-native architectural patterns
Job Responsibility
  • Operate end-to-end in the design, development, and implementation of full-stack data solutions, ensuring optimal performance, scalability, data quality, security, and compliance across the data lifecycle
  • Collaborate closely within small, co-located squads (4-7 person teams), fostering an environment of high communication and minimal coordination overhead, to deliver impactful data products
  • Develop, maintain, and optimize highly efficient and resilient data ingestion, processing, and transformation pipelines using advanced Python and PySpark techniques for large-scale datasets
  • Implement sophisticated data storage solutions leveraging a diverse set of big data technologies including Hive, distributed file systems (e.g., HDFS, S3), and enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB)
  • Design and implement scalable data models and schemas that support advanced analytics, machine learning, and critical reporting needs, ensuring data integrity, accessibility, and discoverability
  • Engage effectively with data consumers, data scientists, and business stakeholders to deeply understand their requirements, translating them into robust data solutions and providing expert guidance on data utilization and interpretation
  • Implement real-time data streaming and complex event-driven architectures using technologies like Apache Kafka, ensuring low-latency data availability for critical business functions
  • Adhere to and contribute to best practices in data engineering and software development, participating in rigorous code reviews, implementing comprehensive automated testing strategies, and supporting robust CI/CD pipelines within a DevOps culture
  • Exhibit High Autonomy and Agency, taking ownership of technical challenges, making well-reasoned architectural decisions, and proactively identifying and implementing continuous improvements across the data landscape
  • Innovate with AI-Powered Development, actively leveraging, integrating, and contributing to AI coding tools (e.g., internal Citi AI tools, Copilot, Claude Code, Codex, Antigravity) to significantly enhance productivity, code quality, and development velocity, and inspiring others to do the same
Employment Type: Full-time

Python Engineering AI Lead-Assistant Vice President

We are seeking a highly motivated and experienced Principal Engineer to join our...
Location: India, Chennai
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 8+ years of overall experience in large-scale application development, with recent, mandatory experience on a platform for the secure and scalable deployment of AI agents into application contexts
  • 5+ years of proven experience in a Python and PySpark engineering lead role focused on building enterprise-grade, high-volume ELT/ETL processes using the PySpark and Databricks ecosystem
  • Hands-on experience with agentic AI development using YAML, JSON, FastAPI or Spring Boot, Google ADK, and LLM integrations, including Devin.AI or GitHub Copilot, and with integrating models via platforms like MCP using advanced prompt engineering
  • Proven experience developing and automating microservice integrations to support data-intensive applications
  • Proficiency in at least one programming language commonly used for data analytics and engineering, such as Python or Scala
  • Strong SQL skills and experience with various relational databases
  • Deep understanding of data modeling, data warehousing concepts, Data Mesh architecture, and data federation
  • Excellent communication, collaboration, and problem-solving skills
Job Responsibility
  • Design, develop, and maintain scalable, enterprise-grade AI agents, supporting ELT/ETL processes to handle large data volumes using the Python, FastAPI, microservices, PySpark, Kafka, and Databricks ecosystem
  • Build and deploy GenAI agents using Google's ADK and Google Flash 2.5+ LLMs to support application automation, deep insights, and workflow support with a Human-in-the-Loop (HIL) architecture
  • Build and maintain data federation layers for lambda and Data Mesh architectures using tools like Starburst, with a strategy for adopting AI-based use cases (e.g., machine learning, deep learning, NLP) to drive efficiency
  • Develop, deploy, and automate microservice integrations to support data-intensive applications, ensuring scalability, resilience, and maintainability using cloud-native infrastructure and OpenShift or Kubernetes architecture, including CI/CD pipelines
  • Integrate and leverage agentic AI tools (e.g., Devin.AI, GitHub Copilot) and platforms (e.g., MCP) through advanced prompt engineering to enhance development and operational efficiency
  • Ensure data quality, integrity, and security throughout the entire data lifecycle
  • Contribute to the continuous improvement of data engineering processes, standards, and best practices within the team
  • Appropriately assess risk when business decisions are made, demonstrating consideration for the firm's reputation and safeguarding Citi, its clients, and assets by driving compliance with applicable laws, rules, and regulations
  • Adhere to Policy, apply sound ethical judgment, and escalate, manage, and report control issues with transparency
Employment Type: Full-time

Business Data Analyst - Assistant Vice President

This role is within enterprise data office and product solution team; focused on...
Location: India, Pune
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 9+ years of combined experience in banking and financial services industry, information technology and/or data controls and governance
  • Preferably Engineering Graduate with Post Graduation in Finance
  • Extensive experience in Capital Markets business and processes
  • Deep understanding of derivative products (e.g., Equities, FX, IRS, Commodities) or SFT (Repo, Reverse Repo, Securities Lending and Borrowing)
  • Strong data analysis skills using Excel, SQL, Python, PySpark, etc.
  • AI-Accelerated Data Analysis: Leverage AI-powered tools (e.g., AutoML, Python libraries) to clean, analyze, and visualize data from structured and unstructured sources
  • Well versed in prompt engineering and automation using GenAI and LLMs
  • Experience with data management processes and tools and applications, including process mapping and lineage toolsets
  • Actively managed various aspects of data initiatives including analysis, planning, execution, and day-to-day production management
  • Ability to identify and solve problems throughout the product development process
Job Responsibility
  • Understand derivatives and SFT data flows within Citi
  • Data analysis for derivatives products and SFT across systems for target state adoption and resolution of data gaps/issues
  • Lead assessment of end-to-end data flows for all data elements used in Regulatory Reports
  • Document current-state and target-state data mappings and produce a gap assessment
  • Coordinate with the business to identify critical data elements, define standards and quality expectations, and prioritize remediation of data issues
  • Identify appropriate strategic source for critical data elements
  • Design and Implement data governance controls including data quality rules and data reconciliation
  • Design systematic solution for elimination of manual processes/adjustments and remediation of tactical solutions
  • Prepare detailed requirement specifications containing calculations, data transformations and aggregation logic
  • Perform functional testing and data validations
Employment Type: Full-time

Business Data Analyst - Assistant Vice President

This role is within enterprise data office and product solution team; focused on...
Location: India, Pune
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 9+ years of combined experience in banking and financial services industry, information technology and/or data controls and governance
  • Preferably Engineering Graduate with Post Graduation in Finance
  • Extensive experience in Capital Markets business and processes
  • Deep understanding of derivative products (e.g., Equities, FX, IRS, Commodities) or SFT (Repo, Reverse Repo, Securities Lending and Borrowing)
  • Strong data analysis skills using Excel, SQL, Python, PySpark, etc.
  • AI-Accelerated Data Analysis: Leverage AI-powered tools (e.g., AutoML, Python libraries) to clean, analyze, and visualize data from structured and unstructured sources
  • Well versed in prompt engineering and automation using GenAI and LLMs
  • Experience with data management processes and tools and applications, including process mapping and lineage toolsets
  • Actively managed various aspects of data initiatives including analysis, planning, execution, and day-to-day production management
  • Ability to identify and solve problems throughout the product development process
Job Responsibility
  • Understand derivatives and SFT data flows within Citi
  • Data analysis for derivatives products and SFT across systems for target state adoption and resolution of data gaps/issues
  • Lead assessment of end-to-end data flows for all data elements used in Regulatory Reports
  • Document current-state and target-state data mappings and produce a gap assessment
  • Coordinate with the business to identify critical data elements, define standards and quality expectations, and prioritize remediation of data issues
  • Identify appropriate strategic source for critical data elements
  • Design and Implement data governance controls including data quality rules and data reconciliation
  • Design systematic solution for elimination of manual processes/adjustments and remediation of tactical solutions
  • Prepare detailed requirement specifications containing calculations, data transformations and aggregation logic
  • Perform functional testing and data validations
Employment Type: Full-time

GenAI Engineer Data Science - Assistant Vice President

Data Science, Assistant Vice President – Analytics & Information Management (AIM...
Location: India, Gurugram, Haryana
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 8+ years of experience in data analytics roles
  • Proficiency in analytics tools/technologies like SQL, SAS, Python, PySpark
  • Sound knowledge of machine learning/deep learning and statistical modeling techniques
  • Experience working with Machine Learning software frameworks and relevant Python libraries (e.g., scikit-learn, xgboost, Keras, NLTK, BERT, TensorFlow)
  • Hands-on experience in PySpark/Python/R programming along with strong experience in SQL
  • Experience working with large and multiple datasets, data warehouses
  • Strong background in Statistical Analysis
  • Experience working with Transformers/LLMs (OpenAI, Claude, Gemini), prompt engineering, RAG-based architectures, and relevant tools/frameworks (TensorFlow, PyTorch, Hugging Face Transformers, LangChain/Graph, LlamaIndex)
  • Understanding of transformers/language models
  • Familiarity with vector databases and fine-tuning techniques
Job Responsibility
  • Drive the development and implementation of analytical solutions to support key business objectives for Banking Operations & Analytics
  • Work with large, complex and unstructured data using a variety of tools (Python, PySpark, SQL, R) to build modeling solutions
  • Focus primarily on model building, model validation, model implementation, and model governance responsibilities for multiple portfolios
  • Responsible for documenting data requirements, data collection/processing/cleaning, and exploratory data analysis
  • Work with other members in the team and business partners to jointly build model driven solutions using traditional methods as well as Machine Learning driven modeling solutions
  • Work with model governance & fair lending teams to ensure compliance of models in accordance with Citi standards
Employment Type: Full-time

Senior Java-Spark-Bigdata Engineer - Assistant Vice President

The Applications Development Senior Programmer Analyst is a senior-level positio...
Location: India, Pune
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 7-10 years of relevant experience in Data Engineering or a similar role, preferably within the Financial Services industry
  • Senior-level experience in an Applications Development or Data Engineering role
  • Consistently demonstrates clear and concise written and verbal communication
  • Demonstrated problem-solving and decision-making skills
  • Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
  • Bachelor's degree/University degree or equivalent experience
  • Hands-on expertise in Java (8+), Spring Boot, Python, and PySpark for building high-performance data applications
  • Extensive experience with the BigData ecosystem, including Apache Spark for large-scale data processing
  • Solid understanding of Data Warehouse concepts, design principles, and best practices
  • Strong proficiency with both relational SQL databases and NoSQL databases (e.g., MongoDB, Couchbase)
Job Responsibility
  • Utilize expert knowledge of data engineering principles, big data technologies, and software development best practices to design and implement robust data solutions
  • Collaborate with business stakeholders, data scientists, and other technology teams to understand data requirements and deliver effective solutions
  • Apply deep expertise in programming languages like Python and Java for building high-performance data processing applications
  • Ensure data solutions are secure, scalable, and adhere to the firm's security and architectural standards
  • Mentor and guide junior engineers, fostering a culture of technical excellence and continuous learning
  • Lead the analysis of complex data-related issues, identify root causes, and implement sustainable solutions
  • Operate with a high degree of autonomy and independence, exercising sound judgment and decision-making
  • Act as a Subject Matter Expert (SME) in big data technologies for senior stakeholders and other team members
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
Employment Type: Full-time

Principal Data GenAI Platform Engineer - Senior Vice President

Location: India, Chennai
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 12+ years of relevant experience in enterprise application development, data engineering, or AI platform engineering, with a strong track record of leadership in regulated environments
  • 8+ years of experience leading multi-team Agile organizations (20+ engineers), including managing distributed and hybrid AI-assisted teams
  • Advanced expertise in Python, PySpark, and Databricks ecosystem for large-scale data processing and ELT/ETL pipelines
  • Proven experience architecting and implementing enterprise AI/GenAI platforms, including agentic AI frameworks, LLM integrations, and prompt engineering
  • Hands-on experience with AI-assisted development tools such as Devin.AI and GitHub Copilot and integrating them into engineering workflows
  • Strong experience with microservices architecture, APIs, and cloud-native deployment (Kubernetes/OpenShift)
  • Strong experience with event-driven architectures and streaming platforms (Kafka)
  • Deep understanding of data architecture, data mesh, data federation, and regulatory data requirements
  • Exceptional leadership, communication, stakeholder management, and decision-making capabilities
  • Experience with cloud platforms (AWS, Azure, GCP, Databricks) and modern data ecosystems
Job Responsibility
  • Lead multiple agile scrum teams comprising ~15+ engineers, including hybrid teams of human engineers and AI-assisted development (Devin.AI, Copilot), ensuring delivery excellence and alignment with business priorities
  • Define and execute the enterprise strategy for Python engineering, AI agent platforms, and full-stack data applications, aligned with Retail and Wealth Risk objectives
  • Serve as the senior architect and technical authority for enterprise-scale AI agents, data engineering pipelines, and microservices-based applications, ensuring scalability, resilience, and security
  • Drive the adoption and operationalization of AI Product Development Lifecycle (AI PDLC), including model governance, evaluation, deployment, monitoring, and compliance with Model Risk Management (MRM)
  • Lead development of high-volume data pipelines and data federation layers using PySpark, Databricks, Kafka, and Data Mesh architecture to support regulatory reporting (CCAR, FDIC) and risk analytics
  • Architect and oversee GenAI agent ecosystems using LLMs (Google ADK, Gemini/Flash), implementing Human-in-the-Loop (HITL) frameworks to ensure explainability, auditability, and compliance
  • Drive AI-augmented software development lifecycle, integrating tools such as Devin.AI, GitHub Copilot, and MCP platforms through advanced prompt engineering and governance guardrails
  • Lead microservices and cloud-native architecture using FastAPI/Spring Boot, Kubernetes/OpenShift, and CI/CD pipelines, ensuring high availability and performance
  • Drive engineering efficiency and standardization by reusing and repurposing enterprise-level frameworks, platforms, and tools, reducing duplication and accelerating delivery across teams
  • Ensure all engineering solutions incorporate data governance and non-functional requirements, including Data Quality (DQ), data lineage, data tracing, and auditability, aligned with enterprise governance processes and regulatory expectations
Employment Type: Full-time