Data Engineering Python and PySpark - Assistant Vice President Job at Citi (Chennai)

Data Engineer (Big Data, Python, Databricks) - Assistant Vice President

The Applications Development Senior Programmer Analyst is an intermediate level ...

Location

India, Chennai, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

5-8 years of relevant hands-on experience in Big Data technologies like Cloudera, Python, HQL, Java/PySpark
Knowledge of machine learning and AI would be an added advantage
Experience in systems analysis, data analysis and programming of software applications
Experience in managing and implementing successful projects
Working knowledge of consulting/project management techniques/methods
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
Bachelor’s degree/University degree or equivalent experience

Job Responsibility

Conduct tasks related to feasibility studies, time and cost estimates, IT planning, risk technology, applications development, model development, and establish and implement new or revised applications systems and programs to meet specific business needs or user areas
Monitor and control all phases of the development process (analysis, design, construction, testing, and implementation), and provide user and operational support on applications to business users
Utilize in-depth specialty knowledge of applications development to analyze complex problems/issues, provide evaluation of business process, system process, and industry standards, and make evaluative judgement
Recommend and develop security measures in post implementation analysis of business usage to ensure successful system design and functionality
Consult with users/clients and other technology groups on issues, recommend advanced programming solutions, and install and assist customer exposure systems
Ensure essential procedures are followed and help define operating standards and processes
Serve as advisor or coach to new or lower-level analysts
Has the ability to operate with a limited level of direct supervision
Can exercise independence of judgement and autonomy
Acts as SME to senior stakeholders and/or other team members

Fulltime

Python Full Stack Data Engineer - Assistant Vice President

We are assembling an A-team of highly skilled, autonomous, and AI-first engineer...

Location

Canada, Mississauga

Salary:

94,300.00 - 141,500.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

Experience: 4+ years of progressive, hands-on experience as a Data Engineer, with a proven track record of delivering complex, large-scale data solutions
Expert-level proficiency in Python, with deep expertise in developing highly optimized, scalable, and production-grade PySpark applications for mission-critical data processing
Deep understanding and extensive hands-on experience with the entire Apache Spark ecosystem (Spark Core, Spark SQL, Spark Streaming)
Advanced proficiency with Hive for enterprise data warehousing, including optimization techniques for large and complex queries
Expert knowledge of distributed computing fundamentals, HDFS, and other components of the Hadoop ecosystem
Proficiency in SQL, complex query optimization, and advanced data warehousing concepts (e.g., dimensional modeling, data vault, data lakes)
Extensive experience with various data storage formats (e.g., Parquet, ORC, Avro) and leading data lake solutions (e.g., Delta Lake, Iceberg)
Proven experience with enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB, HBase) and understanding of their architectural trade-offs
Expert-level experience with Apache Kafka, including design and implementation of high-throughput, low-latency real-time data pipelines and event-driven architectures
Extensive experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift/Kinesis, Azure Databricks/Data Factory/Synapse/Event Hubs, GCP Dataflow/Dataproc/BigQuery/Pub/Sub), including cloud-native architectural patterns

Job Responsibility

Operate end-to-end in the design, development, and implementation of full-stack data solutions, ensuring optimal performance, scalability, data quality, security, and compliance across the data lifecycle
Collaborate closely within small, co-located squads (4-7 person teams), fostering an environment of high communication and minimal coordination overhead, to deliver impactful data products
Develop, maintain, and optimize highly efficient and resilient data ingestion, processing, and transformation pipelines using advanced Python and PySpark techniques for large-scale datasets
Implement sophisticated data storage solutions leveraging a diverse set of big data technologies including Hive, distributed file systems (e.g., HDFS, S3), and enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB)
Design and implement scalable data models and schemas that support advanced analytics, machine learning, and critical reporting needs, ensuring data integrity, accessibility, and discoverability
Engage effectively with data consumers, data scientists, and business stakeholders to deeply understand their requirements, translating them into robust data solutions and providing expert guidance on data utilization and interpretation
Implement real-time data streaming and complex event-driven architectures using technologies like Apache Kafka, ensuring low-latency data availability for critical business functions
Adhere to and contribute to best practices in data engineering and software development, participating in rigorous code reviews, implementing comprehensive automated testing strategies, and supporting robust CI/CD pipelines within a DevOps culture
Exhibit high autonomy and agency, taking ownership of technical challenges, making well-reasoned architectural decisions, and proactively identifying and implementing continuous improvements across the data landscape
Innovate with AI-Powered Development, actively leveraging, integrating, and contributing to AI coding tools (e.g., internal Citi AI tools, Copilot, Claude Code, Codex, Antigravity) to significantly enhance productivity, code quality, and development velocity, and inspiring others to do the same

Fulltime

Python Engineering AI Lead-Assistant Vice President

We are seeking a highly motivated and experienced Principal Engineer to join our...

Location

India, Chennai

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

8+ years of overall experience in large-scale application development, including mandatory recent experience with a platform for the secure and scalable deployment of AI agents into application contexts
At least 5 years of proven experience in a Python and PySpark engineering lead role focused on building enterprise-grade, high-volume ELT/ETL processes using the PySpark and Databricks ecosystem
Hands-on experience with agentic AI development using YAML, JSON, FastAPI or Spring Boot, Google ADK, and LLM integrations (including Devin.AI or GitHub Copilot), and with integrating models via platforms like MCP using advanced prompt engineering
Proven experience developing and automating microservice integrations to support data-intensive applications
Proficiency in at least one programming language commonly used for data analytics or engineering, such as Python or Scala
Strong SQL skills and experience with various relational databases
Deep understanding of data modeling, data warehousing concepts, Data Mesh architecture, and data federation
Excellent communication, collaboration, and problem-solving skills

Job Responsibility

Design, develop, and maintain scalable, enterprise-grade AI agents that support ELT/ETL processes handling large data volumes, using the Python, FastAPI, microservices, PySpark, Kafka, and Databricks ecosystem
Build and deploy GenAI agents using Google's ADK and Google Gemini Flash 2.5+ LLMs to support application automation, deep insights, and workflow support with a Human-in-the-Loop (HITL) architecture
Build and maintain data federation layers for Lambda and Data Mesh architectures using tools like Starburst, with a strategy for adopting AI-based use cases (e.g., machine learning, deep learning, NLP) to drive efficiency
Develop, deploy, and automate microservice integrations to support data-intensive applications, ensuring scalability, resilience, and maintainability using cloud-native infrastructure and OpenShift or Kubernetes architecture, including CI/CD pipelines
Integrate and leverage agentic AI tools (e.g., Devin.AI, GitHub Copilot) and platforms (e.g., MCP) through advanced prompt engineering to enhance development and operational efficiency
Ensure data quality, integrity, and security throughout the entire data lifecycle
Contribute to the continuous improvement of data engineering processes, standards, and best practices within the team
Appropriately assess risk when business decisions are made, demonstrating consideration for the firm's reputation and safeguarding Citi, its clients, and assets by driving compliance with applicable laws, rules, and regulations
Adhere to Policy, apply sound ethical judgment, and escalate, manage, and report control issues with transparency

Fulltime

Business Data Analyst - Assistant Vice President

This role is within enterprise data office and product solution team; focused on...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

9+ years of combined experience in banking and financial services industry, information technology and/or data controls and governance
Preferably an engineering graduate with a postgraduate degree in finance
Extensive experience in Capital Markets business and processes
Deep understanding of derivative products (e.g., equities, FX, IRS, commodities) or SFTs (repo, reverse repo, securities lending and borrowing)
Strong data analysis skills using Excel, SQL, Python, PySpark, etc.
AI-Accelerated Data Analysis: Leverage AI-powered tools (e.g., AutoML, Python libraries) to clean, analyze, and visualize data from structured and unstructured sources
Well versed in prompt engineering and automation using GenAI and LLMs
Experience with data management processes, tools, and applications, including process mapping and lineage toolsets
Experience actively managing various aspects of data initiatives, including analysis, planning, execution, and day-to-day production management
Ability to identify and solve problems throughout the product development process

Job Responsibility

Understand derivatives and SFT data flows within Citi
Analyze data for derivative products and SFTs across systems to support target-state adoption and resolve data gaps/issues
Lead the assessment of end-to-end data flows for all data elements used in regulatory reports
Document current- and target-state data mappings and produce a gap assessment
Coordinate with the business to identify critical data elements, define standards and quality expectations, and prioritize remediation of data issues
Identify the appropriate strategic sources for critical data elements
Design and implement data governance controls, including data quality rules and data reconciliation
Design systematic solutions to eliminate manual processes/adjustments and remediate tactical solutions
Prepare detailed requirement specifications containing calculations, data transformations, and aggregation logic
Perform functional testing and data validations

Fulltime

GenAI Engineer Data Science - Assistant Vice President

Data Science, Assistant Vice President – Analytics & Information Management (AIM...

Location

India, Gurugram, Haryana

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

8+ years of experience in data analytics roles
Proficiency in analytics tools/technologies like SQL, SAS, Python, PySpark
Sound knowledge of machine learning/deep learning and statistical modeling techniques
Experience working with Machine Learning software frameworks and relevant Python libraries (e.g., scikit-learn, xgboost, Keras, NLTK, BERT, TensorFlow)
Hands-on experience in PySpark/Python/R programming along with strong experience in SQL
Experience working with large and multiple datasets, data warehouses
Strong background in Statistical Analysis
Experience working on Transformers/LLMs (OpenAI, Claude, Gemini), prompt engineering, RAG-based architectures, and relevant tools/frameworks (TensorFlow, PyTorch, Hugging Face Transformers, LangChain/LangGraph, LlamaIndex)
Understanding of transformers/language models
Familiarity with vector databases and fine-tuning techniques

Job Responsibility

Drive the development and implementation of analytical solutions to support key business objectives for Banking Operations & Analytics
Work with large, complex and unstructured data using a variety of tools (Python, PySpark, SQL, R) to build modeling solutions
The primary focus areas are model building, model validation, model implementation, and model governance responsibilities for multiple portfolios
Responsible for documenting data requirements, data collection/processing/cleaning, and exploratory data analysis
Work with other members in the team and business partners to jointly build model driven solutions using traditional methods as well as Machine Learning driven modeling solutions
Work with model governance & fair lending teams to ensure compliance of models in accordance with Citi standards

Fulltime

Senior Java-Spark-Big Data Engineer - Assistant Vice President

The Applications Development Senior Programmer Analyst is a senior-level positio...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

7-10 years of relevant experience in Data Engineering or a similar role, preferably within the Financial Services industry
Senior-level experience in an Applications Development or Data Engineering role
Consistently demonstrates clear and concise written and verbal communication
Demonstrated problem-solving and decision-making skills
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
Bachelor's degree/University degree or equivalent experience
Hands-on expertise in Java (8+), Spring Boot, Python, and PySpark for building high-performance data applications
Extensive experience with the big data ecosystem, including Apache Spark for large-scale data processing
Solid understanding of Data Warehouse concepts, design principles, and best practices
Strong proficiency with both relational SQL databases and NoSQL databases (e.g., MongoDB, Couchbase)

Job Responsibility

Utilize expert knowledge of data engineering principles, big data technologies, and software development best practices to design and implement robust data solutions
Collaborate with business stakeholders, data scientists, and other technology teams to understand data requirements and deliver effective solutions
Apply deep expertise in programming languages like Python and Java for building high-performance data processing applications
Ensure data solutions are secure, scalable, and adhere to the firm's security and architectural standards
Mentor and guide junior engineers, fostering a culture of technical excellence and continuous learning
Lead the analysis of complex data-related issues, identify root causes, and implement sustainable solutions
Operate with a high degree of autonomy and independence, exercising sound judgment and decision-making
Act as a Subject Matter Expert (SME) in big data technologies for senior stakeholders and other team members
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency

Fulltime

Principal Data GenAI Platform Engineer - Senior Vice President

Location

India, Chennai

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

12+ years of relevant experience in enterprise application development, data engineering, or AI platform engineering, with a strong track record of leadership in regulated environments
8+ years of experience leading multi-team Agile organizations (20+ engineers), including managing distributed and hybrid AI-assisted teams
Advanced expertise in Python, PySpark, and Databricks ecosystem for large-scale data processing and ELT/ETL pipelines
Proven experience architecting and implementing enterprise AI/GenAI platforms, including agentic AI frameworks, LLM integrations, and prompt engineering
Hands-on experience with AI-assisted development tools such as Devin.AI and GitHub Copilot and integrating them into engineering workflows
Strong experience with microservices architecture, APIs, and cloud-native deployment (Kubernetes/OpenShift)
Strong experience with event-driven architectures and streaming platforms (Kafka)
Deep understanding of data architecture, data mesh, data federation, and regulatory data requirements
Exceptional leadership, communication, stakeholder management, and decision-making capabilities
Experience with cloud platforms (AWS, Azure, GCP, Databricks) and modern data ecosystems

Job Responsibility

Lead multiple agile scrum teams comprising ~15+ engineers, including hybrid teams of human engineers and AI-assisted development (Devin.AI, Copilot), ensuring delivery excellence and alignment with business priorities
Define and execute the enterprise strategy for Python engineering, AI agent platforms, and full-stack data applications, aligned with Retail and Wealth Risk objectives
Serve as the senior architect and technical authority for enterprise-scale AI agents, data engineering pipelines, and microservices-based applications, ensuring scalability, resilience, and security
Drive the adoption and operationalization of AI Product Development Lifecycle (AI PDLC), including model governance, evaluation, deployment, monitoring, and compliance with Model Risk Management (MRM)
Lead development of high-volume data pipelines and data federation layers using PySpark, Databricks, Kafka, and Data Mesh architecture to support regulatory reporting (CCAR, FDIC) and risk analytics
Architect and oversee GenAI agent ecosystems using LLMs (Google ADK, Gemini/Flash), implementing Human-in-the-Loop (HITL) frameworks to ensure explainability, auditability, and compliance
Drive AI-augmented software development lifecycle, integrating tools such as Devin.AI, GitHub Copilot, and MCP platforms through advanced prompt engineering and governance guardrails
Lead microservices and cloud-native architecture using FastAPI/Spring Boot, Kubernetes/OpenShift, and CI/CD pipelines, ensuring high availability and performance
Drive engineering efficiency and standardization by reusing and repurposing enterprise-level frameworks, platforms, and tools, reducing duplication and accelerating delivery across teams
Ensure all engineering solutions incorporate data governance and non-functional requirements, including Data Quality (DQ), data lineage, data tracing, and auditability, aligned with enterprise governance processes and regulatory expectations

Fulltime