Big Data / PySpark Engineering Lead - Vice President Job at Citi (Pune)

Pyspark Big Data Senior Developer - Vice President

We are building an A-team of highly skilled and autonomous engineers, and we are...

Location

Canada, Mississauga

Salary:

120800.00 - 170800.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

6+ years of extensive, hands-on experience as a Senior Big Data Developer, with a strong emphasis on PySpark and the Apache Spark ecosystem, operating as a player/coach
Expert proficiency in Python, with a proven track record of developing robust, scalable, and high-performance PySpark applications for large-scale data processing
Deep understanding and extensive hands-on experience with Apache Spark (Spark Core, Spark SQL, Spark Streaming) and its ecosystem
Experience with distributed computing frameworks such as Hadoop (HDFS, YARN)
Expert proficiency in SQL and extensive experience with data warehousing concepts and technologies (e.g., Hive, Snowflake, Redshift, Databricks SQL)
Proven experience with various data storage formats (e.g., Parquet, ORC, Avro) and data lake solutions (e.g., Delta Lake, Iceberg)
Experience with NoSQL databases (e.g., MongoDB, Cassandra, HBase) is a significant plus
Strong experience with Apache Kafka for building real-time data pipelines and event-driven architectures
Demonstrated experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift, Azure Databricks/Data Factory/Synapse, GCP Dataflow/Dataproc/BigQuery) is highly desirable
Proven effectiveness with AI coding tools (e.g., Claude Code, Codex, Antigravity) is a mandatory requirement

Job Responsibility

Operate end-to-end in the design, development, and implementation of robust big data solutions, ensuring optimal performance, scalability, data quality, and security
Collaborate closely within small, co-located squads (4-7 person teams), fostering high communication and low coordination overhead, to translate complex business requirements into technical specifications for big data processing and analytical solutions
Act as a player/coach within the team, mentoring junior members and leading by example in the development of efficient and innovative big data architectures
Design, develop, and optimize large-scale data pipelines using PySpark for data ingestion, transformation, and aggregation, always with an eye towards efficiency and domain relevance
Implement and manage real-time data streaming and event-driven architectures using technologies like Apache Kafka
Design and implement sophisticated data warehousing solutions and dimensional models for efficient data storage and retrieval, ensuring alignment with business needs
Work with various distributed data storage technologies, including distributed file systems (e.g., HDFS, S3) and NoSQL databases (e.g., MongoDB, Cassandra), selecting the right tool for the right problem
Implement efficient data processing and storage strategies to optimize the performance and scalability of big data applications, with a strong focus on the 'why' behind the technology choices
Champion best practices in software development, including rigorous code reviews, implementing comprehensive testing, and supporting continuous integration and continuous deployment (CI/CD) pipelines
Demonstrate high autonomy and agency in driving projects forward, making informed decisions, and proactively identifying areas for improvement

Fulltime


Python Full Stack Data Engineer - Assistant Vice President

We are assembling an A-team of highly skilled, autonomous, and AI-first engineer...

Location

Canada, Mississauga

Salary:

94300.00 - 141500.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

Experience: 4+ years of progressive, hands-on experience as a Data Engineer, with a proven track record of delivering complex, large-scale data solutions
Expert-level proficiency in Python, with deep expertise in developing highly optimized, scalable, and production-grade PySpark applications for mission-critical data processing
Deep understanding and extensive hands-on experience with the entire Apache Spark ecosystem (Spark Core, Spark SQL, Spark Streaming)
Advanced proficiency with Hive for enterprise data warehousing, including optimization techniques for large and complex queries
Expert knowledge of distributed computing fundamentals, HDFS, and other components of the Hadoop ecosystem
Proficiency in SQL, complex query optimization, and advanced data warehousing concepts (e.g., dimensional modeling, data vault, data lakes)
Extensive experience with various data storage formats (e.g., Parquet, ORC, Avro) and leading data lake solutions (e.g., Delta Lake, Iceberg)
Proven experience with enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB, HBase) and understanding of their architectural trade-offs
Expert-level experience with Apache Kafka, including design and implementation of high-throughput, low-latency real-time data pipelines and event-driven architectures
Extensive experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift/Kinesis, Azure Databricks/Data Factory/Synapse/Event Hubs, GCP Dataflow/Dataproc/BigQuery/Pub/Sub), including cloud-native architectural patterns

Job Responsibility

Operate end-to-end in the design, development, and implementation of full-stack data solutions, ensuring optimal performance, scalability, data quality, security, and compliance across the data lifecycle
Collaborate closely within small, co-located squads (4-7 person teams), fostering an environment of high communication and minimal coordination overhead, to deliver impactful data products
Develop, maintain, and optimize highly efficient and resilient data ingestion, processing, and transformation pipelines using advanced Python and PySpark techniques for large-scale datasets
Implement sophisticated data storage solutions leveraging a diverse set of big data technologies including Hive, distributed file systems (e.g., HDFS, S3), and enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB)
Design and implement scalable data models and schemas that support advanced analytics, machine learning, and critical reporting needs, ensuring data integrity, accessibility, and discoverability
Engage effectively with data consumers, data scientists, and business stakeholders to deeply understand their requirements, translating them into robust data solutions and providing expert guidance on data utilization and interpretation
Implement real-time data streaming and complex event-driven architectures using technologies like Apache Kafka, ensuring low-latency data availability for critical business functions
Adhere to and contribute to best practices in data engineering and software development, participating in rigorous code reviews, implementing comprehensive automated testing strategies, and supporting robust CI/CD pipelines within a DevOps culture
Exhibit High Autonomy and Agency, taking ownership of technical challenges, making well-reasoned architectural decisions, and proactively identifying and implementing continuous improvements across the data landscape
Innovate with AI-Powered Development, actively leveraging, integrating, and contributing to AI coding tools (e.g., internal Citi AI tools, Copilot, Claude Code, Codex, Antigravity) to significantly enhance productivity, code quality, and development velocity, and inspiring others to do the same

Fulltime

Fullstack Big Data Developer Application Development Technical Lead Analyst Vice President

Discover your future at Citi. Working at Citi is far more than just a job. A car...

Location

Canada, Mississauga

Salary:

120800.00 - 170800.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

6+ years of Application development experience
6+ years of experience in full stack development, with a focus on Big Data and Python/Scala
6+ years of experience with big data technologies such as Python, PySpark, Hadoop, Kafka, etc.
Experience with Core Java/J2EE applications, with complete command of OOP and design patterns
Strong command of data structures and algorithms
Experience in core application development on complex projects encompassing all areas of Java/J2EE
Thorough knowledge of and hands-on experience with the following technologies: Hadoop, MapReduce framework, Spark, YARN, Sqoop, Pig, Hue, Unix, Java, Impala, Cassandra on Mesos
Should have implemented, or been part of, complex project execution in the Big Data Spark ecosystem, processing volumes of data with a thorough understanding of distributed processing and integrated applications
Exposure to ETL and BI tools
Work in an agile environment following through the best practices of agile Scrum

Job Responsibility

Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements
Resolve a variety of high-impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
Provide expertise in the area and advanced knowledge of applications programming, and ensure application design adheres to the overall architecture blueprint
Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
Design, develop, and maintain scalable and robust architecture for the project using Java/Python/Scala and other full stack technologies
Manage big data technologies such as Python and PySpark to ensure seamless data integration, storage, and analysis

Fulltime

Apps Dev Tech Lead Analyst - Vice President

As a key member of our global development team, you will: Innovate & Develop: Pa...

Location

United States, Irving

Salary:

125760.00 - 188640.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

6-10 years of progressive experience in systems analysis and programming of software applications
Strong proficiency in Java application technologies, including deep experience with TDD (Test-Driven Development), Spring framework, and Microservices architecture
Extensive hands-on experience with PySpark and advanced Python programming skills
Proven experience with Big Data ecosystems, including Cloudera and/or Databricks
Hands-on experience with distributed query engines like Starburst (Trino/Presto)
Proficient in designing and managing complex workflows using scheduling tools, particularly Apache Airflow
Strong expertise in SQL and experience with relational and non-relational databases
Excellent knowledge of algorithms and data structures, design patterns
Strong Java experience: Java core, collections, concurrency, streams
Frameworks and APIs: Spring (Core, Batch, Integration, MVC, Boot, Data), Hibernate, Jackson, JAX-RS, JPA, JAXB

Job Responsibility

Innovate & Develop: Partner closely with project managers, business stakeholders, and senior managers to translate complex business requirements into well-architected technical solutions
Drive cross-functional collaboration with diverse management teams
Proactively identify, define, and implement necessary system enhancements
Complex Problem Resolution: Lead the resolution of high-impact problems and critical projects
Consult with users, clients, and other technology groups on issues
Technical Architecture & Standards Leadership: Serve as a subject matter expert in application programming
Leverage an advanced understanding of system flow to develop and enforce robust standards for coding, testing, debugging, and implementation
Mentorship & Talent Development: Act as a trusted advisor and coach for mid-level developers and analysts
Provide technical guidance, mentorship, and code reviews to junior data engineers
Operational Excellence: Ensure adherence to best practices and essential procedures

What we offer

medical, dental & vision coverage
401(k)
life, accident, and disability insurance
wellness programs
paid time off packages including planned time off (vacation), unplanned time off (sick leave), and paid holidays

Fulltime

Data Analytics Lead - Data Scientist - Vice President

The Data Analytics Lead / Data Scientist is a strategic professional who stays a...

Location

India, Chennai; Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

10-15 years of relevant experience in Data Analytics, Data Science, or Advanced Analytics roles
Advanced proficiency in SQL and relational database concepts
Strong programming experience in Python (required)
PySpark preferred
Hands-on experience building and deploying machine learning models (supervised and unsupervised)
Experience with ML libraries such as scikit-learn, XGBoost, TensorFlow, or PyTorch
Strong knowledge of statistical modeling, feature engineering, and model validation techniques
Experience with BI tools such as Tableau or Power BI
Familiarity with MLOps practices (model deployment, monitoring, versioning) is strongly preferred
Experience working with large-scale enterprise or financial datasets

Job Responsibility

Integrates subject matter and industry expertise within a defined area
Contributes to data analytics standards around which others will operate
Applies in-depth understanding of how data analytics collectively integrate within the sub-function as well as coordinate and contribute to the objectives of the entire function
Employs developed communication and diplomacy skills to guide, influence, and convince others, in particular colleagues in other areas and occasional external customers
Resolves occasionally complex and highly variable issues
Produces detailed analysis of issues where the best course of action is not evident from the information available, but actions must be recommended/taken
Responsible for volume, quality, timeliness, and delivery of data science projects, along with short-term resource planning
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
Lead the design and execution of complex data analysis and AI/ML initiatives across large, structured, and unstructured datasets
Develop and deploy predictive, classification, clustering, and forecasting models to support business strategy and risk management

Fulltime

Senior Java -Spark-Bigdata Engineer-Assistant Vice President

The Applications Development Senior Programmer Analyst is a senior-level positio...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

7-10 years of relevant experience in Data Engineering or a similar role, preferably within the Financial Services industry
Senior-level experience in an Applications Development or Data Engineering role
Consistently demonstrates clear and concise written and verbal communication
Demonstrated problem-solving and decision-making skills
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
Bachelor's degree/University degree or equivalent experience
Hands-on expertise in Java (8+), Spring Boot, Python, and PySpark for building high-performance data applications
Extensive experience with the Big Data ecosystem, including Apache Spark for large-scale data processing
Solid understanding of Data Warehouse concepts, design principles, and best practices
Strong proficiency with both relational SQL databases and NoSQL databases (e.g., MongoDB, Couchbase)

Job Responsibility

Utilize expert knowledge of data engineering principles, big data technologies, and software development best practices to design and implement robust data solutions
Collaborate with business stakeholders, data scientists, and other technology teams to understand data requirements and deliver effective solutions
Apply deep expertise in programming languages like Python and Java for building high-performance data processing applications
Ensure data solutions are secure, scalable, and adhere to the firm's security and architectural standards
Mentor and guide junior engineers, fostering a culture of technical excellence and continuous learning
Lead the analysis of complex data-related issues, identify root causes, and implement sustainable solutions
Operate with a high degree of autonomy and independence, exercising sound judgment and decision-making
Act as a Subject Matter Expert (SME) in big data technologies for senior stakeholders and other team members
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency

Fulltime


Senior PySpark Developer - Vice President

We are seeking a highly skilled and experienced Senior PySpark Developer to join...

Location

United States, Tampa

Salary:

113840.00 - 170760.00 USD / Year

Citi

Expiration Date

June 05, 2026

Requirements

10+ years of experience in Applications Development, Systems Analysis, or equivalent senior engineering roles
Extensive hands‑on experience delivering enterprise‑scale, database‑driven platforms in a regulated environment
Expert-level proficiency in Python programming, including object-oriented design, data structures, algorithms, and extensive experience with various Python libraries
Deep expertise in developing, optimizing, and deploying PySpark applications for large-scale data processing, ETL, and real-time analytics on distributed systems (e.g., Spark SQL, Spark Streaming, DataFrames)
Strong understanding of Apache Spark architecture, Hadoop ecosystem, and experience with distributed computing concepts. Familiarity with big data storage formats (e.g., Parquet, ORC)
Solid experience with both relational databases (e.g., Oracle) and NoSQL databases (e.g., MongoDB). Strong SQL writing and optimization skills
Experience in designing, developing, and consuming RESTful APIs using Python frameworks (e.g., Flask, FastAPI, Django REST Framework)
Strong understanding and practical experience with CI/CD tools (e.g., Jenkins) and containerization technologies (Docker, Kubernetes)
Expert-level proficiency with Git
Experience with unit testing (e.g., Pytest), integration testing, and performance testing frameworks for Python and PySpark applications

Job Responsibility

Design, develop, and implement robust, scalable, and high-performance data pipelines and applications using Python, PySpark, and Big Data technologies
Work autonomously to analyze requirements, propose technical solutions, and deliver high-quality code and data products, ensuring alignment with architectural standards and business objectives
Utilize expertise in various Big Data platforms (e.g., Hadoop, Hive, Kafka, Spark) to process, transform, and manage large datasets efficiently
Write complex SQL queries, stored procedures, and optimize database performance for large-scale data warehousing and analytics solutions
Develop and enhance ETL (Extract, Transform, Load) processes, ensuring data quality, integrity, and timely delivery. Experience with various ETL tools and methodologies is a plus
Proactively research, evaluate, and integrate new and emerging technologies, frameworks, and tools to improve development processes and solution capabilities
Ensure adherence to coding standards, conduct thorough code reviews, and implement best practices for software development, data governance, and security
Diagnose and resolve complex technical issues related to data pipelines, performance bottlenecks, and system integrations in a fast-paced environment
Collaborate effectively with cross-functional teams including architects, data scientists, business analysts, and QA engineers. Provide technical guidance and mentorship to junior team members
Identify opportunities to use AI tools to speed up development, code reviews, unit testing and deployment.

What we offer

medical, dental & vision coverage
401(k)
life, accident, and disability insurance
wellness programs
paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays
discretionary and formulaic incentive and retention awards

Fulltime


Senior Data Engineer - Vice President

Citi is seeking a highly skilled and experienced Senior Data Engineer to join ou...

Location

United States, Irving

Salary:

125760.00 - 188640.00 USD / Year

Citi

Expiration Date

May 18, 2026

Requirements

Expert-level proficiency with Python and its data ecosystem (e.g., Pandas, NumPy, Dask). Extensive hands-on experience with the Spark framework, including deep knowledge of the DataFrame API, Spark SQL, and performance tuning techniques for distributed data processing
Proven experience developing on the Databricks Lakehouse Platform, including proficiency with Delta Lake, structured streaming, and optimizing Spark jobs within the Databricks environment
Strong, practical experience with the Ab Initio suite of products (GDE, Co>Operating System, Conduct>It) for designing and implementing enterprise-grade ETL workflows
Hands-on experience designing, building, and maintaining data warehouses in Snowflake
Experience using federated query engines like Starburst/Trino
Familiarity or experience with open table formats like Apache Iceberg for managing large analytic datasets
In-depth knowledge and multi-year experience with at least one major cloud provider (AWS, Google Cloud Platform, or Azure)
Practical experience building and managing data pipelines using cloud-native services such as AWS Glue, Lambda, S3, and Redshift; Azure Data Factory and Synapse Analytics; or Google Cloud Composer, Dataflow, and BigQuery

Job Responsibility

Design, build, and maintain scalable ETL/ELT pipelines using PySpark, Spark SQL, and Delta Lake on Databricks
Implement and manage data solutions on cloud platforms
Work extensively with big data frameworks and platforms such as Databricks, Snowflake, and open table formats like Apache Iceberg
Optimize Spark workloads and Databricks clusters
Implement and manage Lakehouse architecture using Delta Lake
Lead the design and architecture of Starburst-based data solutions
Implement and manage data federation strategies using Starburst connectors
Identify and resolve performance bottlenecks in data pipelines and queries
Develop and optimize robust data pipelines with a strong focus on data governance
Design and implement data models that support business intelligence, analytics, and machine learning use cases

What we offer

Medical, dental & vision coverage
401(k)
Life, accident, and disability insurance
Wellness programs
Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays
Discretionary and formulaic incentive and retention awards

Fulltime


Big Data / PySpark Engineering Lead - Vice President

Citi

Location:
India, Pune

Category:
IT - Software Development

Contract Type:
Not provided

Job Posted:
March 01, 2026

