Lead Data Engineer Spark and SQL – Vice President Job at Citi (Mississauga)

Big Data / PySpark Engineering Lead - Vice President

The Applications Development Technology Lead Analyst is a senior level position ...

Location

India , Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Highly experienced and skilled technical lead with 12+years of experience with software building and platform engineering
Experience in Data Engineering, focused on Big Data ecosystems
Knowledge in Hadoop, YARN, Hive, Impala, Spark, and Spark SQL with extensive high volume of data processing pipeline development
Programming Expert level and hand on experience in Python
Familiarity with data formats like Avro, Parquet, CSV, JSON
Hands-on experience in writing SQL queries
Highly experienced with Unix based operating systems and shell scripting
Experience with source code management tools such as Bitbucket, Git etc
Big Data Tech Proficiency and hands-on in Hadoop, Spark, Hive, Kafka, and NoSQL databases (MongoDB, HBase)
Experience working with query engines like Trino, Presto, Starburst

Job Responsibility

Design and implement scalable, fault-tolerant batch and real-time data processing pipelines
Develop robust data models and schema designs optimized for both performance and storage efficiency
Evaluate and integrate emerging tools and frameworks (e.g., Spark, Flink, Kafka) into the existing stack
Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
Legacy Systems Decommissioning: Lead the strategic migration of data and logic from legacy platforms (e.g. on-premises SQL Servers) to a modern Data Lakehouse environment
ETL/ELT Transformation: Re-engineer existing stored procedures and complex legacy ETL jobs into scalable, distributed processing frameworks using Spark (Python) and Starburst/Trino
Validation & Parity Testing: Design and implement automated frameworks for Data Parity Testing to ensure 100% accuracy and consistency between legacy outputs and new big data results
Schema Evolution: Map and transform rigid, legacy relational schemas into flexible, high-performance formats optimized for the cloud (e.g., Parquet, Avro, or Iceberg)
Phased Cutover Management: Orchestrate a phased migration strategy (Parallel Run, Shadow Execution) to ensure zero downtime for downstream business applications and reporting tools

Fulltime

Vice President, Data Platform Engineering

Mastercard is seeking a Vice President, Data Platform Engineering, responsible f...

Location

United States of America , O’Fallon, MO; Arlington, VA

Salary:

200000.00 - 368000.00 USD / Year

Mastercard

Expiration Date

Until further notice

Requirements

Hands-on experience in data engineering, data platform strategy, or a related technical domain
Proven experience leading global data engineering or platform engineering teams
Proven experience in building and modernizing distributed data platforms using technologies such as Apache Spark, Kafka, Flink, NiFi, and Cloudera/Hadoop
Strong experience with one or more of data pipeline tools (Nifi, Airflow, dbt, Spark, Kafka, Dagster, etc.) and distributed data processing at scale
Proficiency in Python, SQL, and data ecosystems (Oracle, AWS Glue, Azure Data Factory, BigQuery, Snowflake, etc.)
Deep understanding of data modeling, metadata management, and data governance principles
Proven success in leading technical teams and managing complex, cross-functional projects
Excellent communication skills, with the ability to tailor technical concepts to executive, operational, and technical audiences
Expertise and ability to lead technical decision-making considering scalability, cost efficiency, stakeholder priorities, and time to market
Proven track leading high-performing teams with experience leading and coaching director level reports and experienced individual contributors

Job Responsibility

Drive modernization from legacy and on-prem systems to modern, cloud-native, and hybrid data platforms
Architect and lead the development of a Multi-Agent ETL Platform for batch and event streaming, integrating AI agents to autonomously manage ETL tasks such as data discovery, schema mapping, and error resolution
Define and implement data ingestion, transformation, and delivery pipelines using scalable frameworks (e.g., Apache Airflow, Nifi, dbt, Spark, Kafka, or Dagster)
Leverage LLMs, and agent frameworks (e.g., LangChain, CrewAI, AutoGen) to automate pipeline management and monitoring
Ensure robust data governance, cataloging, versioning, and lineage tracking across the ETL platform
Define project roadmaps, KPIs, and performance metrics for platform efficiency and data reliability
Establish and enforce best practices in data quality, CI/CD for data pipelines, and observability
Collaborate closely with cross-functional teams (Data Science, Analytics, and Application Development) to understand requirements and deliver efficient data ingestion and processing workflows
Establish and enforce best practices, automation standards, and monitoring frameworks to ensure the platform’s reliability, scalability, and security
Build relationships and communicate effectively with internal and external stakeholders, including senior executives, to influence data-driven strategies and decisions

What we offer

Insurance (including medical, prescription drug, dental, vision, disability, life insurance)
flexible spending account and health savings account
paid leaves (including 16 weeks of new parent leave and up to 20 days of bereavement leave)
80 hours of Paid Sick and Safe Time, 25 days of vacation time and 5 personal days, pro-rated based on date of hire
10 annual paid U.S. observed holidays
401k with a best-in-class company match
deferred compensation for eligible roles
fitness reimbursement or on-site fitness facilities
eligibility for tuition reimbursement

Fulltime

New

Apps Dev Tech Lead Analyst - Vice President

The Applications Development Technology Lead Analyst is responsible for establis...

Location

Canada , Mississauga

Salary:

120800.00 - 170800.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

Proficient on Spark, Data Engineering and architecture for data flows
Proven experience with current DevOps and engineering excellence tools like BitBucket, TeamCity, uDeploy, JIRA, etc.
Experience on design of cloud technology and microservice architecture
10+ years of ETL technology experience and extensive experience with SQL, Analytics, Linux/ AIX Unix Shell scripting, Windows, MAVEN or ANT, Bitbucket, GIT or Source code control equivalent
Proven architecture experience in a complex, large organization, TOGAF certification an asset
Financial background in Canadian banking accounting practices an asset
Self-starter and a "do whatever it takes" attitude
Experience with the following: Hibernate, Ajax programming, IBM WebSphere MQ, C/C++, Perl, Python, Spring, Camel, Autosys, Talend or similar ETL tools, usage of Load balancer /WIP, SSL certificate configuration and deployment process
Bachelor's degree/University degree or equivalent experience

Job Responsibility

Establishing and implementing new or revised application systems and programs
Leading applications systems analysis and programming activities
Being responsible for many fast-changing, moving parts and get them to come together as a product
Excellent communication and collaboration skills
Identifying and managing risks, making sound judgments about quality, and speed of deliverables and deployment to production
Key player in Data transformation to digitize our business
Leading development on Vanguard Big Data platform

Fulltime

New

Python Full Stack Data Engineer - Assistant Vice President

We are assembling an A-team of highly skilled, autonomous, and AI-first engineer...

Location

Canada , Mississauga

Salary:

94300.00 - 141500.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

Experience: 4+ years of progressive, hands-on experience as a Data Engineer, with a proven track record of delivering complex, large-scale data solutions
Expert-level proficiency in Python, with deep expertise in developing highly optimized, scalable, and production-grade PySpark applications for mission-critical data processing
Deep understanding and extensive hands-on experience with the entire Apache Spark ecosystem (Spark Core, Spark SQL, Spark Streaming)
Advanced proficiency with Hive for enterprise data warehousing, including optimization techniques for large and complex queries
Expert knowledge of distributed computing fundamentals, HDFS, and other components of the Hadoop ecosystem
Proficiency in SQL, complex query optimization, and advanced data warehousing concepts (e.g., dimensional modeling, data vault, data lakes)
Extensive experience with various data storage formats (e.g., Parquet, ORC, Avro) and leading data lake solutions (e.g., Delta Lake, Iceberg)
Proven experience with enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB, HBase) and understanding of their architectural trade-offs
Expert-level experience with Apache Kafka, including design and implementation of high-throughput, low-latency real-time data pipelines and event-driven architectures
Extensive experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift/Kinesis, Azure Databricks/Data Factory/Synapse/Event Hubs, GCP Dataflow/Dataproc/BigQuery/Pub/Sub), including cloud-native architectural patterns

Job Responsibility

Operate end-to-end in the design, development, and implementation of full-stack data solutions, ensuring optimal performance, scalability, data quality, security, and compliance across the data lifecycle
Collaborate closely within small, co-located squads (4-7 person teams), fostering an environment of high communication and minimal coordination overhead, to deliver impactful data products
Develop, maintain, and optimize highly efficient and resilient data ingestion, processing, and transformation pipelines using advanced Python and PySpark techniques for large-scale datasets
Implement sophisticated data storage solutions leveraging a diverse set of big data technologies including Hive, distributed file systems (e.g., HDFS, S3), and enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB)
Design and implement scalable data models and schemas that support advanced analytics, machine learning, and critical reporting needs, ensuring data integrity, accessibility, and discoverability
Engage effectively with data consumers, data scientists, and business stakeholders to deeply understand their requirements, translating them into robust data solutions and providing expert guidance on data utilization and interpretation
Implement real-time data streaming and complex event-driven architectures using technologies like Apache Kafka, ensuring low-latency data availability for critical business functions
Adhere to and contribute to best practices in data engineering and software development, participating in rigorous code reviews, implementing comprehensive automated testing strategies, and supporting robust CI/CD pipelines within a DevOps culture
Exhibit High Autonomy and Agency, taking ownership of technical challenges, making well-reasoned architectural decisions, and proactively identifying and implementing continuous improvements across the data landscape
Innovate with AI-Powered Development, actively leveraging, integrating, and contributing to AI coding tools (e.g., internal Citi AI tools, Copilot, Claude Code, Codex, Antigravity) to significantly enhance productivity, code quality, and development velocity, and inspiring others to do the same

Fulltime

Vice President, Big Data Scala Engineer

We are seeking an experienced and highly skilled Vice President, Big Data Scala ...

Location

India , Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Computer Science, Engineering, or a related field
12+ years of progressive experience in software development, with at least 5+ years focusing on big data technologies
3+ years of experience in a leadership or senior architectural role
Extensive hands-on experience with Scala for big data processing
Demonstrated expertise with Apache Spark (Spark Core, Spark SQL, Spark Streaming)
Strong experience with distributed systems and big data ecosystems (e.g., Hadoop, Kafka, Cassandra, HBase, Delta Lake, Snowflake, Databricks)
Proficiency with cloud platforms (AWS, Azure, GCP) and their big data services (e.g., EMR, Redshift, Glue, DataProc, BigQuery)
Experience with containerization technologies (Docker, Kubernetes) and CI/CD pipelines
Solid understanding of data warehousing concepts, ETL/ELT processes, and data modeling
Familiarity with functional programming paradigms in Scala

Job Responsibility

Lead the architecture, design, and development of high-performance, scalable, and reliable big data processing systems using Scala and Apache Spark
Drive technical vision and strategy for big data initiatives
Evaluate and recommend new technologies and tools
Design, develop, and optimize data pipelines for ingestion, transformation, and storage of massive datasets
Implement robust and efficient data processing jobs using Scala and Spark (batch and streaming)
Ensure data quality, integrity, and security
Promote and enforce best practices in coding, testing, and deployment
Mentor and guide a team of talented big data engineers
Conduct code reviews, provide constructive feedback
Participate in the recruitment and hiring

What we offer

Opportunity to work on cutting-edge big data technologies and impactful projects
A collaborative and innovative work environment
Competitive compensation and benefits package
Opportunities for professional growth and career advancement

Fulltime

Senior Java -Spark-Bigdata Engineer-Assistant Vice President

The Applications Development Senior Programmer Analyst is a senior-level positio...

Location

India , Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

7-10 years of relevant experience in Data Engineering or a similar role, preferably within the Financial Services industry
Senior-level experience in an Applications Development or Data Engineering role
Consistently demonstrates clear and concise written and verbal communication
Demonstrated problem-solving and decision-making skills
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
Bachelor's degree/University degree or equivalent experience
Hands-on expertise in Java (8+), Spring Boot, Python, and PySpark for building high-performance data applications
Extensive experience with the BigData ecosystem, including Apache Spark for large-scale data processing
Solid understanding of Data Warehouse concepts, design principles, and best practices
Strong proficiency with both relational SQL databases and NoSQL databases (e.g., MongoDB, Couchbase)

Job Responsibility

Utilize expert knowledge of data engineering principles, big data technologies, and software development best practices to design and implement robust data solutions
Collaborate with business stakeholders, data scientists, and other technology teams to understand data requirements and deliver effective solutions
Apply deep expertise in programming languages like Python and Java for building high-performance data processing applications
Ensure data solutions are secure, scalable, and adhere to the firm's security and architectural standards
Mentor and guide junior engineers, fostering a culture of technical excellence and continuous learning
Lead the analysis of complex data-related issues, identify root causes, and implement sustainable solutions
Operate with a high degree of autonomy and independence, exercising sound judgment and decision-making
Act as a Subject Matter Expert (SME) in big data technologies for senior stakeholders and other team members
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency

Fulltime

New

Senior Data Engineer - Vice President

Citi is seeking a highly skilled and experienced Senior Data Engineer to join ou...

Location

United States , Irving

Salary:

125760.00 - 188640.00 USD / Year

Citi

Expiration Date

May 18, 2026

Requirements

Expert-level proficiency with Python and its data ecosystem (e.g., Pandas, NumPy, Dask). Extensive hands-on experience with the Spark framework, including deep knowledge of the DataFrame API, Spark SQL, and performance tuning techniques for distributed data processing
Proven experience developing on the Databricks Lakehouse Platform, including proficiency with Delta Lake, structured streaming, and optimizing Spark jobs within the Databricks environment
Strong, practical experience with the Ab Initio suite of products (GDE, Co>Operating System, Conduct>It) for designing and implementing enterprise-grade ETL workflows
Hands-on experience designing, building, and maintaining data warehouses in Snowflake
Experience using federated query engines like Starburst/Trino
Familiarity or experience with open table formats like Apache Iceberg for managing large analytic datasets
In-depth knowledge and multi-year experience with at least one major cloud provider (AWS, Google Cloud Platform, or Azure)
Practical experience building and managing data pipelines using cloud-native services such as AWS Glue, Lambda, S3, Redshift
Azure Data Factory, Synapse Analytics
or Google Cloud Composer, Dataflow, and BigQuery

Job Responsibility

Design, build, and maintain scalable ETL/ELT pipelines using PySpark, Spark SQL, and Delta Lake on Databricks
Implement and manage data solutions on cloud platforms
Work extensively with big data frameworks and platforms such as Databricks, Snowflake, and open table formats like Apache Iceberg
Optimize Spark workloads and Databricks clusters
Implement and manage Lakehouse architecture using Delta Lake
Lead the design and architecture of Starburst-based data solutions
Implement and manage data federation strategies using Starburst connectors
Identify and resolve performance bottlenecks in data pipelines and queries
Develop and optimize robust data pipelines with a strong focus on data governance
Design and implement data models that support business intelligence, analytics, and machine learning use cases

What we offer

Medical, dental & vision coverage
401(k)
Life, accident, and disability insurance
Wellness programs
Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays
Discretionary and formulaic incentive and retention awards

Fulltime

!

Pyspark Big Data Senior Developer - Vice President

We are building an A-team of highly skilled and autonomous engineers, and we are...

Location

Canada , Mississauga

Salary:

120800.00 - 170800.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

6+ years of extensive, hands-on experience as a Senior Big Data Developer, with a strong emphasis on PySpark and the Apache Spark ecosystem, operating as a player/coach
Expert proficiency in Python, with a proven track record of developing robust, scalable, and high-performance PySpark applications for large-scale data processing
Deep understanding and extensive hands-on experience with Apache Spark (Spark Core, Spark SQL, Spark Streaming) and its ecosystem
Experience with distributed computing frameworks such as Hadoop (HDFS, YARN)
Expert proficiency in SQL and extensive experience with data warehousing concepts and technologies (e.g., Hive, Snowflake, Redshift, Databricks SQL)
Proven experience with various data storage formats (e.g., Parquet, ORC, Avro) and data lake solutions (e.g., Delta Lake, Iceberg)
Experience with NoSQL databases (e.g., MongoDB, Cassandra, HBase) is a significant plus
Strong experience with Apache Kafka for building real-time data pipelines and event-driven architectures
Demonstrated experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift, Azure Databricks/Data Factory/Synapse, GCP Dataflow/Dataproc/BigQuery) is highly desirable
Proven effectiveness with AI coding tools (e.g., Claude Code, Codex, Antigravity) is a mandatory requirement

Job Responsibility

Operate end-to-end in the design, development, and implementation of robust big data solutions, ensuring optimal performance, scalability, data quality, and security
Collaborate closely within small, co-located squads (4-7 person teams), fostering high communication and low coordination overhead, to translate complex business requirements into technical specifications for big data processing and analytical solutions
Act as a player/coach within the team, mentoring junior members and leading by example in the development of efficient and innovative big data architectures
Design, develop, and optimize large-scale data pipelines using PySpark for data ingestion, transformation, and aggregation, always with an eye towards efficiency and domain relevance
Implement and manage real-time data streaming and event-driven architectures using technologies like Apache Kafka
Design and implement sophisticated data warehousing solutions and dimensional models for efficient data storage and retrieval, ensuring alignment with business needs
Work with various distributed data storage technologies, including distributed file systems (e.g., HDFS, S3) and NoSQL databases (e.g., MongoDB, Cassandra), selecting the right tool for the right problem
Implement efficient data processing and storage strategies to optimize the performance and scalability of big data applications, with a strong focus on the 'why' behind the technology choices
Champion best practices in software development, including rigorous code reviews, implementing comprehensive testing, and supporting continuous integration and continuous deployment (CI/CD) pipelines
Demonstrate high autonomy and agency in driving projects forward, making informed decisions, and proactively identifying areas for improvement

Fulltime

Lead Data Engineer Spark and SQL – Vice President

Citi

Location:
Canada , Mississauga

Category:
IT - Software Development

Contract Type:
Employment contract

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:
May 15, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for Lead Data Engineer Spark and SQL – Vice President

Big Data / PySpark Engineering Lead - Vice President

Vice President, Data Platform Engineering

Apps Dev Tech Lead Analyst - Vice President

Python Full Stack Data Engineer - Assistant Vice President

Vice President, Big Data Scala Engineer

Senior Java -Spark-Bigdata Engineer-Assistant Vice President

Senior Data Engineer - Vice President

Pyspark Big Data Senior Developer - Vice President

Our AI answers in your language

Lead Data Engineer Spark and SQL – Vice President

Citi

Location:Canada , Mississauga

Category:IT - Software Development

Contract Type:Employment contract

Salary:

Job Description:

Job Responsibility:

Requirements:

Nice to have:

Additional Information:

Job Posted:May 15, 2026

Looking for more opportunities? Search for other job offers that match your skills and interests.

Similar Jobs for Lead Data Engineer Spark and SQL – Vice President

Big Data / PySpark Engineering Lead - Vice President

Vice President, Data Platform Engineering

Apps Dev Tech Lead Analyst - Vice President

Python Full Stack Data Engineer - Assistant Vice President

Vice President, Big Data Scala Engineer

Senior Java -Spark-Bigdata Engineer-Assistant Vice President

Senior Data Engineer - Vice President

Pyspark Big Data Senior Developer - Vice President

Location:
Canada , Mississauga

Category:
IT - Software Development

Contract Type:
Employment contract

Job Posted:
May 15, 2026