PySpark Big Data Developer Job at Citi (Pune)

PySpark Big Data Developer

The Applications Development Intermediate Programmer Analyst is an intermediate ...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

2-5 years of relevant experience in the Financial Services industry
Intermediate level experience in Applications Development role
Consistently demonstrates clear and concise written and verbal communication
Demonstrated problem-solving and decision-making skills
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
Bachelor’s degree/University degree or equivalent experience
Enterprise Application Development: 6-8 years in developing and managing enterprise-grade applications
Object-Oriented Programming (OOP): Solid foundation in OOP concepts
Big Data Development: Expertise in PySpark, HDFS, Hive, Sqoop, and Hadoop for Big Data environments
Database Technologies: Good exposure to SQL Server and Oracle databases

Job Responsibility

Utilize knowledge of applications development procedures and concepts, and basic knowledge of other technical areas to identify and define necessary system enhancements, including using script tools and analyzing/interpreting code
Consult with users, clients, and other technology groups on issues; recommend programming solutions; and install and support customer exposure systems
Apply fundamental knowledge of programming languages for design specifications
Analyze applications to identify vulnerabilities and security issues, as well as conduct testing and debugging
Serve as an advisor or coach to new or lower-level analysts
Identify problems, analyze information, and make evaluative judgements to recommend and implement solutions
Resolve issues by identifying and selecting solutions through the application of acquired technical experience, guided by precedents
Operate with a limited level of direct supervision
Exercise independent judgement and autonomy
Act as a subject matter expert (SME) to senior stakeholders and/or other team members

Full-time

Senior Big Data Pyspark Developer

We are looking for a skilled and motivated Full Stack Developer to join our engi...

Location

Canada, Mississauga

Salary:

94300.00 - 141500.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

5-6 years of professional software development experience
Proficiency in Java (including modern Java features)
Strong experience with Node.js
Strong experience with Angular (versions 2+)
Strong experience with Spring Boot and Spring MVC for building web applications and microservices
Proven experience with Microservices architecture design and implementation
Strong experience with Hibernate
Solid command of Oracle Database, including SQL and PL/SQL
Experience with MongoDB for NoSQL data management
Experience with caching mechanisms and technologies like Hazelcast

Job Responsibility

Contribute to the design, development, and implementation of robust software solutions, ensuring performance, scalability, and security
Collaborate with product managers, architects, and senior developers to translate business requirements into technical specifications and develop innovative solutions
Develop and maintain back-end services using Java, Spring Boot, Spring MVC, Node.js, and Microservices architecture
Build responsive and intuitive user interfaces using Angular
Design and manage databases, working with both relational (Oracle) and NoSQL (MongoDB) data stores, leveraging Hibernate for ORM
Implement caching strategies using technologies like Hazelcast to improve application performance
Implement event-driven architectures and data streaming solutions using Kafka
Develop and consume GraphQL APIs, ensuring efficient data exchange between front-end and back-end systems
Adhere to best practices in software development, including participating in code reviews, testing, continuous integration, and continuous deployment (CI/CD)
Actively learn from and contribute to the team, sharing knowledge and helping to maintain high technical standards

Full-time

Pyspark Big Data Senior Developer - Vice President

We are building an A-team of highly skilled and autonomous engineers, and we are...

Location

Canada, Mississauga

Salary:

120800.00 - 170800.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

6+ years of extensive, hands-on experience as a Senior Big Data Developer, with a strong emphasis on PySpark and the Apache Spark ecosystem, operating as a player/coach
Expert proficiency in Python, with a proven track record of developing robust, scalable, and high-performance PySpark applications for large-scale data processing
Deep understanding and extensive hands-on experience with Apache Spark (Spark Core, Spark SQL, Spark Streaming) and its ecosystem
Experience with distributed computing frameworks such as Hadoop (HDFS, YARN)
Expert proficiency in SQL and extensive experience with data warehousing concepts and technologies (e.g., Hive, Snowflake, Redshift, Databricks SQL)
Proven experience with various data storage formats (e.g., Parquet, ORC, Avro) and data lake solutions (e.g., Delta Lake, Iceberg)
Experience with NoSQL databases (e.g., MongoDB, Cassandra, HBase) is a significant plus
Strong experience with Apache Kafka for building real-time data pipelines and event-driven architectures
Demonstrated experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift, Azure Databricks/Data Factory/Synapse, GCP Dataflow/Dataproc/BigQuery) is highly desirable
Proven effectiveness with AI coding tools (e.g., Claude Code, Codex, Antigravity) is mandatory

Job Responsibility

Operate end-to-end in the design, development, and implementation of robust big data solutions, ensuring optimal performance, scalability, data quality, and security
Collaborate closely within small, co-located squads (4-7 person teams), which keep communication high and coordination overhead low, to translate complex business requirements into technical specifications for big data processing and analytical solutions
Act as a player/coach within the team, mentoring junior members and leading by example in the development of efficient and innovative big data architectures
Design, develop, and optimize large-scale data pipelines using PySpark for data ingestion, transformation, and aggregation, always with an eye towards efficiency and domain relevance
Implement and manage real-time data streaming and event-driven architectures using technologies like Apache Kafka
Design and implement sophisticated data warehousing solutions and dimensional models for efficient data storage and retrieval, ensuring alignment with business needs
Work with various distributed data storage technologies, including distributed file systems (e.g., HDFS, S3) and NoSQL databases (e.g., MongoDB, Cassandra), selecting the right tool for the right problem
Implement efficient data processing and storage strategies to optimize the performance and scalability of big data applications, with a strong focus on the 'why' behind the technology choices
Champion best practices in software development, including rigorous code reviews, implementing comprehensive testing, and supporting continuous integration and continuous deployment (CI/CD) pipelines
Demonstrate high autonomy and agency in driving projects forward, making informed decisions, and proactively identifying areas for improvement

Full-time

Big Data Developer

The Applications Development Intermediate Programmer Analyst is an intermediate ...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Minimum of 5 years of overall IT experience, with at least 3 years of hands-on experience in Big Data technologies
Proven experience working with large-scale, high-volume datasets in distributed environments
Strong proficiency in Hadoop ecosystem tools, including HDFS (Hadoop Distributed File System) for data storage, Hive for data querying and warehousing, and Sqoop for data ingestion from relational databases
Advanced knowledge of Apache Spark, including Spark Core, Spark SQL, and Spark Streaming (preferred), as well as performance tuning and optimization techniques (e.g., partitioning, caching, memory management)
Solid programming skills in Python and PySpark for data processing and pipeline development
Strong command of SQL for complex queries, data transformations, and performance tuning
Hands-on experience in data sourcing, ingestion, and extraction from multiple structured and unstructured data sources

Job Responsibility

Participate in the establishment and implementation of new or revised application systems and programs in coordination with the Technology team
Contribute to applications systems analysis and programming activities
Design, develop, and optimize large-scale data processing systems
Work closely with cross-functional teams to build efficient data pipelines, perform data analysis, and support business-critical financial solutions

Full-time

Big Data / PySpark Engineering Lead - Vice President

The Applications Development Technology Lead Analyst is a senior level position ...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Highly experienced and skilled technical lead with 12+ years of experience in software development and platform engineering
Experience in Data Engineering, focused on Big Data ecosystems
Knowledge of Hadoop, YARN, Hive, Impala, Spark, and Spark SQL, with extensive experience developing high-volume data processing pipelines
Expert-level, hands-on programming experience in Python
Familiarity with data formats like Avro, Parquet, CSV, JSON
Hands-on experience in writing SQL queries
Highly experienced with Unix-based operating systems and shell scripting
Experience with source code management tools such as Bitbucket and Git
Big Data technology proficiency, with hands-on experience in Hadoop, Spark, Hive, Kafka, and NoSQL databases (MongoDB, HBase)
Experience working with query engines such as Trino, Presto, and Starburst

Job Responsibility

Design and implement scalable, fault-tolerant batch and real-time data processing pipelines
Develop robust data models and schema designs optimized for both performance and storage efficiency
Evaluate and integrate emerging tools and frameworks (e.g., Spark, Flink, Kafka) into the existing stack
Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
Legacy Systems Decommissioning: Lead the strategic migration of data and logic from legacy platforms (e.g., on-premises SQL Servers) to a modern Data Lakehouse environment
ETL/ELT Transformation: Re-engineer existing stored procedures and complex legacy ETL jobs into scalable, distributed processing frameworks using Spark (Python) and Starburst/Trino
Validation & Parity Testing: Design and implement automated frameworks for Data Parity Testing to ensure 100% accuracy and consistency between legacy outputs and new big data results
Schema Evolution: Map and transform rigid, legacy relational schemas into flexible, high-performance formats optimized for the cloud (e.g., Parquet, Avro, or Iceberg)
Phased Cutover Management: Orchestrate a phased migration strategy (Parallel Run, Shadow Execution) to ensure zero downtime for downstream business applications and reporting tools

Full-time

Big Data Developer

The Big Data Developer is a senior level position responsible for establishing a...

Location

Canada, Mississauga

Salary:

120800.00 - 170800.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

6+ years of relevant experience in Big Data/Application Development or systems analysis roles, including building and operating production-grade data pipelines on Hadoop/Spark
Extensive experience in system analysis and in programming of big data applications and data platforms
Proven experience designing and managing Hadoop-based architectures, including cluster configuration, resource management (YARN), and ecosystem integration
Strong understanding and hands-on expertise with the Hadoop ecosystem: HDFS, YARN, MapReduce, Hive, HBase, and Spark
Strong hands-on and architectural knowledge of Python, PySpark, Unix/Linux, and SQL
Experience with data modeling, ETL processes, and data warehousing concepts and implementation
Experience implementing data security and governance (e.g., RBAC, encryption, data quality, data lineage, catalog)
Exposure to AI/ML lifecycle management, MLOps, and GenAI solution patterns and integration points
Experience with major cloud platforms (AWS, Azure, Google Cloud) and related big data services (e.g., EMR, HDInsight, Dataproc, Databricks)
Subject Matter Expert (SME) in at least one area of Big Data/Application Development (e.g., Spark performance tuning, Hive optimization, HBase administration, data security)

Job Responsibility

Partner with multiple management teams to ensure appropriate integration of functions to meet goals, and to identify and define necessary platform and system enhancements to deploy new data products and process improvements
Design and implement scalable and efficient Hadoop architecture solutions encompassing core ecosystem components, including HDFS, YARN, MapReduce, Hive, HBase, and Spark
Collaborate with data engineers, data scientists, and analytics stakeholders to understand data requirements and deliver robust, reliable pipelines and analytical datasets
Develop Spark/PySpark solutions to support near real-time data ingestion, analytics, and reporting, ensuring high performance and reliability
Optimize Hadoop and Spark clusters for performance and resource utilization, including capacity planning, tuning, and job orchestration best practices
Maintain and monitor Hadoop infrastructure to ensure high availability, reliability, and observability, and implement proactive alerting, logging, and issue resolution
Implement and enforce data security and governance policies (e.g., access controls, encryption, data quality, lineage, and cataloging) across big data platforms
Troubleshoot and resolve issues across the Hadoop ecosystem (jobs, services, resource management), driving root-cause analysis and permanent fixes
Provide expertise in the area and advanced knowledge of applications programming, ensuring application and data solution design adheres to the overall architecture blueprint and cloud reference patterns

Full-time

Senior Big Data Developer

The Applications Development Senior Programmer Analyst is an intermediate level ...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

8-14 years of relevant experience
Experience in systems analysis and programming of software applications
Experience in managing and implementing successful projects
Working knowledge of consulting/project management techniques/methods
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
Bachelor’s degree/University degree or equivalent experience
Strong Object-Oriented Programming (OOP) concepts
Proficiency in Python (specifically for PySpark)
Extensive experience with Apache Spark (PySpark), Hadoop, and related components like Hive and Sqoop
Skill in writing shell scripts

Job Responsibility

Conduct tasks related to feasibility studies, time and cost estimates, IT planning, risk technology, applications development, model development, and establish and implement new or revised applications systems and programs to meet specific business needs or user areas
Monitor and control all phases of the development process (analysis, design, construction, testing, and implementation), and provide user and operational support on applications to business users
Utilize in-depth specialty knowledge of applications development to analyze complex problems/issues, provide evaluation of business process, system process, and industry standards, and make evaluative judgement
Recommend and develop security measures based on post-implementation analysis of business usage to ensure successful system design and functionality
Consult with users/clients and other technology groups on issues, recommend advanced programming solutions, and install and assist customer exposure systems
Ensure essential procedures are followed and help define operating standards and processes
Serve as an advisor or coach to new or lower-level analysts
Operate with a limited level of direct supervision
Exercise independent judgement and autonomy
Act as a subject matter expert (SME) to senior stakeholders and/or other team members

Full-time

Senior Python Big Data Developer

The Applications Development Senior Programmer Analyst is an intermediate level ...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

7-12 years of relevant experience
Experience in systems analysis and programming of software applications
Experience in managing and implementing successful projects
Working knowledge of consulting/project management techniques/methods
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
Bachelor’s degree/University degree or equivalent experience
Strong expertise in Big Data technologies (Spark, Hadoop, Hive, Impala, Kafka, Scala, Cloudera)
Experience designing, developing, and maintaining robust and scalable data pipelines using Python, SQL, PySpark, and streaming technologies like Kafka
Strong SQL and NoSQL experience (Oracle, MongoDB, PostgreSQL) for data extraction, reconciliation, and transformation
Proficiency in Python and Shell scripting for data processing and automation

Job Responsibility

Conduct tasks related to feasibility studies, time and cost estimates, IT planning, risk technology, applications development, model development, and establish and implement new or revised applications systems and programs to meet specific business needs or user areas
Monitor and control all phases of the development process (analysis, design, construction, testing, and implementation), and provide user and operational support on applications to business users
Utilize in-depth specialty knowledge of applications development to analyze complex problems/issues, provide evaluation of business process, system process, and industry standards, and make evaluative judgement
Recommend and develop security measures based on post-implementation analysis of business usage to ensure successful system design and functionality
Consult with users/clients and other technology groups on issues, recommend advanced programming solutions, and install and assist customer exposure systems
Ensure essential procedures are followed and help define operating standards and processes
Serve as an advisor or coach to new or lower-level analysts
Operate with a limited level of direct supervision
Exercise independent judgement and autonomy
Act as a subject matter expert (SME) to senior stakeholders and/or other team members

Full-time
