PySpark Data Engineer

Citi

Location:
India, Chennai

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We are seeking a highly motivated and intuitive Python Developer to join our dynamic team, focusing on critical data migration and profiling initiatives. The ideal candidate will be a self-starter with strong engineering principles, capable of designing and implementing robust solutions for handling large datasets and complex data flows. This role offers an exciting opportunity to work on challenging projects that drive significant impact within our data ecosystem.

Job Responsibility:

  • Develop, test, and deploy high-quality Python code for data migration, data profiling, and data processing
  • Design and implement scalable solutions for working with large and complex datasets, ensuring data integrity and performance
  • Utilize PySpark for distributed data processing and analytics on large-scale data platforms
  • Develop and optimize SQL queries for various database systems, including Oracle, to extract, transform, and load data efficiently
  • Integrate Python applications with JDBC-compliant databases (e.g., Oracle) for seamless data interaction
  • Implement data streaming solutions to process real-time or near real-time data efficiently
  • Perform in-depth data analysis using Python libraries, especially Pandas, to understand data characteristics, identify anomalies, and support profiling efforts
  • Collaborate with data architects, data engineers, and business stakeholders to understand requirements and translate them into technical specifications
  • Contribute to the design and architecture of data solutions, ensuring best practices in data management and engineering
  • Troubleshoot and resolve technical issues related to data pipelines, performance, and data quality

Requirements:

  • 4-7 years of relevant experience in the financial services industry
  • Strong proficiency in Python: Excellent command of Python programming, including object-oriented principles, data structures, and algorithms
  • PySpark Experience: Demonstrated experience with PySpark for big data processing and analysis
  • Database Expertise: Proven experience working with relational databases, specifically Oracle, and connecting applications using JDBC
  • SQL Mastery: Advanced SQL querying skills for complex data extraction, manipulation, and optimization
  • Big Data Handling: Experience in working with and processing large datasets efficiently
  • Data Streaming: Familiarity with data streaming concepts and technologies (e.g., Kafka, Spark Streaming) for processing continuous data flows
  • Data Analysis Libraries: Proficient in using data analysis libraries such as Pandas for data manipulation and exploration
  • Software Engineering Principles: Solid understanding of software engineering best practices, including version control (Git), testing, and code review
  • Problem-Solving: Intuitive problem-solver with a self-starter mindset and the ability to work independently and as part of a team
  • Education: Bachelor’s degree/University degree or equivalent experience

Nice to have:

  • Experience in developing and maintaining reusable Python packages or libraries for data engineering tasks
  • Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and their data services
  • Knowledge of data warehousing concepts and ETL/ELT processes
  • Experience with CI/CD pipelines for automated deployment

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for PySpark Data Engineer

PySpark Data Engineer

The Data Analytics Intmd Analyst is a developing professional role. Deals with m...
Location:
India, Chennai
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 4-8 years relevant experience in Data Analytics and Big Data
  • SQL, Python, and PySpark, with Spark components
  • Minimum 4 years of experience as a Python developer with expertise in automation testing to design, develop, and automate robust software solutions and testing frameworks such as Pytest and Behave
  • 2-4 years of experience as a Big Data Engineer to develop, optimize, and manage large-scale data processing systems and analytics platforms
  • 4 years of experience in distributed data processing & near real-time data analytics using PySpark
  • Strong understanding of PySpark execution plans, partitioning & optimization techniques
Job Responsibility:
  • Integrates in-depth data analysis knowledge with a solid understanding of industry standards and practices
  • Demonstrates a good understanding of how the data analytics team and area integrate with others in accomplishing objectives
  • Applies project management skills
  • Applies analytical thinking and knowledge of data analysis tools and methodologies
  • Analyzes factual information to make accurate judgments and recommendations focused on local operations and broader impacts
  • Applies professional judgment when interpreting data and results, breaking down information in a systematic and communicable manner
  • Employs developed communication and diplomacy skills to exchange potentially complex/sensitive information
  • Demonstrates attention to quality and timeliness of service to ensure the effectiveness of the team and group
  • Provides informal guidance or on-the-job-training to new team members
  • Appropriately assesses risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
Employment Type:
Fulltime

Senior Data Engineer – Data Engineering & AI Platforms

We are looking for a highly skilled Senior Data Engineer (L2) who can design, bu...
Location:
India, Chennai, Madurai, Coimbatore
Salary:
Not provided
OptiSol Business Solutions
Expiration Date:
Until further notice
Requirements:
  • Strong hands-on expertise in cloud ecosystems (Azure / AWS / GCP)
  • Excellent Python programming skills with data engineering libraries and frameworks
  • Advanced SQL capabilities including window functions, CTEs, and performance tuning
  • Solid understanding of distributed processing using Spark/PySpark
  • Experience designing and implementing scalable ETL/ELT workflows
  • Good understanding of data modeling concepts (dimensional, star, snowflake)
  • Familiarity with GenAI/LLM-based integration for data workflows
  • Experience working with Git, CI/CD, and Agile delivery frameworks
  • Strong communication skills for interacting with clients, stakeholders, and internal teams
Job Responsibility:
  • Design, build, and maintain scalable ETL/ELT pipelines across cloud and big data platforms
  • Contribute to architectural discussions by translating business needs into data solutions spanning ingestion, transformation, and consumption layers
  • Work closely with solutioning and pre-sales teams for technical evaluations and client-facing discussions
  • Lead squads of L0/L1 engineers—ensuring delivery quality, mentoring, and guiding career growth
  • Develop cloud-native data engineering solutions using Python, SQL, PySpark, and modern data frameworks
  • Ensure data reliability, performance, and maintainability across the pipeline lifecycle—from development to deployment
  • Support long-term ODC/T&M projects by demonstrating expertise during technical discussions and interviews
  • Integrate emerging GenAI tools where applicable to enhance data enrichment, automation, and transformations
What we offer:
  • Opportunity to work at the intersection of Data Engineering, Cloud, and Generative AI
  • Hands-on exposure to modern data stacks and emerging AI technologies
  • Collaboration with experts across Data, AI/ML, and cloud practices
  • Access to structured learning, certifications, and leadership mentoring
  • Competitive compensation with fast-track career growth and visibility
Employment Type:
Fulltime

Senior Data Engineering Architect

Location:
Poland
Salary:
Not provided
Lingaro
Expiration Date:
Until further notice
Requirements:
  • Proven work experience as a Data Engineering Architect or in a similar role, and strong experience in the Data & Analytics area
  • Strong understanding of data engineering concepts, including data modeling, ETL processes, data pipelines, and data governance
  • Expertise in designing and implementing scalable and efficient data processing frameworks
  • In-depth knowledge of various data technologies and tools, such as relational databases, NoSQL databases, data lakes, data warehouses, and big data frameworks (e.g., Hadoop, Spark)
  • Experience in selecting and integrating appropriate technologies to meet business requirements and long-term data strategy
  • Ability to work closely with stakeholders to understand business needs and translate them into data engineering solutions
  • Strong analytical and problem-solving skills, with the ability to identify and address complex data engineering challenges
  • Proficiency in Python, PySpark, SQL
  • Familiarity with cloud platforms and services, such as AWS, GCP, or Azure, and experience in designing and implementing data solutions in a cloud environment
  • Knowledge of data governance principles and best practices, including data privacy and security regulations
Job Responsibility:
  • Collaborate with stakeholders to understand business requirements and translate them into data engineering solutions
  • Design and oversee the overall data architecture and infrastructure, ensuring scalability, performance, security, maintainability, and adherence to industry best practices
  • Define data models and data schemas to meet business needs, considering factors such as data volume, velocity, variety, and veracity
  • Select and integrate appropriate data technologies and tools, such as databases, data lakes, data warehouses, and big data frameworks, to support data processing and analysis
  • Create scalable and efficient data processing frameworks, including ETL (Extract, Transform, Load) processes, data pipelines, and data integration solutions
  • Ensure that data engineering solutions align with the organization's long-term data strategy and goals
  • Evaluate and recommend data governance strategies and practices, including data privacy, security, and compliance measures
  • Collaborate with data scientists, analysts, and other stakeholders to define data requirements and enable effective data analysis and reporting
  • Provide technical guidance and expertise to data engineering teams, promoting best practices and ensuring high-quality deliverables. Support the team throughout the implementation process, answering questions and addressing issues as they arise
  • Oversee the implementation of the solution, ensuring that it is implemented according to the design documents and technical specifications
What we offer:
  • Stable employment. On the market since 2008, 1500+ talents currently on board in 7 global sites
  • Workation. Enjoy working from inspiring locations in line with our workation policy
  • Great Place to Work® certified employer
  • Flexibility regarding working hours and your preferred form of contract
  • Comprehensive online onboarding program with a “Buddy” from day 1
  • Cooperation with top-tier engineers and experts
  • Unlimited access to the Udemy learning platform from day 1
  • Certificate training programs. Lingarians earn 500+ technology certificates yearly
  • Upskilling support. Capability development programs, Competency Centers, knowledge sharing sessions, community webinars, 110+ training opportunities yearly
  • Grow as we grow as a company. 76% of our managers are internal promotions

Data Engineering Architect

Data engineering involves the development of solutions for the collection, trans...
Location:
India
Salary:
Not provided
Lingaro
Expiration Date:
Until further notice
Requirements:
  • 10+ years’ experience in the Data & Analytics area
  • 4+ years’ experience in Data Engineering Architecture
  • Proficiency in Python, PySpark, SQL
  • Strong expertise in Azure cloud services such as ADF, Databricks, PySpark, and Logic Apps
  • Strong understanding of data engineering concepts, including data modeling, ETL processes, data pipelines, and data governance
  • Expertise in designing and implementing scalable and efficient data processing frameworks
  • In-depth knowledge of various data technologies and tools, such as relational databases, NoSQL databases, data lakes, data warehouses, and big data frameworks (e.g., Hadoop, Spark)
  • Experience in selecting and integrating appropriate technologies to meet business requirements and long-term data strategy
  • Ability to work closely with stakeholders to understand business needs and translate them into data engineering solutions
  • Strong analytical and problem-solving skills, with the ability to identify and address complex data engineering challenges
Job Responsibility:
  • Collaborate with stakeholders to understand business requirements and translate them into data engineering solutions
  • Design and oversee the overall data architecture and infrastructure, ensuring scalability, performance, security, maintainability, and adherence to industry best practices
  • Define data models and data schemas to meet business needs, considering factors such as data volume, velocity, variety, and veracity
  • Select and integrate appropriate data technologies and tools, such as databases, data lakes, data warehouses, and big data frameworks, to support data processing and analysis
  • Create scalable and efficient data processing frameworks, including ETL (Extract, Transform, Load) processes, data pipelines, and data integration solutions
  • Ensure that data engineering solutions align with the organization's long-term data strategy and goals
  • Evaluate and recommend data governance strategies and practices, including data privacy, security, and compliance measures
  • Collaborate with data scientists, analysts, and other stakeholders to define data requirements and enable effective data analysis and reporting
  • Provide technical guidance and expertise to data engineering teams, promoting best practices and ensuring high-quality deliverables
  • Support the team throughout the implementation process, answering questions and addressing issues as they arise
What we offer:
  • Stable employment
  • “Office as an option” model
  • Flexibility regarding working hours and your preferred form of contract
  • Comprehensive online onboarding program with a “Buddy” from day 1
  • Cooperation with top-tier engineers and experts
  • Unlimited access to the Udemy learning platform from day 1
  • Certificate training programs
  • Upskilling support
  • Internal Gallup Certified Strengths Coach to support your growth
  • Grow as we grow as a company

Software Engineer (Data Engineering)

We are seeking a Software Engineer (Data Engineering) who can seamlessly integra...
Location:
India, Hyderabad
Salary:
Not provided
NStarX
Expiration Date:
Until further notice
Requirements:
  • 4+ years in Data Engineering and AI/ML roles
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field
  • Python, SQL, Bash, PySpark, Spark SQL, boto3, pandas
  • Apache Spark on EMR (driver/executor model, sizing, dynamic allocation)
  • Amazon S3 (Parquet) with lifecycle management to Glacier
  • AWS Glue Catalog and Crawlers
  • AWS Step Functions, AWS Lambda, Amazon EventBridge
  • CloudWatch Logs and Metrics, Kinesis Data Firehose (or Kafka/MSK)
  • Amazon Redshift and Redshift Spectrum
  • IAM (least privilege), Secrets Manager, SSM
Job Responsibility:
  • Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing
  • Develop and optimize data architectures supporting analytics and ML workflows
  • Ensure data integrity, security, and compliance with organizational and industry standards
  • Collaborate with DevOps teams to deploy and monitor data pipelines in production environments
  • Build predictive and prescriptive models leveraging AI and ML techniques
  • Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or Scikit-learn
  • Perform feature engineering, statistical analysis, and data preprocessing
  • Continuously monitor and optimize models for accuracy and scalability
  • Integrate AI-driven insights into business processes and strategies
  • Serve as the technical liaison between NStarX and client teams
What we offer:
  • Competitive salary and performance-based incentives
  • Opportunity to work on cutting-edge AI and ML projects
  • Exposure to global clients and international project delivery
  • Continuous learning and professional development opportunities
  • Competitive base + commission
  • Fast growth into leadership roles
Employment Type:
Fulltime

Data Engineer

At Adyen, we treat data and data artifacts as first-class citizens. They form ou...
Location:
Netherlands, Amsterdam
Salary:
Not provided
Adyen
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience working as a Data Engineer or in a similar role
  • Solid understanding of both Software and Data Engineering practices
  • Proficient in tools and languages such as: Python, PySpark, Airflow, Hadoop, Spark, Kafka, SQL, Git
  • Able to effectively communicate complex data-related concepts and outcomes to a diverse range of stakeholders
  • Capable of identifying opportunities, devising solutions, and handling projects independently
  • Experimental mindset with a ‘launch fast and iterate’ mentality
  • Skilled in promoting a data-centric culture within technical teams and advocating for setting standards and continuous improvement
Job Responsibility:
  • Collaborative Solution Development: Engage with a diverse range of stakeholders, including data scientists, analysts, software engineers, product managers, and customers, to understand their requirements and craft effective solutions
  • Quality Pipelines and Architecture: Design, develop, deploy and operate high-quality production ELT pipelines and data architectures. Integrate data from various sources and formats, ensuring compatibility, consistency, and reliability
  • Data Best Practices: Help establish and share best practices in performance, code quality, data validation, data governance, and discoverability in your team and in other teams. Participate in mentoring and knowledge sharing initiatives
  • High Quality Data and Code: Ensure data is accurate, complete, reliable, relevant, and timely. Implement testing, monitoring and validation protocols for your code and data, leveraging tools such as Pytest
  • Performance Optimization: Identify and resolve performance bottlenecks in data pipelines and systems. Improve query performance and resource utilization to meet SLAs and performance requirements, using techniques such as Spark optimizations

Senior Data Engineer

Senior Data Engineer position at Checkr, building the data platform to power saf...
Location:
United States, San Francisco
Salary:
162000.00 - 190000.00 USD / Year
Checkr
Expiration Date:
Until further notice
Requirements:
  • 7+ years of development experience in the field of data engineering
  • 5+ years writing PySpark
  • Experience building large-scale (100s of Terabytes and Petabytes) data processing pipelines - batch and stream
  • Experience with ETL/ELT, stream and batch processing of data at scale
  • Strong proficiency in PySpark and Python
  • Expert understanding of database systems, data modeling, relational databases, and NoSQL (such as MongoDB)
  • Experience with big data technologies such as Kafka, Spark, Iceberg, data lakes, and the AWS stack (EKS, EMR, Serverless, Glue, Athena, S3, etc.)
  • Knowledge of security best practices and data privacy concerns
  • Strong problem-solving skills and attention to detail
Job Responsibility:
  • Create and maintain data pipelines and foundational datasets to support product/business needs
  • Design and build database architectures for massive and complex data, balancing computational load and cost
  • Develop audits for data quality at scale, implementing alerting as necessary
  • Create scalable dashboards and reports to support business objectives and enable data-driven decision-making
  • Troubleshoot and resolve complex issues in production environments
  • Work closely with product managers and other stakeholders to define and implement new features
What we offer:
  • Learning and development reimbursement allowance
  • Competitive compensation and opportunity for professional and personal advancement
  • 100% medical, dental, and vision coverage for employees and dependents
  • Additional vacation benefits of 5 extra days and flexibility to take time off
  • Reimbursement for work from home equipment
  • Lunch four times a week
  • Commuter stipend
  • Abundance of snacks and beverages
Employment Type:
Fulltime

Senior Big Data Engineer

The Big Data Engineer is a senior level position responsible for establishing an...
Location:
Canada, Mississauga
Salary:
94300.00 - 141500.00 USD / Year
Citi
Expiration Date:
Until further notice
Requirements:
  • 5+ Years of Experience in Big Data Engineering (PySpark)
  • Data Pipeline Development: Design, build, and maintain scalable ETL/ELT pipelines to ingest, transform, and load data from multiple sources
  • Big Data Infrastructure: Develop and manage large-scale data processing systems using frameworks like Apache Spark, Hadoop, and Kafka
  • Proficiency in programming languages like Python or Scala
  • Strong expertise in data processing frameworks such as Apache Spark, Hadoop
  • Expertise in Data Lakehouse technologies (Apache Iceberg, Apache Hudi, Trino)
  • Experience with cloud data platforms like AWS (Glue, EMR, Redshift), Azure (Synapse), or GCP (BigQuery)
  • Expertise in SQL and database technologies (e.g., Oracle, PostgreSQL)
  • Experience with data orchestration tools like Apache Airflow or Prefect
  • Familiarity with containerization (Docker, Kubernetes) is a plus
Job Responsibility:
  • Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements
  • Resolve a variety of high-impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
  • Provide expertise in area and advanced knowledge of applications programming and ensure application design adheres to the overall architecture blueprint
  • Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
  • Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
  • Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
  • Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
  • Appropriately assess risk when business decisions are made, demonstrating consideration for the firm's reputation and safeguarding Citigroup, its clients and assets
Employment Type:
Fulltime