CrawlJobs Logo

Data Engineer - Python and Spark

https://www.citi.com/ Logo

Citi

Location Icon

Location:
India , Chennai

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

This is a data engineer position - a programmer responsible for the design, development implementation and maintenance of data flow channels and data processing systems that support the collection, storage, batch and real-time processing, and analysis of information in a scalable, repeatable, and secure manner in coordination with the Data & Analytics team. The overall objective is defining optimal solutions to data collection, processing, and warehousing. Must be a Spark Java development expertise in big data processing, Python and Apache spark particularly within banking & finance domain. He/She designs, codes and tests data systems and works on implementing those into the internal infrastructure.

Job Responsibility:

  • Ensuring high quality software development, with complete documentation and traceability
  • Develop and optimize scalable Spark Java-based data pipelines for processing and analyzing large scale financial data
  • Design and implement distributed computing solutions for risk modeling, pricing and regulatory compliance
  • Ensure efficient data storage and retrieval using Big Data
  • Implement best practices for spark performance tuning including partition, caching and memory management
  • Maintain high code quality through testing, CI/CD pipelines and version control (Git, Jenkins)
  • Work on batch processing frameworks for Market risk analytics
  • Promoting unit/functional testing and code inspection processes
  • Work with business stakeholders and Business Analysts to understand the requirements
  • Work with other data scientists to understand and interpret complex datasets

Requirements:

  • 5-8 Years of experience in working in data eco systems
  • 4-5 years of hands-on experience in Hadoop, Scala, Java, Spark, Hive, Kafka, Impala, Unix Scripting and other Big data frameworks
  • 3+ years of experience with relational SQL and NoSQL databases: Oracle, MongoDB, HBase
  • Strong proficiency in Python and Spark Java with knowledge of core spark concepts (RDDs, Dataframes, Spark Streaming, etc) and Scala and SQL
  • Data Integration, Migration & Large Scale ETL experience (Common ETL platforms such as PySpark/DataStage/AbInitio etc.) - ETL design & build, handling, reconciliation and normalization
  • Data Modeling experience (OLAP, OLTP, Logical/Physical Modeling, Normalization, knowledge on performance tuning)
  • Experienced in working with large and multiple datasets and data warehouses
  • Experience building and optimizing ‘big data’ data pipelines, architectures, and datasets
  • Strong analytic skills and experience working with unstructured datasets
  • Ability to effectively use complex analytical, interpretive, and problem-solving techniques
  • Experience with Confluent Kafka, Redhat JBPM, CI/CD build pipelines and toolchain – Git, BitBucket, Jira
  • Experience with external cloud platform such as OpenShift, AWS & GCP
  • Experience with container technologies (Docker, Pivotal Cloud Foundry) and supporting frameworks (Kubernetes, OpenShift, Mesos)
  • Experienced in integrating search solution with middleware & distributed messaging - Kafka
  • Highly effective interpersonal and communication skills with tech/non-tech stakeholders
  • Experienced in software development life cycle and good problem-solving skills
  • Excellent problem-solving skills and strong mathematical and analytical mindset
  • Ability to work in a fast-paced financial environment
  • Bachelor’s/University degree or equivalent experience in computer science, engineering, or similar domain

Additional Information:

Job Posted:
December 28, 2025

Employment Type:
Fulltime
Work Type:
Hybrid work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Data Engineer - Python and Spark

New

Software Engineer (Data Engineering)

We are seeking a Software Engineer (Data Engineering) who can seamlessly integra...
Location
Location
India , Hyderabad
Salary
Salary:
Not provided
nstarxinc.com Logo
NStarX
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 4+ years in Data Engineering and AI/ML roles
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field
  • Python, SQL, Bash, PySpark, Spark SQL, boto3, pandas
  • Apache Spark on EMR (driver/executor model, sizing, dynamic allocation)
  • Amazon S3 (Parquet) with lifecycle management to Glacier
  • AWS Glue Catalog and Crawlers
  • AWS Step Functions, AWS Lambda, Amazon EventBridge
  • CloudWatch Logs and Metrics, Kinesis Data Firehose (or Kafka/MSK)
  • Amazon Redshift and Redshift Spectrum
  • IAM (least privilege), Secrets Manager, SSM
Job Responsibility
Job Responsibility
  • Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing
  • Develop and optimize data architectures supporting analytics and ML workflows
  • Ensure data integrity, security, and compliance with organizational and industry standards
  • Collaborate with DevOps teams to deploy and monitor data pipelines in production environments
  • Build predictive and prescriptive models leveraging AI and ML techniques
  • Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or Scikit-learn
  • Perform feature engineering, statistical analysis, and data preprocessing
  • Continuously monitor and optimize models for accuracy and scalability
  • Integrate AI-driven insights into business processes and strategies
  • Serve as the technical liaison between NStarX and client teams
What we offer
What we offer
  • Competitive salary and performance-based incentives
  • Opportunity to work on cutting-edge AI and ML projects
  • Exposure to global clients and international project delivery
  • Continuous learning and professional development opportunities
  • Competitive base + commission
  • Fast growth into leadership roles
  • Fulltime
Read More
Arrow Right
New

Senior AWS Data Engineer / Data Platform Engineer

We are seeking a highly experienced Senior AWS Data Engineer to design, build, a...
Location
Location
United Arab Emirates , Dubai
Salary
Salary:
Not provided
northbaysolutions.com Logo
NorthBay
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 8+ years of experience in data engineering and data platform development
  • Strong hands-on experience with: AWS Glue
  • Amazon EMR (Spark)
  • AWS Lambda
  • Apache Airflow (MWAA)
  • Amazon EC2
  • Amazon CloudWatch
  • Amazon Redshift
  • Amazon DynamoDB
  • AWS DataZone
Job Responsibility
Job Responsibility
  • Design, develop, and optimize scalable data pipelines using AWS native services
  • Lead the implementation of batch and near-real-time data processing solutions
  • Architect and manage data ingestion, transformation, and storage layers
  • Build and maintain ETL/ELT workflows using AWS Glue and Apache Spark on EMR
  • Orchestrate complex data workflows using Apache Airflow (MWAA)
  • Develop and manage serverless data processing using AWS Lambda
  • Design and optimize data warehouses using Amazon Redshift
  • Implement and manage NoSQL data models using Amazon DynamoDB
  • Utilize AWS DataZone for data governance, cataloging, and access management
  • Monitor, log, and troubleshoot data pipelines using Amazon CloudWatch
  • Fulltime
Read More
Arrow Right

Software Engineer, Data Engineering

Join us in building the future of finance. Our mission is to democratize finance...
Location
Location
Canada , Toronto
Salary
Salary:
124000.00 - 145000.00 CAD / Year
robinhood.com Logo
Robinhood
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 3+ years of professional experience building end-to-end data pipelines
  • Hands-on software engineering experience, with the ability to write production-level code in Python for user-facing applications, services, or systems (not just data scripting or automation)
  • Expert at building and maintaining large-scale data pipelines using open source frameworks (Spark, Flink, etc)
  • Strong SQL (Presto, Spark SQL, etc) skills
  • Experience solving problems across the data stack (Data Infrastructure, Analytics and Visualization platforms)
  • Expert collaborator with the ability to democratize data through actionable insights and solutions
Job Responsibility
Job Responsibility
  • Help define and build key datasets across all Robinhood product areas. Lead the evolution of these datasets as use cases grow
  • Build scalable data pipelines using Python, Spark and Airflow to move data from different applications into our data lake
  • Partner with upstream engineering teams to enhance data generation patterns
  • Partner with data consumers across Robinhood to understand consumption patterns and design intuitive data models
  • Ideate and contribute to shared data engineering tooling and standards
  • Define and promote data engineering best practices across the company
What we offer
What we offer
  • bonus opportunities
  • equity
  • benefits
  • Fulltime
Read More
Arrow Right

Senior Software Engineer, Data Engineering

Join us in building the future of finance. Our mission is to democratize finance...
Location
Location
United States , Menlo Park
Salary
Salary:
146000.00 - 198000.00 USD / Year
robinhood.com Logo
Robinhood
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ years of professional experience building end-to-end data pipelines
  • Hands-on software engineering experience, with the ability to write production-level code in Python for user-facing applications, services, or systems (not just data scripting or automation)
  • Expert at building and maintaining large-scale data pipelines using open source frameworks (Spark, Flink, etc)
  • Strong SQL (Presto, Spark SQL, etc) skills
  • Experience solving problems across the data stack (Data Infrastructure, Analytics and Visualization platforms)
  • Expert collaborator with the ability to democratize data through actionable insights and solutions
Job Responsibility
Job Responsibility
  • Help define and build key datasets across all Robinhood product areas. Lead the evolution of these datasets as use cases grow
  • Build scalable data pipelines using Python, Spark and Airflow to move data from different applications into our data lake
  • Partner with upstream engineering teams to enhance data generation patterns
  • Partner with data consumers across Robinhood to understand consumption patterns and design intuitive data models
  • Ideate and contribute to shared data engineering tooling and standards
  • Define and promote data engineering best practices across the company
What we offer
What we offer
  • Market competitive and pay equity-focused compensation structure
  • 100% paid health insurance for employees with 90% coverage for dependents
  • Annual lifestyle wallet for personal wellness, learning and development, and more
  • Lifetime maximum benefit for family forming and fertility benefits
  • Dedicated mental health support for employees and eligible dependents
  • Generous time away including company holidays, paid time off, sick time, parental leave, and more
  • Lively office environment with catered meals, fully stocked kitchens, and geo-specific commuter benefits
  • Bonus opportunities
  • Equity
  • Fulltime
Read More
Arrow Right

Big Data / Scala / Python Engineering Lead

The Applications Development Technology Lead Analyst is a senior level position ...
Location
Location
India , Chennai
Salary
Salary:
Not provided
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • At least two years (Over all 10+ hands on Data Engineering experience) of experience building and leading highly complex, technical data engineering teams
  • Lead data engineering team, from sourcing to closing
  • Drive strategic vision for the team and product
  • Experience managing an data focused product, ML platform
  • Hands on experience relevant experience in design, develop, and optimize scalable distributed data processing pipelines using Apache Spark and Scala
  • Experience managing, hiring and coaching software engineering teams
  • Experience with large-scale distributed web services and the processes around testing, monitoring, and SLAs to ensure high product quality
  • 7 to 10+ years of hands-on experience in big data development, focusing on Apache Spark, Scala, and distributed systems
  • Proficiency in Functional Programming: High proficiency in Scala-based functional programming for developing robust and efficient data processing pipelines
  • Proficiency in Big Data Technologies: Strong experience with Apache Spark, Hadoop ecosystem tools such as Hive, HDFS, and YARN
Job Responsibility
Job Responsibility
  • Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements
  • Resolve variety of high impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
  • Provide expertise in area and advanced knowledge of applications programming and ensure application design adheres to the overall architecture blueprint
  • Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
  • Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
  • Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
  • Fulltime
Read More
Arrow Right

Data Platform Engineer

Data Platform Engineers at Adyen build the foundational layer of tooling and pro...
Location
Location
Netherlands , Amsterdam
Salary
Salary:
Not provided
adyen.com Logo
Adyen
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Fluency in Python
  • Experience developing and maintaining distributed data and compute systems like Spark, Trino, Druid, etc
  • Experience developing and maintaining DevOps pipelines and development ecosystems
  • Experience developing and maintaining real-time and batch data pipelines (via Kafka, Spark streaming)
  • Experience with Kubernetes ecosystem (k8s, docker), and/or Hadoop ecosystems (Hive, Yarn, HDFS, Kerberos)
  • Team player with strong communication skills
  • Ability to work closely with diverse stakeholders
Job Responsibility
Job Responsibility
  • Develop and maintain scalable and high performance big data platforms
  • Work with distributed systems in all shapes and flavors (databases, filesystems, compute, etc.)
  • Identify opportunities to improve continuous release and deployment environments
  • Build or deploy tools to enhance data discoverability through the collection and presentation of metadata
  • Introduce and extend tools to enhance the quality of our data, platform-wide
  • Explore and introduce technologies and practices to reduce the time to insight for analysts and data scientists
  • Develop streaming processing applications and frameworks
  • Build the foundational layer of tooling and processes for on-premise Big Data Platforms
  • Collaborate with data engineers and ML scientists and engineers to build and roll-out tools
  • Develop and operate multiple big data platforms
Read More
Arrow Right

Senior Data Engineer

Are you an experienced Data Engineer ready to tackle complex, high-load, and dat...
Location
Location
Salary
Salary:
Not provided
sigma.software Logo
Sigma Software Group
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Apache Spark / expert
  • Python / expert
  • SQL / expert
  • Kafka / good
  • Data Governance (Apache Ranger/Atlas) / good
What we offer
What we offer
  • Diversity of Domains & Businesses
  • Variety of technology
  • Health & Legal support
  • Active professional community
  • Continuous education and growing
  • Flexible schedule
  • Remote work
  • Outstanding offices (if you choose it)
  • Sports and community activities
  • Fulltime
Read More
Arrow Right

Lead Data Engineer

As a Lead Data Engineer at Rearc, you'll play a pivotal role in establishing and...
Location
Location
India , Bengaluru
Salary
Salary:
Not provided
rearc.io Logo
Rearc
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 10+ years of experience in data engineering, data architecture, or related fields
  • Extensive experience in writing and testing Java and/or Python
  • Proven experience with data pipeline orchestration using platforms such as Airflow, Databricks, DBT or AWS Glue
  • Hands-on experience with data analysis tools and libraries like Pyspark, NumPy, Pandas, or Dask
  • Proficiency with Spark and Databricks is highly desirable
  • Proven track record of leading complex data engineering projects, including designing and implementing scalable data solutions
  • Hands-on experience with ETL processes, data warehousing, and data modeling tools
  • In-depth knowledge of data integration tools and best practices
  • Strong understanding of cloud-based data services and technologies (e.g., AWS Redshift, Azure Synapse Analytics, Google BigQuery)
  • Strong strategic and analytical skills
Job Responsibility
Job Responsibility
  • Understand Requirements and Challenges: Collaborate with stakeholders to deeply understand their data requirements and challenges
  • Implement with a DataOps Mindset: Embrace a DataOps mindset and utilize modern data engineering tools and frameworks, such as Apache Airflow, Apache Spark, or similar, to build scalable and efficient data pipelines and architectures
  • Lead Data Engineering Projects: Take the lead in managing and executing data engineering projects, providing technical guidance and oversight to ensure successful project delivery
  • Mentor Data Engineers: Share your extensive knowledge and experience in data engineering with junior team members, guiding and mentoring them to foster their growth and development in the field
  • Promote Knowledge Sharing: Contribute to our knowledge base by writing technical blogs and articles, promoting best practices in data engineering, and contributing to a culture of continuous learning and innovation
Read More
Arrow Right
Welcome to CrawlJobs.com
Your Global Job Discovery Platform
At CrawlJobs.com, we simplify finding your next career opportunity by bringing job listings directly to you from all corners of the web. Using cutting-edge AI and web-crawling technologies, we gather and curate job offers from various sources across the globe, ensuring you have access to the most up-to-date job listings in one place.