Senior Data Engineer - Python & PySpark

Citi (https://www.citi.com/)

Location:
India, Chennai

Contract Type:
Not provided

Salary:
Not provided

Job Description:

The Senior Data Engineer will be responsible for the architecture, design, development, and maintenance of our data platforms, with a strong focus on leveraging Python and PySpark for data processing and transformation. This role requires a strong technical leader who can work independently and as part of a team, contributing to the overall data strategy and helping to drive data-driven decision-making across the organization.

Job Responsibility:

  • Design, develop, and optimize data architectures, pipelines, and data models to support various business needs, including analytics, reporting, and machine learning
  • Build, test, and deploy highly scalable and efficient ETL/ELT processes using Python and PySpark to ingest, transform, and load data from diverse sources into data warehouses and data lakes
  • Develop and optimize complex data transformations using PySpark
  • Implement best practices for data quality, data governance, and data security to ensure the integrity, reliability, and privacy of our data assets
  • Monitor, troubleshoot, and optimize data pipeline performance, ensuring data availability and timely delivery, particularly for PySpark jobs
  • Collaborate with DevOps and MLOps teams to manage and optimize data infrastructure, including cloud resources (AWS, Azure, GCP), databases, and data processing frameworks, ensuring efficient operation of PySpark clusters
  • Provide technical guidance, mentorship, and code reviews to junior data engineers, particularly in Python and PySpark best practices, fostering a culture of excellence and continuous improvement
  • Work closely with data scientists, analysts, product managers, and other stakeholders to understand data requirements and deliver solutions that meet business objectives
  • Research and evaluate new data technologies, tools, and methodologies to enhance our data capabilities and stay ahead of industry trends
  • Create and maintain comprehensive documentation for data pipelines, data models, and data infrastructure

Requirements:

  • Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field
  • 5+ years of professional experience in data engineering, with a strong emphasis on building and maintaining large-scale data systems
  • Extensive hands-on experience with Python for data engineering tasks
  • Proven experience with PySpark for big data processing and transformation
  • Proven experience with cloud data platforms (e.g., AWS Redshift, S3, EMR, Glue; Azure Data Lake, Databricks, Synapse; Google BigQuery, Dataflow)
  • Strong experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra)
  • Extensive experience with distributed data processing frameworks, especially Apache Spark
  • Expert proficiency in Python is mandatory
  • Strong SQL mastery is essential
  • In-depth knowledge and hands-on experience with Apache Spark (PySpark) for data processing, including Spark SQL, Spark Streaming, and DataFrame API
  • In-depth knowledge of data warehousing concepts, dimensional modeling, and ETL/ELT processes
  • Hands-on experience with at least one major cloud provider (AWS, Azure, GCP) and their data services, particularly those supporting Spark/PySpark workloads
  • Proficient with Git and CI/CD pipelines
  • Excellent problem-solving and analytical abilities
  • Strong communication and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders
  • Ability to work effectively in a fast-paced, agile environment
  • Proactive and self-motivated with a strong sense of ownership

Nice to have:

  • Familiarity with Scala or Java is a plus
  • Familiarity with Docker and Kubernetes is a plus
  • Experience with real-time data streaming and processing using PySpark Structured Streaming
  • Knowledge of machine learning concepts and MLOps practices, especially integrating ML workflows with PySpark
  • Familiarity with data visualization tools (e.g., Tableau, Power BI)
  • Contributions to open-source data projects

Additional Information:

Job Posted:
December 28, 2025

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Senior Data Engineer - Python & PySpark

Senior Data Engineer

Senior Data Engineer – Dublin (Hybrid) Contract Role | 3 Days Onsite. We are see...
Location:
Ireland, Dublin
Salary:
Not provided
Solas IT Recruitment (solasit.ie)
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience as a Data Engineer working with distributed data systems
  • 4+ years of deep Snowflake experience, including performance tuning, SQL optimization, and data modelling
  • Strong hands-on experience with the Hadoop ecosystem: HDFS, Hive, Impala, Spark (PySpark preferred); Oozie, Airflow, or similar orchestration tools
  • Proven expertise with PySpark, Spark SQL, and large-scale data processing patterns
  • Experience with Databricks and Delta Lake (or equivalent big-data platforms)
  • Strong programming background in Python, Scala, or Java
  • Experience with cloud services (AWS preferred): S3, Glue, EMR, Redshift, Lambda, Athena, etc.
Job Responsibility:
  • Build, enhance, and maintain large-scale ETL/ELT pipelines using Hadoop ecosystem tools including HDFS, Hive, Impala, and Oozie/Airflow
  • Develop distributed data processing solutions with PySpark, Spark SQL, Scala, or Python to support complex data transformations
  • Implement scalable and secure data ingestion frameworks to support both batch and streaming workloads
  • Work hands-on with Snowflake to design performant data models, optimize queries, and establish solid data governance practices
  • Collaborate on the migration and modernization of current big-data workloads to cloud-native platforms and Databricks
  • Tune Hadoop, Spark, and Snowflake systems for performance, storage efficiency, and reliability
  • Apply best practices in data modelling, partitioning strategies, and job orchestration for large datasets
  • Integrate metadata management, lineage tracking, and governance standards across the platform
  • Build automated validation frameworks to ensure accuracy, completeness, and reliability of data pipelines
  • Develop unit, integration, and end-to-end testing for ETL workflows using Python, Spark, and dbt testing where applicable

Senior Data Engineer

At Rearc, we're committed to empowering engineers to build awesome products and ...
Location:
India, Bangalore
Salary:
Not provided
Rearc (rearc.io)
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in data engineering, showcasing expertise in diverse architectures, technology stacks, and use cases
  • Strong expertise in designing and implementing data warehouse and data lake architectures, particularly in AWS environments
  • Extensive experience with Python for data engineering tasks, including familiarity with libraries and frameworks commonly used in Python-based data engineering workflows
  • Proven experience with data pipeline orchestration using platforms such as Airflow, Databricks, DBT or AWS Glue
  • Hands-on experience with data analysis tools and libraries like PySpark, NumPy, Pandas, or Dask
  • Proficiency with Spark and Databricks is highly desirable
  • Experience with SQL and NoSQL databases, including PostgreSQL, Amazon Redshift, Delta Lake, Iceberg and DynamoDB
  • In-depth knowledge of data architecture principles and best practices, especially in cloud environments
  • Proven experience with AWS services, including expertise in using AWS CLI, SDK, and Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or AWS CDK
  • Exceptional communication skills, capable of clearly articulating complex technical concepts to both technical and non-technical stakeholders
Job Responsibility:
  • Strategic Data Engineering Leadership: Provide strategic vision and technical leadership in data engineering, guiding the development and execution of advanced data strategies that align with business objectives
  • Architect Data Solutions: Design and architect complex data pipelines and scalable architectures, leveraging advanced tools and frameworks (e.g., Apache Kafka, Kubernetes) to ensure optimal performance and reliability
  • Drive Innovation: Lead the exploration and adoption of new technologies and methodologies in data engineering, driving innovation and continuous improvement across data processes
  • Technical Expertise: Apply deep expertise in ETL processes, data modelling, and data warehousing to optimize data workflows and ensure data integrity and quality
  • Collaboration and Mentorship: Collaborate closely with cross-functional teams to understand requirements and deliver impactful data solutions. Mentor and coach junior team members, fostering their growth and development in data engineering practices
  • Thought Leadership: Contribute to thought leadership in the data engineering domain through technical articles, conference presentations, and participation in industry forums

Senior Data Engineer

As a Senior Data Engineer at Rearc, you'll play a pivotal role in establishing a...
Location:
United States, New York
Salary:
160000.00 - 200000.00 USD / Year
Rearc (rearc.io)
Expiration Date:
Until further notice
Requirements:
  • 8+ years of professional experience in data engineering across modern cloud architectures and diverse data systems
  • Expertise in designing and implementing data warehouses and data lakes across modern cloud environments (e.g., AWS, Azure, or GCP), with experience in technologies such as Redshift, BigQuery, Snowflake, Delta Lake, or Iceberg
  • Strong Python experience for data engineering, including libraries like Pandas, PySpark, NumPy, or Dask
  • Hands-on experience with Spark and Databricks (highly desirable)
  • Experience building and orchestrating data pipelines using Airflow, Databricks, DBT, or AWS Glue
  • Strong SQL skills and experience with both SQL and NoSQL databases (PostgreSQL, DynamoDB, Redshift, Delta Lake, Iceberg)
  • Solid understanding of data architecture principles, data modeling, and best practices for scalable data systems
  • Experience with cloud provider services (AWS, Azure, or GCP) and comfort using command-line interfaces or SDKs as part of development workflows
  • Familiarity with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, ARM/Bicep, or AWS CDK
  • Excellent communication skills, able to explain technical concepts to technical and non-technical stakeholders
Job Responsibility:
  • Provide strategic data engineering leadership by shaping the vision, roadmap, and technical direction of data initiatives to align with business goals
  • Architect and build scalable, reliable data solutions, including complex data pipelines and distributed systems, using modern frameworks and technologies (e.g., Spark, Kafka, Kubernetes, Databricks, DBT)
  • Drive innovation by evaluating, proposing, and adopting new tools, patterns, and methodologies that improve data quality, performance, and efficiency
  • Apply deep technical expertise in ETL/ELT design, data modeling, data warehousing, and workflow optimization to ensure robust, high-quality data systems
  • Collaborate across teams—partner with engineering, product, analytics, and customer stakeholders to understand requirements and deliver impactful, scalable solutions
  • Mentor and coach junior engineers, fostering growth, knowledge-sharing, and best practices within the data engineering team
  • Contribute to thought leadership through knowledge-sharing, writing technical articles, speaking at meetups or conferences, or representing the team in industry conversations
What we offer:
  • Health Benefits
  • Generous time away
  • Maternity and Paternity leave
  • Educational resources and reimbursements
  • 401(k) plan with a company contribution
Employment Type:
Full-time

Senior Big Data Engineer

Location:
United States, Flowood
Salary:
Not provided
PhasorSoft Group (phasorsoft.com)
Expiration Date:
Until further notice
Requirements:
  • Proficiency in Python programming for data manipulation and analysis
  • Experience with PySpark for processing large-scale data
  • Strong understanding and practical experience with big data technologies such as Hadoop, Spark, Kafka, etc.
  • Knowledge of designing and implementing ETL processes for data integration
  • Ability to work with large datasets, perform data cleansing, transformations, and aggregations
  • Familiarity with machine learning concepts and experience implementing ML models
  • Understanding of data governance principles and experience implementing data security measures
  • Ability to create clear and concise documentation for data pipelines and processes
  • Strong teamwork and collaboration skills to work with cross-functional teams
  • Analytical and problem-solving skills to optimize data workflows and processes
Job Responsibility:
  • Design and develop scalable data pipelines and solutions using Python and PySpark
  • Utilize big data technologies such as Hadoop, Spark, Kafka, or similar tools for processing and analyzing large datasets
  • Develop and maintain ETL processes to extract, transform, and load data into data lakes or warehouses
  • Collaborate with data engineers and scientists to implement machine learning models and algorithms
  • Optimize and tune data processing workflows for performance and efficiency
  • Implement data governance and security measures to ensure data integrity and privacy
  • Create and maintain documentation for data pipelines, workflows, and processes
  • Provide technical leadership and mentorship to junior team members
Employment Type:
Full-time

Senior Data Engineer – Data Engineering & AI Platforms

We are looking for a highly skilled Senior Data Engineer (L2) who can design, bu...
Location:
India, Chennai, Madurai, Coimbatore
Salary:
Not provided
OptiSol Business Solutions (optisolbusiness.com)
Expiration Date:
Until further notice
Requirements:
  • Strong hands-on expertise in cloud ecosystems (Azure / AWS / GCP)
  • Excellent Python programming skills with data engineering libraries and frameworks
  • Advanced SQL capabilities including window functions, CTEs, and performance tuning
  • Solid understanding of distributed processing using Spark/PySpark
  • Experience designing and implementing scalable ETL/ELT workflows
  • Good understanding of data modeling concepts (dimensional, star, snowflake)
  • Familiarity with GenAI/LLM-based integration for data workflows
  • Experience working with Git, CI/CD, and Agile delivery frameworks
  • Strong communication skills for interacting with clients, stakeholders, and internal teams
Job Responsibility:
  • Design, build, and maintain scalable ETL/ELT pipelines across cloud and big data platforms
  • Contribute to architectural discussions by translating business needs into data solutions spanning ingestion, transformation, and consumption layers
  • Work closely with solutioning and pre-sales teams for technical evaluations and client-facing discussions
  • Lead squads of L0/L1 engineers—ensuring delivery quality, mentoring, and guiding career growth
  • Develop cloud-native data engineering solutions using Python, SQL, PySpark, and modern data frameworks
  • Ensure data reliability, performance, and maintainability across the pipeline lifecycle—from development to deployment
  • Support long-term ODC/T&M projects by demonstrating expertise during technical discussions and interviews
  • Integrate emerging GenAI tools where applicable to enhance data enrichment, automation, and transformations
What we offer:
  • Opportunity to work at the intersection of Data Engineering, Cloud, and Generative AI
  • Hands-on exposure to modern data stacks and emerging AI technologies
  • Collaboration with experts across Data, AI/ML, and cloud practices
  • Access to structured learning, certifications, and leadership mentoring
  • Competitive compensation with fast-track career growth and visibility
Employment Type:
Full-time

Senior Data Engineering Architect

Location:
Poland
Salary:
Not provided
Lingaro (lingarogroup.com)
Expiration Date:
Until further notice
Requirements:
  • Proven work experience as a Data Engineering Architect or in a similar role, and strong experience in the Data & Analytics area
  • Strong understanding of data engineering concepts, including data modeling, ETL processes, data pipelines, and data governance
  • Expertise in designing and implementing scalable and efficient data processing frameworks
  • In-depth knowledge of various data technologies and tools, such as relational databases, NoSQL databases, data lakes, data warehouses, and big data frameworks (e.g., Hadoop, Spark)
  • Experience in selecting and integrating appropriate technologies to meet business requirements and long-term data strategy
  • Ability to work closely with stakeholders to understand business needs and translate them into data engineering solutions
  • Strong analytical and problem-solving skills, with the ability to identify and address complex data engineering challenges
  • Proficiency in Python, PySpark, SQL
  • Familiarity with cloud platforms and services, such as AWS, GCP, or Azure, and experience in designing and implementing data solutions in a cloud environment
  • Knowledge of data governance principles and best practices, including data privacy and security regulations
Job Responsibility:
  • Collaborate with stakeholders to understand business requirements and translate them into data engineering solutions
  • Design and oversee the overall data architecture and infrastructure, ensuring scalability, performance, security, maintainability, and adherence to industry best practices
  • Define data models and data schemas to meet business needs, considering factors such as data volume, velocity, variety, and veracity
  • Select and integrate appropriate data technologies and tools, such as databases, data lakes, data warehouses, and big data frameworks, to support data processing and analysis
  • Create scalable and efficient data processing frameworks, including ETL (Extract, Transform, Load) processes, data pipelines, and data integration solutions
  • Ensure that data engineering solutions align with the organization's long-term data strategy and goals
  • Evaluate and recommend data governance strategies and practices, including data privacy, security, and compliance measures
  • Collaborate with data scientists, analysts, and other stakeholders to define data requirements and enable effective data analysis and reporting
  • Provide technical guidance and expertise to data engineering teams, promoting best practices and ensuring high-quality deliverables. Support the team throughout the implementation process, answering questions and addressing issues as they arise
  • Oversee the implementation of the solution, ensuring that it is implemented according to the design documents and technical specifications
What we offer:
  • Stable employment. On the market since 2008, 1500+ talents currently on board in 7 global sites
  • Workation. Enjoy working from inspiring locations in line with our workation policy
  • Great Place to Work® certified employer
  • Flexibility regarding working hours and your preferred form of contract
  • Comprehensive online onboarding program with a “Buddy” from day 1
  • Cooperation with top-tier engineers and experts
  • Unlimited access to the Udemy learning platform from day 1
  • Certificate training programs. Lingarians earn 500+ technology certificates yearly
  • Upskilling support. Capability development programs, Competency Centers, knowledge sharing sessions, community webinars, 110+ training opportunities yearly
  • Grow as we grow as a company. 76% of our managers are internal promotions

Senior Azure Data Engineer

Seeking a Lead AI DevOps Engineer to oversee design and delivery of advanced AI/...
Location:
Poland
Salary:
Not provided
Lingaro (lingarogroup.com)
Expiration Date:
Until further notice
Requirements:
  • At least 6 years of professional experience in the Data & Analytics area
  • 1+ years of experience in (or acting in) a Senior Consultant or higher role, with a strong focus on data solutions built in Azure and Databricks/Synapse (MS Fabric is nice to have)
  • Proven experience with Azure cloud-based infrastructure, Databricks, and at least one SQL implementation (e.g., Oracle, T-SQL, MySQL)
  • Proficiency in SQL, Python, and PySpark is essential (R or Scala is nice to have)
  • Very good communication skills, including the ability to convey information clearly and specifically to co-workers and business stakeholders
  • Working experience with agile methodologies and supporting tools (Jira, Azure DevOps)
  • Experience in leading and managing a team of data engineers, providing guidance, mentorship, and technical support
  • Knowledge of data management principles and best practices, including data governance, data quality, and data integration
  • Good project management skills, with the ability to prioritize tasks, manage timelines, and deliver high-quality results within designated deadlines
  • Excellent problem-solving and analytical skills, with the ability to identify and resolve complex data engineering issues
Job Responsibility:
  • Act as a senior member of the Data Science & AI Competency Center, AI Engineering team, guiding delivery and coordinating workstreams
  • Develop and execute a cloud data strategy aligned with organizational goals
  • Lead data integration efforts, including ETL processes, to ensure seamless data flow
  • Implement security measures and compliance standards in cloud environments
  • Continuously monitor and optimize data solutions for cost-efficiency
  • Establish and enforce data governance and quality standards
  • Leverage Azure services, as well as tools like dbt and Databricks, for efficient data pipelines and analytics solutions
  • Work with cross-functional teams to understand requirements and provide data solutions
  • Maintain comprehensive documentation for data architecture and solutions
  • Mentor junior team members in cloud data architecture best practices
What we offer:
  • Stable employment
  • “Office as an option” model
  • Workation
  • Great Place to Work® certified employer
  • Flexibility regarding working hours and your preferred form of contract
  • Comprehensive online onboarding program with a “Buddy” from day 1
  • Cooperation with top-tier engineers and experts
  • Unlimited access to the Udemy learning platform from day 1
  • Certificate training programs
  • Upskilling support

Senior Data Engineer

Figure is an AI Robotics company developing a general-purpose humanoid. Our huma...
Location:
United States, San Jose
Salary:
140000.00 - 350000.00 USD / Year
Figure (figure.ai)
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master’s degree in Computer Science, Data Engineering, or a related field
  • 3+ years of experience in data engineering, preferably with time-series or log data processing
  • Proficiency in Python with experience in Pandas, Polars, or PySpark for large-scale data processing
  • Strong understanding of database design, indexing, and query optimization (SQL and NoSQL)
  • Experience handling complex data formats such as Parquet, MCAP, or protobuf
  • Experience building custom web-based data visualization tools (JavaScript, React…)
  • Familiarity with data visualization tools like Grafana for real-time analysis and monitoring
  • Experience with distributed computing frameworks and cloud-based data storage solutions
  • Strong debugging skills and ability to work with lab teams to interpret robotic system logs
Job Responsibility:
  • Develop and maintain pipelines and tools that transform robot logs so they are easier to access and visualize, and so events of interest can be detected automatically
  • Optimize data processing to reduce the time needed between data offload and the availability of the data to our engineering teams
  • Design and optimize data storage solutions for handling complex, high-volume time-series and structured data
  • Build and maintain database schemas and queries to support analytics and visualization of extracted patterns
  • Support mechanical, electrical, software, integration and test engineers with their needs to extract and visualize data
  • Develop dashboards and custom data visualization tools to enable engineers to quickly extract information from the data and track robot performance
  • Integrate your solutions with existing data pipelines and our robot testing framework
Employment Type:
Full-time