CrawlJobs Logo

Senior PySpark Data Engineer

https://www.citi.com/ Logo

Citi

Location Icon

Location:
India , Pune

Category Icon

Job Type Icon

Contract Type:
Not provided

Salary Icon

Salary:

Not provided

Job Description:

We are seeking a highly skilled and experienced Senior PySpark Data Engineer to join our dynamic data engineering team. The ideal candidate will have a strong background in building and managing large-scale data processing systems and a proven track record of working with cutting-edge Big Data technologies. You will be responsible for designing, developing, and maintaining our data pipelines, ensuring they are efficient, reliable, and scalable to meet our growing business needs.

Job Responsibility:

  • Design, develop, and maintain robust, scalable, and high-performance data pipelines using PySpark
  • Develop, schedule, and monitor complex data workflows using orchestration tools like Apache Airflow
  • Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver high-quality data solutions
  • Optimize and tune Spark jobs for performance and efficiency
  • Implement data quality checks and ensure data integrity across all data pipelines
  • Design and implement data models for optimal storage and retrieval
  • Mentor junior data engineers and promote best practices in data engineering
  • Ensure compliance with data governance and security policies
  • Troubleshoot and resolve data-related issues in a timely manner

Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field
  • 6+ years of professional experience in a data engineering role
  • Extensive hands-on experience with PySpark and advanced Python programming skills
  • Proven experience with Big Data ecosystems, including Cloudera and/or DataBricks
  • Hands-on experience with distributed query engines like Starburst (Trino/Presto)
  • Proficient in designing and managing complex workflows using scheduling tools, particularly Apache Airflow
  • Strong expertise in SQL and experience with relational and non-relational databases
  • Solid understanding of data warehousing concepts, ETL/ELT processes, and data modeling techniques
  • Experience working in a Linux/Unix environment
  • GIT HUB, CI/CD Pipeline

Additional Information:

Job Posted:
April 11, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work
Job Link Share:

Looking for more opportunities? Search for other job offers that match your skills and interests.

Briefcase Icon

Similar Jobs for Senior PySpark Data Engineer

Senior Data Engineer

At Ingka Investments (Part of Ingka Group – the largest owner and operator of IK...
Location
Location
Netherlands , Leiden
Salary
Salary:
Not provided
https://www.ikea.com Logo
IKEA
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Formal qualifications (BSc, MSc, PhD) in computer science, software engineering, informatics or equivalent
  • Minimum 3 years of professional experience as a (Junior) Data Engineer
  • Strong knowledge in designing efficient, robust and automated data pipelines, ETL workflows, data warehousing and Big Data processing
  • Hands-on experience with Azure data services like Azure Databricks, Unity Catalog, Azure Data Lake Storage, Azure Data Factory, DBT and Power BI
  • Hands-on experience with data modeling for BI & ML for performance and efficiency
  • The ability to apply such methods to solve business problems using one or more Azure Data and Analytics services in combination with building data pipelines, data streams, and system integration
  • Experience in driving new data engineering developments (e.g. apply new cutting edge data engineering methods to improve performance of data integration, use new tools to improve data quality and etc.)
  • Knowledge of DevOps practices and tools including CI/CD pipelines and version control systems (e.g., Git)
  • Proficiency in programming languages such as Python, SQL, PySpark and others relevant to data engineering
  • Hands-on experience to deploy code artifacts into production
Job Responsibility
Job Responsibility
  • Contribute to the development of D&A platform and analytical tools, ensuring easy and standardized access and sharing of data
  • Subject matter expert for Azure Databrick, Azure Data factory and ADLS
  • Help design, build and maintain data pipelines (accelerators)
  • Document and make the relevant know-how & standard available
  • Ensure pipelines and consistency with relevant digital frameworks, principles, guidelines and standards
  • Support in understand needs of Data Product Teams and other stakeholders
  • Explore ways create better visibility on data quality and Data assets on the D&A platform
  • Identify opportunities for data assets and D&A platform toolchain
  • Work closely together with partners, peers and other relevant roles like data engineers, analysts or architects across IKEA as well as in your team
What we offer
What we offer
  • Opportunity to develop on a cutting-edge Data & Analytics platform
  • Opportunities to have a global impact on your work
  • A team of great colleagues to learn together with
  • An environment focused on driving business and personal growth together, with focus on continuous learning
  • Fulltime
Read More
Arrow Right

Senior Data Engineer

Senior Data Engineer position at Checkr, building the data platform to power saf...
Location
Location
United States , San Francisco
Salary
Salary:
162000.00 - 190000.00 USD / Year
https://checkr.com Logo
Checkr
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 7+ years of development experience in the field of data engineering
  • 5+ years writing PySpark
  • Experience building large-scale (100s of Terabytes and Petabytes) data processing pipelines - batch and stream
  • Experience with ETL/ELT, stream and batch processing of data at scale
  • Strong proficiency in PySpark and Python
  • Expertise in understanding of database systems, data modeling, relational databases, NoSQL (such as MongoDB)
  • Experience with big data technologies such as Kafka, Spark, Iceberg, Datalake and AWS stack (EKS, EMR, Serverless, Glue, Athena, S3, etc.)
  • Knowledge of security best practices and data privacy concerns
  • Strong problem-solving skills and attention to detail
Job Responsibility
Job Responsibility
  • Create and maintain data pipelines and foundational datasets to support product/business needs
  • Design and build database architectures with massive and complex data, balancing with computational load and cost
  • Develop audits for data quality at scale, implementing alerting as necessary
  • Create scalable dashboards and reports to support business objectives and enable data-driven decision-making
  • Troubleshoot and resolve complex issues in production environments
  • Work closely with product managers and other stakeholders to define and implement new features
What we offer
What we offer
  • Learning and development reimbursement allowance
  • Competitive compensation and opportunity for professional and personal advancement
  • 100% medical, dental, and vision coverage for employees and dependents
  • Additional vacation benefits of 5 extra days and flexibility to take time off
  • Reimbursement for work from home equipment
  • Lunch four times a week
  • Commuter stipend
  • Abundance of snacks and beverages
  • Fulltime
Read More
Arrow Right

Senior Big Data Engineer

The Big Data Engineer is a senior level position responsible for establishing an...
Location
Location
Canada , Mississauga
Salary
Salary:
94300.00 - 141500.00 USD / Year
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ Years of Experience in Big Data Engineering (PySpark)
  • Data Pipeline Development: Design, build, and maintain scalable ETL/ELT pipelines to ingest, transform, and load data from multiple sources
  • Big Data Infrastructure: Develop and manage large-scale data processing systems using frameworks like Apache Spark, Hadoop, and Kafka
  • Proficiency in programming languages like Python, or Scala
  • Strong expertise in data processing frameworks such as Apache Spark, Hadoop
  • Expertise in Data Lakehouse technologies (Apache Iceberg, Apache Hudi, Trino)
  • Experience with cloud data platforms like AWS (Glue, EMR, Redshift), Azure (Synapse), or GCP (BigQuery)
  • Expertise in SQL and database technologies (e.g., Oracle, PostgreSQL, etc.)
  • Experience with data orchestration tools like Apache Airflow or Prefect
  • Familiarity with containerization (Docker, Kubernetes) is a plus
Job Responsibility
Job Responsibility
  • Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements
  • Resolve variety of high impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
  • Provide expertise in area and advanced knowledge of applications programming and ensure application design adheres to the overall architecture blueprint
  • Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
  • Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
  • Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
  • Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
  • Appropriately assess risk when business decisions are made, demonstrating consideration for the firm's reputation and safeguarding Citigroup, its clients and assets
  • Fulltime
Read More
Arrow Right

Senior Big Data Engineer

The Big Data Engineer is a senior level position responsible for establishing an...
Location
Location
Canada , Mississauga
Salary
Salary:
94300.00 - 141500.00 USD / Year
https://www.citi.com/ Logo
Citi
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • 5+ Years of Experience in Big Data Engineering (PySpark)
  • Data Pipeline Development: Design, build, and maintain scalable ETL/ELT pipelines to ingest, transform, and load data from multiple sources
  • Big Data Infrastructure: Develop and manage large-scale data processing systems using frameworks like Apache Spark, Hadoop, and Kafka
  • Proficiency in programming languages like Python, or Scala
  • Strong expertise in data processing frameworks such as Apache Spark, Hadoop
  • Expertise in Data Lakehouse technologies (Apache Iceberg, Apache Hudi, Trino)
  • Experience with cloud data platforms like AWS (Glue, EMR, Redshift), Azure (Synapse), or GCP (BigQuery)
  • Expertise in SQL and database technologies (e.g., Oracle, PostgreSQL, etc.)
  • Experience with data orchestration tools like Apache Airflow or Prefect
  • Familiarity with containerization (Docker, Kubernetes) is a plus
Job Responsibility
Job Responsibility
  • Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements
  • Resolve variety of high impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
  • Provide expertise in area and advanced knowledge of applications programming and ensure application design adheres to the overall architecture blueprint
  • Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
  • Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
  • Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
  • Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
  • Appropriately assess risk when business decisions are made, demonstrating consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
What we offer
What we offer
  • Well-being support
  • Growth opportunities
  • Work-life balance support
  • Fulltime
Read More
Arrow Right

Senior Data Engineer

At Blue Margin, we are on a mission to build the go-to data platform for PE-back...
Location
Location
United States , Fort Collins
Salary
Salary:
110000.00 - 140000.00 USD / Year
bluemargin.com Logo
Blue Margin
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
  • 5+ years of professional experience in data engineering, with emphasis on Python & PySpark/Apache Spark
  • Proven ability to manage large datasets and optimize for speed, scalability, and reliability
  • Strong SQL skills and understanding of relational and distributed data systems
  • Experience with Azure Data Factory, Synapse Pipelines, Fivetran, Delta Lake, Microsoft Fabric, or Snowflake
  • Knowledge of data modeling, orchestration, and Delta/Parquet file management best practices
  • Familiarity with CI/CD, version control, and DevOps practices for data pipelines
  • Experience leveraging AI-assisted tools to accelerate engineering workflows
  • Strong communication skills
  • ability to convey complex technical details to both engineers and business stakeholders
Job Responsibility
Job Responsibility
  • Architect, design, and optimize large-scale data pipelines using tools like PySpark, SparkSQL, Delta Lake, and cloud-native tools
  • Drive efficiency in incremental/delta data loading, partitioning, and performance tuning
  • Lead implementations across Azure Synapse, Microsoft Fabric, and/or Snowflake environments
  • Collaborate with stakeholders and analysts to translate business needs into scalable data solutions
  • Evaluate and incorporate AI/automation to improve development speed, testing, and data quality
  • Oversee and mentor junior data engineers, establishing coding standards and best practices
  • Ensure high standards for data quality, security, and governance
  • Participate in solution design for client engagements, balancing technical depth with practical outcomes
What we offer
What we offer
  • Competitive pay
  • strong benefits
  • flexible hybrid work setup
  • Fulltime
Read More
Arrow Right

Senior Data Engineer

Join a leading energy sector analytics company as we expand our innovative data ...
Location
Location
Poland
Salary
Salary:
Not provided
edvantis.com Logo
Edvantis
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • At least 5 years of experience as a Data Engineer, with a proven track record of successful projects
  • Solid experience with relational database systems, particularly SQL Server
  • Advanced proficiency in Python and PySpark – the languages of data manipulation and analysis
  • Expertise in Databricks as a distributed data engineering platform
  • Expertise with Airflow and Grafana
  • Ability to collaborate effectively within a team environment and meet project deadlines
  • Strong communication skills and fluency in English
Job Responsibility
Job Responsibility
  • Develop and maintain scalable data pipelines using Python, SQL, AWS services(Amazon Bedrock, S3), and Databricks
  • Build and optimize ETL jobs in Databricks using PySpark, ensuring efficient processing of large-scale distributed datasets
  • Play a pivotal role in enhancing the breadth and depth of our courthouse data products
  • Utilize your Python expertise to parse complex datasets, manipulate intricate image data, and craft innovative data products that meet our customers’ evolving needs
  • Champion data quality, consistency, and reliability throughout our product lifecycle
  • Contribute to the development of new features and the continuous improvement of existing data systems
  • Design and implement distributed data engineering solutions in Databricks, leveraging PySpark for optimized workflows
What we offer
What we offer
  • Remote-first work model with flexible working hours (we provide all equipment)
  • Comfortable and fully equipped offices in Lviv and Rzeszów
  • Competitive compensation with regular performance reviews
  • 18 paid vacation days per year + all state holidays
  • 12 days of paid sick leave per year without a medical certificate + extra paid leave for blood donation
  • Medical insurance with an affordable family coverage option
  • Mental health program which includes free and confidential consultations with a psychologist
  • English, German, and Polish language courses
  • Corporate subscription to learning platforms, regular meetups and webinars
  • Friendly team that values accountability, innovation, teamwork, and customer satisfaction
  • Fulltime
Read More
Arrow Right

Senior Data Engineer

Figure is an AI Robotics company developing a general-purpose humanoid. Our huma...
Location
Location
United States , San Jose
Salary
Salary:
140000.00 - 350000.00 USD / Year
figure.ai Logo
Figure
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • Bachelor's or Master’s degree in Computer Science, Data Engineering, or a related field
  • 3+ years of experience in data engineering, preferably with time-series or log data processing
  • Proficiency in Python with experience in Pandas, Polars, or PySpark for large-scale data processing
  • Strong understanding of database design, indexing, and query optimization (SQL and NoSQL)
  • Experience handling complex data formats such as Parquet, MCAP, or protobuf
  • Experience building custom web based data visualization tools (JavaScript, React…)
  • Familiarity with data visualization tools like Grafana for real-time analysis and monitoring
  • Experience with distributed computing frameworks and cloud-based data storage solutions
  • Strong debugging skills and ability to work with lab teams to interpret robotic system logs
Job Responsibility
Job Responsibility
  • Develop and maintain pipelines and tools to transform robot logs to make it easier to access, visualize, and automatically detect events of interest
  • Optimize data processing to reduce the time needed between data offload and the availability of the data to our engineering teams
  • Design and optimize data storage solutions for handling complex, high-volume time-series and structured data
  • Build and maintain database schemas and queries to support analytics and visualization of extracted patterns
  • Support mechanical, electrical, software, integration and test engineers with their needs to extract and visualize data
  • Develop dashboards and custom data visualizations tools to enable engineers to quickly extract information from the data and track robot performance
  • Integrate your solutions with existing data pipelines and our robot testing framework
  • Fulltime
Read More
Arrow Right

Senior Azure Data Engineer

Seeking a Lead AI DevOps Engineer to oversee design and delivery of advanced AI/...
Location
Location
Poland
Salary
Salary:
Not provided
lingarogroup.com Logo
Lingaro
Expiration Date
Until further notice
Flip Icon
Requirements
Requirements
  • At least 6 years of professional experience in the Data & Analytics area
  • 1+ years of experience (or acting as) in the Senior Consultant or above role with a strong focus on data solutions build in Azure and Databricks/Synapse/(MS Fabric is nice to have)
  • Proven experience in Azure cloud-based infrastructure, Databricks and one of SQL implementation (e.g., Oracle, T-SQL, MySQL, etc.)
  • Proficiency in programming languages such as SQL, Python, PySpark is essential (R or Scala nice to have)
  • Very good level of communication including ability to convey information clearly and specifically to co-workers and business stakeholders
  • Working experience in the agile methodologies – supporting tools (JIRA, Azure DevOps)
  • Experience in leading and managing a team of data engineers, providing guidance, mentorship, and technical support
  • Knowledge of data management principles and best practices, including data governance, data quality, and data integration
  • Good project management skills, with the ability to prioritize tasks, manage timelines, and deliver high-quality results within designated deadlines
  • Excellent problem-solving and analytical skills, with the ability to identify and resolve complex data engineering issues
Job Responsibility
Job Responsibility
  • Act as a senior member of the Data Science & AI Competency Center, AI Engineering team, guiding delivery and coordinating workstreams
  • Develop and execute a cloud data strategy aligned with organizational goals
  • Lead data integration efforts, including ETL processes, to ensure seamless data flow
  • Implement security measures and compliance standards in cloud environments
  • Continuously monitor and optimize data solutions for cost-efficiency
  • Establish and enforce data governance and quality standards
  • Leverage Azure services, as well as tools like dbt and Databricks, for efficient data pipelines and analytics solutions
  • Work with cross-functional teams to understand requirements and provide data solutions
  • Maintain comprehensive documentation for data architecture and solutions
  • Mentor junior team members in cloud data architecture best practices
What we offer
What we offer
  • Stable employment
  • “Office as an option” model
  • Workation
  • Great Place to Work® certified employer
  • Flexibility regarding working hours and your preferred form of contract
  • Comprehensive online onboarding program with a “Buddy” from day 1
  • Cooperation with top-tier engineers and experts
  • Unlimited access to the Udemy learning platform from day 1
  • Certificate training programs
  • Upskilling support
Read More
Arrow Right