PySpark Technical Lead


Sopra Steria

Location:
India, Chennai

Contract Type:
Not provided

Salary:
Not provided
Job Description:

We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. As a Data Engineer, you will collaborate closely with our Data Scientists to develop and deploy machine learning models. Proficiency in the listed skills will be crucial in building and maintaining pipelines for training and inference datasets.

Job Responsibility:

  • Work in tandem with Data Scientists to design, develop, and implement machine learning pipelines
  • Utilize PySpark for data processing, transformation, and preparation for model training
  • Leverage AWS EMR and S3 for scalable and efficient data storage and processing
  • Implement and manage ETL workflows using Streamsets for data ingestion and transformation
  • Design and construct pipelines to deliver high-quality training and inference datasets
  • Collaborate with cross-functional teams to ensure smooth deployment and real-time/near real-time inferencing capabilities
  • Optimize and fine-tune pipelines for performance, scalability, and reliability
  • Ensure IAM policies and permissions are appropriately configured for secure data access and management
  • Implement Spark architecture and optimize Spark jobs for scalable data processing

Requirements:

  • Proficiency in advanced SQL (window functions), Spark architecture, PySpark or Scala with Spark, and Hadoop
  • Proven expertise in designing and deploying data pipelines
  • Strong problem-solving skills and ability to work effectively in a collaborative team environment
  • Excellent communication skills and ability to translate technical concepts to non-technical stakeholders
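
For context, the "advanced SQL (window functions)" requirement refers to analytic queries such as ranking rows within a partition. Below is a minimal, hypothetical sketch using Python's built-in SQLite (the table and column names are illustrative only); the same `ROW_NUMBER() OVER (...)` pattern applies in Spark SQL:

```python
import sqlite3

# Hypothetical demo: keep only the most recent event per user with a
# window function (requires SQLite >= 3.25 for OVER support).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id TEXT, ts INTEGER, amount REAL);
    INSERT INTO events VALUES
        ('a', 1, 10.0), ('a', 2, 30.0),
        ('b', 1,  5.0), ('b', 3,  7.0);
""")
rows = conn.execute("""
    SELECT user_id, ts, amount,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn
    FROM events
""").fetchall()
# rn = 1 marks the newest row in each user's partition.
latest = sorted(r[:3] for r in rows if r[3] == 1)
print(latest)  # [('a', 2, 30.0), ('b', 3, 7.0)]
```

In PySpark the equivalent uses `Window.partitionBy("user_id").orderBy(F.desc("ts"))` with `F.row_number()`.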

Nice to have:

  • Hands-on experience with Airflow, S3, and Streamsets or similar ETL tools
  • Understanding of real-time or near real-time inferencing architectures
  • Basic knowledge of Kafka, AWS IAM, AWS EMR, and Snowflake

What we offer:
  • Inclusive and respectful work environment
  • Open positions for people with disabilities

Additional Information:

Job Posted:
April 26, 2025

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for PySpark Technical Lead

Pyspark Module Lead

We are seeking a highly skilled and motivated Data Engineer to join our dynamic ...
Location:
India, Noida

Salary:
Not provided

Sopra Steria

Expiration Date:
Until further notice

Requirements:
  • Proficiency in advanced SQL (window functions), Spark architecture, PySpark or Scala with Spark, and Hadoop
  • Proven expertise in designing and deploying data pipelines
  • Strong problem-solving skills and ability to work effectively in a collaborative team environment
  • Excellent communication skills and ability to translate technical concepts to non-technical stakeholders
Job Responsibility:
  • Work in tandem with Data Scientists to design, develop, and implement machine learning pipelines
  • Utilize PySpark for data processing, transformation, and preparation for model training
  • Leverage AWS EMR and S3 for scalable and efficient data storage and processing
  • Implement and manage ETL workflows using Streamsets for data ingestion and transformation
  • Design and construct pipelines to deliver high-quality training and inference datasets
  • Collaborate with cross-functional teams to ensure smooth deployment and real-time/near real-time inferencing capabilities
  • Optimize and fine-tune pipelines for performance, scalability, and reliability
  • Ensure IAM policies and permissions are appropriately configured for secure data access and management
  • Implement Spark architecture and optimize Spark jobs for scalable data processing
What we offer:
  • All positions are open to people with disabilities
  • Commitment to fighting against all forms of discrimination
  • Inclusive and respectful work environment

Big Data Lead Developer

We are seeking a highly skilled and experienced Big Data Lead Developer to estab...
Location:
Canada, Mississauga

Salary:
170.00 USD / Year

Citi

Expiration Date:
Until further notice

Requirements:
  • 6+ years of relevant experience in Big Data application development or systems analysis role
  • Experience in leading and mentoring big data engineering teams
  • Strong understanding of big data concepts, architectures, and technologies (e.g., Hadoop, PySpark, Hive, Kafka, NoSQL databases)
  • Proficiency in programming languages such as Java, Scala, or Python
  • Excellent problem-solving and analytical skills
  • Strong presentation, communication and interpersonal skills
  • Experience with data warehousing and business intelligence tools
  • Experience with data visualization and reporting
  • Knowledge of cloud-based big data platforms (e.g., AWS EMR, Azure HDInsight, Google Cloud Dataproc)
  • Proficiency in Unix/Linux environments
Job Responsibility:
  • Lead and mentor a team of big data engineers, fostering a collaborative and high-performing environment
  • Provide technical guidance, code reviews, and support for professional development
  • Design and implement scalable and robust big data architectures and pipelines to handle large volumes of data from various sources
  • Evaluate and select appropriate big data technologies and tools based on project requirements and industry best practices
  • Implement and integrate these technologies into our existing infrastructure
  • Develop and optimize data processing and analysis workflows using technologies such as Spark, Hadoop, Hive, and other relevant tools
  • Implement data quality checks and ensure adherence to data governance policies and procedures
  • Continuously monitor and optimize the performance of big data systems and pipelines to ensure efficient data processing and retrieval
  • Collaborate effectively with cross-functional teams, including data scientists, business analysts, and product managers, to understand their data needs and deliver impactful solutions
  • Stay up to date with the latest advancements in big data technologies and explore new tools and techniques to improve our data infrastructure
What we offer:
  • Global benefits designed to support your well-being, growth, and work-life balance
  • Full-time employment

Technical Planning Architect

Location:
India, Hyderabad

Salary:
Not provided

Genzeon

Expiration Date:
Until further notice

Requirements:
  • 10+ years of experience in supply chain planning, business analysis, or related fields
  • o9 integrations experience and familiarity with PySpark, SQL, big data environments, and managing integration pipelines using Airflow
  • Hands-on experience with o9 Solutions or similar advanced planning tools (e.g., SAP IBP, Blue Yonder, Kinaxis)
  • Strong analytical skills with a deep understanding of supply chain processes (demand, supply, inventory, and S&OP)
  • Excellent problem-solving abilities and attention to detail
  • Proficiency in data analysis tools (Excel, SQL, R, Python, or similar)
  • Ability to effectively communicate complex concepts to technical and non-technical audiences
  • Bachelor’s degree in supply chain management, Computers, Business Administration, or related field
Job Responsibility:
  • Collaborate with cross-functional teams to design and implement scalable supply chain planning solutions
  • Leverage o9’s platform to develop integrated planning processes, including demand forecasting, inventory optimization, and supply planning
  • Engage with stakeholders to gather and document business requirements, ensuring alignment with strategic goals
  • Conduct gap analyses to identify areas of improvement and develop actionable insights
  • Lead supply chain planning initiatives, ensuring timely delivery of high-quality solutions
  • Act as the bridge between business teams and technical teams, translating business needs into system functionalities
  • Analyze current supply chain processes to identify inefficiencies and recommend best practices for optimization
  • Implement key metrics and KPIs to measure and improve supply chain performance
  • Provide training and support to end-users on planning systems and tools
  • Create and maintain documentation, including user guides and standard operating procedures

Senior Data Engineering Architect

Location:
Poland

Salary:
Not provided

Lingaro

Expiration Date:
Until further notice

Requirements:
  • Proven work experience as a Data Engineering Architect or a similar role and strong experience in the Data & Analytics area
  • Strong understanding of data engineering concepts, including data modeling, ETL processes, data pipelines, and data governance
  • Expertise in designing and implementing scalable and efficient data processing frameworks
  • In-depth knowledge of various data technologies and tools, such as relational databases, NoSQL databases, data lakes, data warehouses, and big data frameworks (e.g., Hadoop, Spark)
  • Experience in selecting and integrating appropriate technologies to meet business requirements and long-term data strategy
  • Ability to work closely with stakeholders to understand business needs and translate them into data engineering solutions
  • Strong analytical and problem-solving skills, with the ability to identify and address complex data engineering challenges
  • Proficiency in Python, PySpark, SQL
  • Familiarity with cloud platforms and services, such as AWS, GCP, or Azure, and experience in designing and implementing data solutions in a cloud environment
  • Knowledge of data governance principles and best practices, including data privacy and security regulations
Job Responsibility:
  • Collaborate with stakeholders to understand business requirements and translate them into data engineering solutions
  • Design and oversee the overall data architecture and infrastructure, ensuring scalability, performance, security, maintainability, and adherence to industry best practices
  • Define data models and data schemas to meet business needs, considering factors such as data volume, velocity, variety, and veracity
  • Select and integrate appropriate data technologies and tools, such as databases, data lakes, data warehouses, and big data frameworks, to support data processing and analysis
  • Create scalable and efficient data processing frameworks, including ETL (Extract, Transform, Load) processes, data pipelines, and data integration solutions
  • Ensure that data engineering solutions align with the organization's long-term data strategy and goals
  • Evaluate and recommend data governance strategies and practices, including data privacy, security, and compliance measures
  • Collaborate with data scientists, analysts, and other stakeholders to define data requirements and enable effective data analysis and reporting
  • Provide technical guidance and expertise to data engineering teams, promoting best practices and ensuring high-quality deliverables. Support the team throughout the implementation process, answering questions and addressing issues as they arise
  • Oversee the implementation of the solution, ensuring that it is implemented according to the design documents and technical specifications
What we offer:
  • Stable employment. On the market since 2008, 1500+ talents currently on board in 7 global sites
  • Workation. Enjoy working from inspiring locations in line with our workation policy
  • Great Place to Work® certified employer
  • Flexibility regarding working hours and your preferred form of contract
  • Comprehensive online onboarding program with a “Buddy” from day 1
  • Cooperation with top-tier engineers and experts
  • Unlimited access to the Udemy learning platform from day 1
  • Certificate training programs. Lingarians earn 500+ technology certificates yearly
  • Upskilling support. Capability development programs, Competency Centers, knowledge sharing sessions, community webinars, 110+ training opportunities yearly
  • Grow as we grow as a company. 76% of our managers are internal promotions

Data Engineering Architect

Data engineering involves the development of solutions for the collection, trans...
Location:
India

Salary:
Not provided

Lingaro

Expiration Date:
Until further notice

Requirements:
  • 10+ years’ experience in the Data & Analytics area
  • 4+ years’ experience in Data Engineering Architecture
  • Proficiency in Python, PySpark, SQL
  • Strong expertise in Azure cloud services such as ADF, Databricks, PySpark, and Logic Apps
  • Strong understanding of data engineering concepts, including data modeling, ETL processes, data pipelines, and data governance
  • Expertise in designing and implementing scalable and efficient data processing frameworks
  • In-depth knowledge of various data technologies and tools, such as relational databases, NoSQL databases, data lakes, data warehouses, and big data frameworks (e.g., Hadoop, Spark)
  • Experience in selecting and integrating appropriate technologies to meet business requirements and long-term data strategy
  • Ability to work closely with stakeholders to understand business needs and translate them into data engineering solutions
  • Strong analytical and problem-solving skills, with the ability to identify and address complex data engineering challenges
Job Responsibility:
  • Collaborate with stakeholders to understand business requirements and translate them into data engineering solutions
  • Design and oversee the overall data architecture and infrastructure, ensuring scalability, performance, security, maintainability, and adherence to industry best practices
  • Define data models and data schemas to meet business needs, considering factors such as data volume, velocity, variety, and veracity
  • Select and integrate appropriate data technologies and tools, such as databases, data lakes, data warehouses, and big data frameworks, to support data processing and analysis
  • Create scalable and efficient data processing frameworks, including ETL (Extract, Transform, Load) processes, data pipelines, and data integration solutions
  • Ensure that data engineering solutions align with the organization's long-term data strategy and goals
  • Evaluate and recommend data governance strategies and practices, including data privacy, security, and compliance measures
  • Collaborate with data scientists, analysts, and other stakeholders to define data requirements and enable effective data analysis and reporting
  • Provide technical guidance and expertise to data engineering teams, promoting best practices and ensuring high-quality deliverables
  • Support the team throughout the implementation process, answering questions and addressing issues as they arise
What we offer:
  • Stable employment
  • “Office as an option” model
  • Flexibility regarding working hours and your preferred form of contract
  • Comprehensive online onboarding program with a “Buddy” from day 1
  • Cooperation with top-tier engineers and experts
  • Unlimited access to the Udemy learning platform from day 1
  • Certificate training programs
  • Upskilling support
  • Internal Gallup Certified Strengths Coach to support your growth
  • Grow as we grow as a company

Big Data / PySpark Engineering Lead - Vice President

The Applications Development Technology Lead Analyst is a senior level position ...
Location:
India, Pune

Salary:
Not provided

Citi

Expiration Date:
Until further notice

Requirements:
  • Highly experienced and skilled technical lead with 12+ years of experience in software development and platform engineering
  • Experience in Data Engineering, focused on Big Data ecosystems
  • Knowledge in Hadoop, YARN, Hive, Impala, Spark, and Spark SQL with extensive high volume of data processing pipeline development
  • Expert-level programming skills and hands-on experience in Python
  • Familiarity with data formats like Avro, Parquet, CSV, JSON
  • Hands-on experience in writing SQL queries
  • Highly experienced with Unix based operating systems and shell scripting
  • Experience with source code management tools such as Bitbucket, Git etc
  • Big Data Tech Proficiency and hands-on in Hadoop, Spark, Hive, Kafka, and NoSQL databases (MongoDB, HBase)
  • Experience working with query engines like Trino, Presto, Starburst
Job Responsibility:
  • Design and implement scalable, fault-tolerant batch and real-time data processing pipelines
  • Develop robust data models and schema designs optimized for both performance and storage efficiency
  • Evaluate and integrate emerging tools and frameworks (e.g., Spark, Flink, Kafka) into the existing stack
  • Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
  • Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
  • Legacy Systems Decommissioning: Lead the strategic migration of data and logic from legacy platforms (e.g. on-premises SQL Servers) to a modern Data Lakehouse environment
  • ETL/ELT Transformation: Re-engineer existing stored procedures and complex legacy ETL jobs into scalable, distributed processing frameworks using Spark (Python) and Starburst/Trino
  • Validation & Parity Testing: Design and implement automated frameworks for Data Parity Testing to ensure 100% accuracy and consistency between legacy outputs and new big data results
  • Schema Evolution: Map and transform rigid, legacy relational schemas into flexible, high-performance formats optimized for the cloud (e.g., Parquet, Avro, or Iceberg)
  • Phased Cutover Management: Orchestrate a phased migration strategy (Parallel Run, Shadow Execution) to ensure zero downtime for downstream business applications and reporting tools

Team Lead, Performance Optimization

We are looking for a Team Lead to build, mentor, and guide a hybrid team of Opti...
Location:
United Kingdom, London

Salary:
Not provided

Adyen

Expiration Date:
Until further notice

Requirements:
  • 3+ years of experience bridging Data Analytics/Data Science and payments
  • 5+ years of experience in a formal people leadership role, with a proven ability to lead teams that balance technical depth, customer impact, and operational excellence
  • Technical expertise in Python, SQL, PySpark, and scalable data processing
  • Hands-on experience with big data platforms and ETL pipelines
  • Experience in automation, analytics tooling, and driving scalable solutions in a data-driven organization
  • Excellent stakeholder management skills, with the ability to influence Product and Commercial partners through data storytelling, structured thinking, and actionable insights
  • Clear communicator and confident presenter, able to engage senior external and internal stakeholders
  • Entrepreneurial mindset with strong prioritization skills, a high sense of ownership, and the ability to thrive in a fast-paced, global environment
Job Responsibility:
  • Build, coach, and scale a high-performing hybrid team of strategists and data analysts, setting clear expectations around quality, impact, and technical excellence
  • Lead merchant-facing optimization engagements, ensuring recommendations are data-driven, product-aligned, and deliver measurable improvements in risk, fraud, and cost efficiency
  • Guide the team in designing and implementing automated data solutions, utilizing Adyen products, big data platforms, and analytics tools to identify risks and opportunities. Champion scalable solutions (e.g., automated processes, dashboards, ETL pipelines) that improve operational efficiency without sacrificing quality
  • Own team prioritization across merchant work, upskilling, and strategic initiatives
  • Manage senior stakeholders and communicate clear trade-offs
  • Partner cross-functionally with Commercial, Product, Risk, and Operations to translate merchant insights into roadmap influence and position the team as subject matter experts in payments, fraud, and risk

Team Lead, Performance Optimization

We are looking for a Team Lead to build, mentor, and guide a hybrid team of Opti...
Location:
Singapore

Salary:
Not provided

Adyen

Expiration Date:
Until further notice

Requirements:
  • 7+ years of experience bridging Data Analytics/Data Science and payments
  • 2+ years of experience in a formal people leadership role, with a proven ability to lead teams that balance technical depth, customer impact, and operational excellence
  • Strong technical expertise in Python, SQL, PySpark, and scalable data processing
  • Hands-on experience with big data platforms and ETL pipelines
  • Experience in automation, analytics tooling, and driving scalable solutions in a data-driven organization
  • Excellent stakeholder management skills, with the ability to influence Product and Commercial partners through data storytelling, structured thinking, and actionable insights
  • Clear communicator and confident presenter, able to engage senior external and internal stakeholders
  • Entrepreneurial mindset with strong prioritization skills, a high sense of ownership, and the ability to thrive in a fast-paced, global environment
Job Responsibility:
  • Build, coach, and scale a high-performing hybrid team of strategists and data analysts, setting clear expectations around quality, impact, and technical excellence
  • Lead merchant-facing optimization engagements, ensuring recommendations are data-driven, product-aligned, and deliver measurable improvements in risk, fraud, and cost efficiency
  • Guide the team in designing and implementing automated data solutions, utilizing Adyen products, big data platforms, and analytics tools to identify risks and opportunities. Champion scalable solutions (e.g., automated processes, dashboards, ETL pipelines) that improve operational efficiency without sacrificing quality
  • Own team prioritization across merchant work, upskilling, and strategic initiatives
  • Manage senior stakeholders and communicate clear trade-offs
  • Partner cross-functionally with Commercial, Product, Risk, and Operations to translate merchant insights into roadmap influence and position the team as subject matter experts in payments, fraud, and risk