PySpark Module Lead

Sopra Steria

Location:
India, Noida

Category:
IT - Software Development

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. As a Data Engineer, you will collaborate closely with our Data Scientists to develop and deploy machine learning models. Responsibilities include processing data with PySpark, AWS EMR, and S3; designing machine learning pipelines and optimizing them for performance; and managing ETL workflows using StreamSets.

Job Responsibility:

  • Work in tandem with Data Scientists to design, develop, and implement machine learning pipelines
  • Utilize PySpark for data processing, transformation, and preparation for model training
  • Leverage AWS EMR and S3 for scalable and efficient data storage and processing
  • Implement and manage ETL workflows using StreamSets for data ingestion and transformation
  • Design and construct pipelines to deliver high-quality training and inference datasets
  • Collaborate with cross-functional teams to ensure smooth deployment and real-time/near real-time inferencing capabilities
  • Optimize and fine-tune pipelines for performance, scalability, and reliability
  • Ensure IAM policies and permissions are appropriately configured for secure data access and management
  • Apply knowledge of Spark architecture to optimize Spark jobs for scalable data processing

Requirements:

  • Proficiency in advanced SQL (window functions), Spark architecture, PySpark or Scala with Spark, and Hadoop
  • Proven expertise in designing and deploying data pipelines
  • Strong problem-solving skills and ability to work effectively in a collaborative team environment
  • Excellent communication skills and ability to translate technical concepts to non-technical stakeholders

Nice to have:

  • Hands-on experience with Airflow, S3, and StreamSets or similar ETL tools
  • Understanding of real-time or near real-time inferencing architectures
  • Basic knowledge of Kafka, AWS IAM, AWS EMR, and Snowflake

What we offer:

  • All positions are open to people with disabilities
  • Commitment to fighting against all forms of discrimination
  • Inclusive and respectful work environment

Additional Information:

Job Posted:
April 26, 2025
