Python PySpark Developer

India, Bangalore · Job Posted May 04, 2026

Job Description

Overture Rede is an Equal Opportunity Employer and does not discriminate in employment on the basis of race or ethnicity, religion, sex, national origin, age, veteran status, disability, genetic information, or any other reason prohibited by law.

Job Responsibility

  • Designing and developing robust PySpark applications for large-scale data processing
  • Building and optimizing data ingestion, transformation, and storage processes
  • Implementing efficient algorithms and data structures for distributed computing
  • Collaborating with cross-functional teams to integrate data-driven solutions into business processes
  • Troubleshooting performance bottlenecks and ensuring high availability and reliability of data pipelines
  • Writing and optimizing SQL queries for data extraction and manipulation

Requirements

  • Bachelor’s/Master’s degree in Computer Science or Engineering
  • Proven experience (3-10 years) in Python development with a focus on PySpark
  • Strong understanding of distributed computing principles and experience with Apache Spark
  • Proficiency in SQL and experience with relational databases (MySQL, PostgreSQL, etc.)
  • Experience with data serialization formats such as JSON, Parquet, Avro
  • Excellent problem-solving skills and ability to work independently or as part of a team
  • Good communication skills with the ability to effectively collaborate with stakeholders

Nice to have

  • Familiarity with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes)

Similar Jobs for Python PySpark Developer

8 matching positions

Python Developer (PySpark)

We are looking for a skilled Python Developer with hands-on experience in PySpar...
Location: India, Bangalore South
Salary: Not provided
Company: Wissen
Expiration Date: Until further notice
Requirements
  • Experience: 2 to 4 Years
  • BE/BTech/BCA/MCA or equivalent degree in Computer Science, IT, or related field
Job Responsibility
  • Develop and maintain applications/scripts using Python and PySpark
  • Work on data processing, transformation, and optimization of large datasets
  • Develop scalable ETL/data pipeline solutions
  • Collaborate with business and technical teams to understand requirements
  • Write clean, efficient, and reusable code following best practices
  • Perform debugging, troubleshooting, and performance tuning
  • Manage code repositories and version control using GitHub
  • Participate in code reviews and Agile development activities
  • Ensure proper documentation and adherence to coding standards
Employment Type: Full-time

Big Data Pyspark Developer

Location: India, Pune
Salary: Not provided
Company: Citi
Expiration Date: Until further notice
Requirements
  • 4-8 years of experience with software building and platform engineering
  • Experience in Data Engineering, focused on Big Data ecosystems
  • Knowledge of Hadoop, PySpark, YARN, Hive, Impala, Spark, and Spark SQL, with extensive experience developing high-volume data processing pipelines
  • Expert-level programming skills and hands-on experience in Python
  • Familiarity with data formats like Avro, Parquet, CSV, JSON
  • Hands-on experience in writing SQL queries
  • Experienced with Unix based operating systems and shell scripting
  • Experience with source code management tools such as Bitbucket, Git etc.
  • Strong computer science fundamentals in data structures, algorithms, databases, and operating systems
  • Reverse engineering: ability to read spaghetti SQL or old scripts and document the business logic before moving it
Job Responsibility
  • Formulate and define systems scope and project objectives through research activities
  • Analyze business client needs and document requirements by utilizing business analysis procedures and concepts
  • Assess applicability of previous or similar experiences and evaluate options under circumstances not covered by procedures or precedents
  • Identify risk and consider business implications of the application of technology to the current business environment
  • Create and prepare reports, metrics and presentations
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
What we offer
  • Programs and services for physical and mental well-being including access to telehealth options, health advocates, confidential counseling and more
  • Access to an array of learning and development resources to help broaden and deepen your skills and knowledge as your career progresses
Employment Type: Full-time

Clinical Python Developer

ICON plc is a world-leading healthcare intelligence and clinical research organi...
Location: Ireland, Dublin
Salary: Not provided
Company: ICON plc
Expiration Date: Until further notice
Requirements
  • 4+ years’ experience as a Python Programmer / Visualisation Developer / Business Analyst utilising data warehouse pipelines
  • Strong hands-on experience with Python (and relevant libraries/packages) for data transformations and manipulations
  • PySpark experience preferred
  • Pandas experience also considered
  • Exposure to and understanding of CDISC-guided data structures within a regulated clinical trial environment
  • Mapping experience and domain specific therapeutic area knowledge preferred
  • Solid development skills within Tibco Spotfire or equivalent data visualization tools (Power BI, Tableau, Qlik etc.)
  • Databricks experience preferred
Job Responsibility
  • Be responsible for delivery in a continuous improvement framework with respect to systems deployment, life cycle management, and enhancements for supported user groups
  • Contribute to initiatives to yield faster and more effective report development and efficient change management services for analytics platforms
  • Use your technical expertise with clinical data and report building, and experience gained through working with diverse and complex business processes and associated system infrastructures
  • Work in a matrix environment across functions and departments
What we offer
  • Various annual leave entitlements
  • A range of health insurance offerings to suit you and your family’s needs
  • Competitive retirement planning offerings to maximize savings and plan with confidence for the years ahead
  • Global Employee Assistance Programme, LifeWorks, offering 24-hour access to a global network of over 80,000 independent specialized professionals who are there to support you and your family’s well-being
  • Life assurance
  • Flexible country-specific optional benefits, including childcare vouchers, bike purchase schemes, discounted gym memberships, subsidized travel passes, health assessments, among others
Employment Type: Full-time

Senior Python Developer

FinXL by Randstad Digital focuses on developing client's Networking, Digital and...
Location: Australia, North Sydney
Salary: Not provided
Company: FinXL
Expiration Date: Until further notice
Requirements
  • Python Engineer experience
  • Python & ETL: Essential expert-level proficiency
  • Airflow mastery: a solid grasp of Airflow concepts is mandatory, including creating and operating Airflow tasks, orchestrating complex workflows, and handling data flow challenges and bottlenecks
  • Airflow orchestration experience
  • PySpark experience
  • Data pipelines experience
  • Candidates must demonstrate a logical approach to ETL pipelines & the ability to think through edge-case scenarios
  • Experience troubleshooting complex ETL scenarios

Python Developer (Data Engineering/AI)

We are looking for a mid-level Python Developer with combined experience in Data...
Location: India, Pune
Salary: Not provided
Company: Citi
Expiration Date: Until further notice
Requirements
  • 8+ years of hands-on Python programming experience
  • Strong fundamentals in Python, OOP, and design patterns
  • Experience with NLP libraries such as Flair, BERT, HuggingFace Transformers, or similar
  • Solid experience with PySpark, Pandas, PyArrow, and distributed data pipelines
  • Proficient in working with Parquet using FastParquet or pyarrow.parquet
  • Familiarity with fast JSON parsing libraries (json, ujson, orjson)
  • Experience building APIs using Flask (FastAPI is a plus)
  • Experience with MLflow for model tracking and deployment
  • Good understanding of CI/CD practices and Git workflows
  • Experience working with Redis or similar in-memory stores
Job Responsibility
  • Develop and optimize ETL/data processing jobs using PySpark, Pandas, PyArrow, and related libraries
  • Work with Parquet files using FastParquet or pyarrow.parquet for efficient data processing
  • Implement data parsing and serialization using json, ujson, or orjson for high-performance JSON handling
  • Build and maintain NLP pipelines using Flair, BERT, and LLM-based models
  • Develop scalable ingestion and data transformation pipelines for AI and analytics use cases
  • Build and maintain Flask-based APIs for model inference and service integrations
  • Use regular expressions for text cleaning, parsing, and NLP preprocessing
  • Integrate caching and fast lookups using Redis
  • Manage and deploy ML models using MLflow for tracking and versioning
  • Support CI/CD workflows using GitHub, LightSpeed Enterprise, and deployment pipelines
Employment Type: Full-time

Python Developer - NLP, ML, Gen AI

Location: Canada, Mississauga
Salary: 94300.00 - 141500.00 USD / Year
Company: Citi
Expiration Date: Until further notice
Requirements
  • 3–5 years of hands-on Python programming experience
  • Strong fundamentals in Python, OOP, and design patterns
  • Experience with NLP libraries such as Flair, BERT, HuggingFace Transformers, or similar
  • Solid experience with PySpark, Pandas, PyArrow, and distributed data pipelines
  • Proficient in working with Parquet using FastParquet or pyarrow.parquet
  • Familiarity with fast JSON parsing libraries (json, ujson, orjson)
  • Experience building APIs using Flask (FastAPI is a plus)
  • Experience with MLflow for model tracking and deployment
  • Good understanding of CI/CD practices and Git workflows
  • Experience working with Redis or similar in-memory stores
Job Responsibility
  • Develop and optimize ETL/data processing jobs using PySpark, Pandas, PyArrow, and related libraries
  • Work with Parquet files using FastParquet or pyarrow.parquet for efficient data processing
  • Implement data parsing and serialization using json, ujson, or orjson for high-performance JSON handling
  • Build and maintain NLP pipelines using Flair, BERT, and LLM-based models
  • Develop scalable ingestion and data transformation pipelines for AI and analytics use cases
  • Build and maintain Flask-based APIs for model inference and service integrations
  • Use regular expressions for text cleaning, parsing, and NLP preprocessing
  • Integrate caching and fast lookups using Redis
  • Manage and deploy ML models using MLflow for tracking and versioning
  • Support CI/CD workflows using GitHub, LightSpeed Enterprise, and deployment pipelines
Employment Type: Full-time

Python Developer - NLP, ML, Gen AI

We are looking for a mid-level Python Developer - NLP, ML, Gen AI with combined ...
Location: Canada, Mississauga
Salary: 94300.00 - 141500.00 USD / Year
Company: Citi
Expiration Date: Until further notice
Requirements
  • 3–5 years of hands-on Python programming experience
  • Strong fundamentals in Python, OOP, and design patterns
  • Experience with NLP libraries such as Flair, BERT, HuggingFace Transformers, or similar
  • Solid experience with PySpark, Pandas, PyArrow, and distributed data pipelines
  • Proficient in working with Parquet using FastParquet or pyarrow.parquet
  • Familiarity with fast JSON parsing libraries (json, ujson, orjson)
  • Experience building APIs using Flask (FastAPI is a plus)
  • Experience with MLflow for model tracking and deployment
  • Good understanding of CI/CD practices and Git workflows
  • Experience working with Redis or similar in-memory stores
Job Responsibility
  • Develop and optimize ETL/data processing jobs using PySpark, Pandas, PyArrow, and related libraries
  • Work with Parquet files using FastParquet or pyarrow.parquet for efficient data processing
  • Implement data parsing and serialization using json, ujson, or orjson for high-performance JSON handling
  • Build and maintain NLP pipelines using Flair, BERT, and LLM-based models
  • Develop scalable ingestion and data transformation pipelines for AI and analytics use cases
  • Build and maintain Flask-based APIs for model inference and service integrations
  • Use regular expressions for text cleaning, parsing, and NLP preprocessing
  • Integrate caching and fast lookups using Redis
  • Manage and deploy ML models using MLflow for tracking and versioning
  • Support CI/CD workflows using GitHub, LightSpeed Enterprise, and deployment pipelines
Employment Type: Full-time

Junior Python Developer

We are looking for a motivated and detail‑oriented Junior Python / PySpark Devel...
Location: Ireland, Dublin
Salary: 57330.00 - 80270.00 EUR / Year
Company: Citi
Expiration Date: Until further notice
Requirements
  • 2-5 years of relevant experience in the Financial Service industry
  • Intermediate-level experience in an Applications Development role
  • Consistently demonstrates clear and concise written and verbal communication
  • Demonstrated problem-solving and decision-making skills
  • Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
  • Bachelor’s degree/University degree or equivalent experience
Job Responsibility
  • Assist in the development, testing, and maintenance of Python applications and data pipelines
  • Work with Python data libraries such as Pandas and NumPy for data analysis, transformation, and validation
  • Support PySpark‑based data processing for large‑scale datasets under guidance from senior engineers
  • Write clean, efficient, and well‑documented Python code
  • Debug and fix issues in batch and data‑processing jobs
  • Participate in code reviews and follow best practices in performance and data handling
  • Collaborate with cross‑functional teams including data engineers, QA, and DevOps
What we offer
  • Hybrid working model (up to 2 days working at home per week)
  • Competitive base salary (annually reviewed)
Employment Type: Full-time