
Python/PySpark Engineer


Realign

Location:
United States, Jersey City


Contract Type:
Not provided

Salary:
115000.00 USD / Year

Job Responsibility:

  • Design and develop scalable Python/PySpark ingestion and transformation pipelines
  • Implement schema evolution logic, validation frameworks, and resilient error-handling mechanisms
  • Optimize Spark jobs for performance, cost efficiency, and production readiness
  • Integrate all jobs into automated CI/CD pipelines, ensuring versioning and release governance
  • Work closely with Ops teams to ensure proper monitoring, logging, and operational supportability
  • Participate in Agile ceremonies, sprint planning, code reviews, and demo sessions
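
The "resilient error-handling mechanisms" above are often implemented as retry-with-exponential-backoff around each ingestion step. A minimal stdlib-only sketch, assuming transient source failures surface as `IOError` (all names here are illustrative, not from the posting):

```python
import time
import functools

def retry(max_attempts=3, base_delay=0.1, retryable=(IOError,)):
    """Retry a flaky ingestion step with exponential backoff."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except retryable:
                    if attempt == max_attempts:
                        raise  # exhausted: surface the error to the job runner
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

calls = {"n": 0}

@retry(max_attempts=3, base_delay=0.0)
def flaky_read():
    # Hypothetical source that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient source failure")
    return "batch-ok"

result = flaky_read()
```

In a real pipeline the decorator would wrap individual read/write stages so one transient failure does not fail the whole job.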

Requirements:

  • Strong proficiency in Python, packaging, dependency management, and virtual environments
  • Hands-on experience with PySpark, including Spark performance tuning (partitioning, caching, broadcast joins, memory optimization)
  • Expertise in data ingestion (batch/stream), schema management, and robust error-handling/retry logic
  • Solid unit and integration testing practices, including data quality validations
  • Experience with CI/CD pipelines (Azure DevOps/Jenkins), Git branching strategies, and artifact versioning
  • Working experience with Cloudera/Hadoop (HDFS, Spark, Hive/Impala) and Databricks (Delta Lake, clusters, jobs, notebooks)
  • Knowledge of observability techniques: structured logging, metrics, tracing, and debugging in distributed systems
  • Secure coding practices including secrets management, PII protection, and compliance-aware development
  • Strong documentation discipline for frameworks, reusable components, and best-practice patterns
  • Effective collaboration with Cloud Architects and Data Ops to ensure stable and supportable pipelines
  • Clear communication of technical ideas and solution approaches
  • Comfort working in Agile environments with iterative development and frequent releases
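
One of the tuning techniques listed above, the broadcast join, avoids shuffling the large side of a join by shipping the small table to every executor. Conceptually it is a hash-map lookup, as in this plain-Python sketch (the datasets are invented for illustration):

```python
# Small dimension table: "broadcast" as an in-memory hash map.
regions = {1: "US-East", 2: "EU-West"}

# Large fact table: joined row by row, with no shuffle of the big side.
orders = [
    {"order_id": 100, "region_id": 1, "amount": 25.0},
    {"order_id": 101, "region_id": 2, "amount": 40.0},
]

joined = [
    {**row, "region": regions[row["region_id"]]}
    for row in orders
    if row["region_id"] in regions  # inner-join semantics
]
```

In PySpark the same intent is expressed with `pyspark.sql.functions.broadcast`, e.g. `facts.join(broadcast(dims), "region_id")`, which hints the optimizer to replicate the small DataFrame rather than shuffle both sides.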

Additional Information:

Job Posted:
March 21, 2026

Employment Type:
Fulltime

Similar Jobs for Python/PySpark Engineer

Python/PySpark Engineer

Location: Slovakia, Bratislava
Salary: Not provided
Signify Technology
Expiration Date: Until further notice
Requirements:
  • Minimum 4 years of demonstrable project experience in Python software engineering
  • SQL for querying and manipulating data
  • PySpark or an equivalent framework for creating and optimizing complex data pipelines
  • Scrum/Agile development methodologies
  • Experience working in a globally distributed team in a multicultural environment
  • Ability to clearly explain technical topics to a non-technical audience
  • Active knowledge of English at a communicative level (min. B2-C1)
  • Minimum Bachelor's or equivalent degree in computer science, data science, or a similar discipline
Job Responsibility:
  • Development of a modern Lakehouse architecture on Azure Data Lake, using Python and the PySpark framework to implement business services in the insurance domain
  • Implementation of business functions that run accounting processes and generate data to meet reporting requirements
  • Designing, developing, automating, and supporting backend applications that combine data elements from multiple domains and systems
  • Cooperation with other engineers, analysts, product owners, and stakeholders to deliver value-added solutions that meet business needs and expectations
  • Working with the team lead engineer to create a target architecture for products within the team's scope
  • Designing data transformation and data flow services, with active participation in coding
  • Presenting and communicating ideas and proposals to various stakeholders for evaluation and brainstorming
  • Applying software engineering practices to ensure the quality, performance, and sustainability of applications
  • Performing peer code reviews

Associate MLOps Analyst

The Associate MLOps Analyst will be a key member of Circle K's Data & Analytics ...
Location: India, Gurugram
Salary: Not provided
Circle K
Expiration Date: Until further notice
Requirements:
  • Bachelor’s degree required, preferably with a quantitative focus (Statistics, Business Analytics, Data Science, Math, Economics, etc.)
  • Master’s degree preferred (MBA/MS Computer Science/M.Tech Computer Science, etc.)
  • 1-2 years of relevant working experience in MLOps
  • Knowledge of core computer science concepts such as common data structures and algorithms, and OOP
  • Programming languages (R, Python, PySpark, etc.)
  • Big data technologies & framework (AWS, Azure, GCP, Hadoop, Spark, etc.)
  • Enterprise reporting systems, relational (MySQL, Microsoft SQL Server etc.), non-relational (MongoDB, DynamoDB) database management systems and Data Engineering tools
  • Exposure to ETL tools and version controlling
  • Experience in building and maintaining CI/CD pipelines for ML models
  • Understanding of machine-learning, information retrieval or recommendation systems
Job Responsibility:
  • Collaborate with data scientists to deploy ML models into production environments
  • Implement and maintain CI/CD pipelines for machine learning workflows
  • Use version control tools (e.g., Git) and ML lifecycle management tools (e.g., MLflow) for model tracking, versioning, and management
  • Design, build, and optimize application containerization and orchestration with Docker and Kubernetes on cloud platforms like AWS or Azure
  • Automate pipelines using Apache Spark and ETL tools such as Informatica PowerCenter, Informatica BDM or DEI, StreamSets, and Apache Airflow
  • Implement model monitoring and alerting systems to track model performance, accuracy, and data drift in production environments
  • Work closely with data scientists to ensure that models are production-ready
  • Collaborate with Data Engineering and Tech teams to ensure infrastructure is optimized for scaling ML applications
  • Optimize ML pipelines for performance and cost-effectiveness
  • Help the Data teams leverage best practices to implement Enterprise level solutions
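
The model-monitoring and data-drift bullet above can be approximated, in its simplest form, by comparing a live feature's mean against its training baseline. A hedged stdlib sketch; the threshold and feature values are invented for illustration, and production systems typically use richer statistics (e.g. PSI or KS tests):

```python
import statistics

def mean_shift_drift(baseline, live, threshold=3.0):
    """Flag drift when the live mean sits more than `threshold`
    baseline standard deviations away from the baseline mean."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    z = abs(statistics.mean(live) - mu) / sigma
    return z > threshold

baseline = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
ok = mean_shift_drift(baseline, [10.1, 9.9, 10.3])        # similar distribution
drifted = mean_shift_drift(baseline, [25.0, 26.0, 24.5])  # shifted feature
```

A check like this would run on each scoring batch and raise an alert (or block promotion in CI/CD) when it fires.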

Data Engineer

This role involves designing, building, and optimizing data ingestion, transform...
Location: United States, Radnor
Salary: 120000.00 - 150000.00 USD / Year
Beacon Hill
Expiration Date: Until further notice
Requirements:
  • 3+ years of professional data engineering experience
  • Strong hands‑on expertise with: Azure Databricks (Spark/PySpark), Azure Data Factory (pipelines, data flows, orchestration), Azure Data Lake Storage, SQL and Python/PySpark scripting
  • Experience building scalable, reliable ETL/ELT solutions in cloud environments
  • Familiarity with CI/CD, version control, and DevOps workflows for data solutions
Job Responsibility:
  • Designing, building, and optimizing data ingestion, transformation, and delivery pipelines that support enterprise analytics, reporting, and operational data needs
Employment Type: Fulltime


Graduate Data Engineer

As a Graduate Data Engineer, you will build and maintain scalable data pipelines...
Location: United Kingdom, Marlow
Salary: Not provided
SRG
Expiration Date: Until further notice
Requirements:
  • Degree in Computer Science, Engineering, Mathematics, or a related field, or equivalent work experience
  • Up to 2 years of experience building data pipelines at work or through internships
  • Can write clear and reliable Python/PySpark code
  • Familiar with popular analytics tools (like pandas, numpy, matplotlib), big data frameworks (like Spark), and cloud services (like Palantir, AWS, Azure, or Google Cloud)
  • Deep understanding of data models, relational and non-relational databases, and how they are used to organize, store, and retrieve data efficiently for analytics and machine learning
  • Knowledge about software engineering methods, including DevOps, DataOps, or MLOps is a plus
  • Master's degree in engineering (such as AI/ML, Data Systems, Computer Science, Mathematics, Biotechnology, Physics), or minimum 2 years of relevant technology experience
  • Experience with Generative AI (GenAI) and agentic systems will be considered a strong plus
  • Have a proactive and adaptable mindset: willing to take initiative, learn new skills, and contribute to different aspects of a project as needed to drive solutions from start to finish, even beyond the formal job description
  • Show a strong ability to thrive in situations of ambiguity, taking initiative to create clarity for yourself and the team, and proactively driving progress even when details are uncertain or evolving
Job Responsibility:
  • Build and maintain data pipelines, leveraging PySpark and/or Typescript within Foundry, to transform raw data into reliable, usable datasets
  • Assist in preparing and optimizing data pipelines to support machine learning and AI model development, ensuring datasets are clean, well-structured, and readily usable by Data Science teams
  • Support the integration and management of feature engineering processes and model outputs into Foundry's data ecosystem, helping enable scalable deployment and monitoring of AI/ML solutions
  • Gather and translate stakeholder requirements for key data models and reporting, with a focus on Palantir Foundry workflows and tools
  • Participate in developing and refining dashboards and reports in Foundry to visualize key metrics and insights
  • Collaborate with Product, Engineering, and GTM teams to align data architecture and solutions, learning to support scalable, self-serve analytics across the organization
  • Apply prompt engineering with large language models, including writing and evaluating complex multi-step prompts
  • Continuously develop your understanding of the company's data landscape, including Palantir Foundry's ontology-driven approach and best practices for data management
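
The first responsibility above, transforming raw data into reliable, usable datasets, boils down to producing typed, validated records with a predictable schema. A toy stdlib sketch of such a cleaning step (the field names are hypothetical):

```python
def clean_records(raw_rows):
    """Drop malformed rows and coerce types so downstream
    consumers get a predictable schema."""
    cleaned = []
    for row in raw_rows:
        try:
            cleaned.append({
                "id": int(row["id"]),
                "value": float(row["value"]),
                "label": row.get("label", "").strip().lower(),
            })
        except (KeyError, TypeError, ValueError):
            continue  # skip (or quarantine) rows that fail validation
    return cleaned

raw = [
    {"id": "1", "value": "3.5", "label": " Alpha "},
    {"id": "oops", "value": "x"},          # malformed: dropped
    {"id": "2", "value": "7", "label": "Beta"},
]
dataset = clean_records(raw)
```

In Foundry or Spark the same idea would be expressed as a typed schema plus filter/cast transforms rather than a Python loop.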

Data Test Engineer

Jump into a critical 6-month contract where your expertise in data integrity wil...
Location: New Zealand, Auckland
Salary: Not provided
Randstad
Expiration Date: March 20, 2026
Requirements:
  • Hands-on experience in Data Quality Testing or QA roles
  • Solid understanding of ETL concepts, data transformations, and relational database principles (SQL)
  • Practical exposure to Python scripting for building reusable validation frameworks
  • Ability to contribute to CI/CD testing processes
  • Valid New Zealand work rights
Job Responsibility:
  • Design and execute data quality tests to ensure processed data meets all functional and business requirements
  • Validate the accuracy and integrity of transformed datasets across various ETL pipelines
  • Develop automation scripts using Python/PySpark to streamline validation within the existing framework
  • Identify and escalate data anomalies or pipeline issues before they impact the business
  • Collaborate closely with Data Engineers and Data Analysts to resolve technical issues and document findings
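
A reusable validation framework, as in the automation bullet above, usually means a set of named checks run against every batch. A minimal stdlib sketch; the check names and sample batch are illustrative, and in PySpark the same checks would run as DataFrame aggregations:

```python
def not_null(field):
    """Check that no row has a null value for `field`."""
    return lambda rows: all(r.get(field) is not None for r in rows)

def row_count_at_least(n):
    """Check that the batch has at least `n` rows."""
    return lambda rows: len(rows) >= n

def run_checks(rows, checks):
    """Return the names of failed checks (empty list == batch passes)."""
    return [name for name, check in checks.items() if not check(rows)]

batch = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": None}]
failures = run_checks(batch, {
    "id_not_null": not_null("id"),
    "amount_not_null": not_null("amount"),
    "min_rows": row_count_at_least(1),
})
```

Keeping checks as named, composable callables is what makes the framework reusable: each pipeline supplies its own dictionary of rules, and CI/CD gates on an empty failure list.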
What we offer:
  • Hybrid (at least 3 days in-office) with flexible hours
Employment Type: Fulltime

Senior Data Engineer

Sr Data Engineer. SR DE-I: Highly skilled Data Engineer with minimum 5+ years of...
Location: India, Kolkata
Salary: Not provided
InXiteOut
Expiration Date: Until further notice
Requirements:
  • Minimum 5+ years of relevant experience in SQL, PySpark, ETL, Data Lakes and Azure Tech Stack
  • 3+ years of experience in building data Pipelines with Python/PySpark
  • 4+ years of experience in the Azure ETL stack (e.g. Blob Storage, Data Lake, Data Factory, Synapse)
  • 4+ years of experience with SQL
  • Proficient understanding of code versioning tools such as Git and PM tool like Jira
  • Excellent verbal and written communication skills
  • UG: B.Sc in Any Specialization, BCA in Any Specialization, B.Tech/B.E. in Any Specialization
  • A good internet connection is a must
Employment Type: Fulltime

Test Engineer - Data & Analytics

A leading New Zealand organisation is seeking a skilled Test Engineer to join a ...
Location: New Zealand, Auckland
Salary: Not provided
Randstad
Expiration Date: March 20, 2026
Requirements:
  • Hands-on experience in data quality testing, QA, or related data-centric roles
  • Solid understanding of ETL concepts, data transformations, and pipeline logic
  • Proficiency in SQL and relational database principles for querying and validating datasets
  • Exposure to Python scripting for building automation and validation frameworks
  • Strong grasp of functional testing principles and a methodical, edge-case-driven approach
  • Proactive problem-solver with a mindset geared toward improving test reliability
  • Excellent collaboration skills, with the ability to work effectively across technical teams
  • Commitment to continuous learning and adapting to evolving data technologies
Job Responsibility:
  • Design and execute data quality tests to ensure processed data meets functional and business requirements
  • Validate the accuracy and completeness of transformed datasets across ETL pipelines
  • Identify and escalate data anomalies, inconsistencies, and pipeline quality issues
  • Develop and run test cases for data pipelines, including business logic validation
  • Implement automated validation scripts using Python/PySpark within reusable frameworks
  • Contribute to CI/CD testing processes to enable continuous delivery of reliable data products
  • Document test plans and findings, collaborating with technical teams to resolve issues
What we offer:
  • Competitive market rates
  • Chance to work on modern data stacks
  • Hybrid working (at least 3 days in-office)
Employment Type: Fulltime