Hadoop PySpark, Python, Apache Kafka Job at Realign (Charlotte, NC / New York, NY / Dallas, TX / Jersey City, NJ)

Hadoop PySpark, Python, Apache Kafka

Architectural Leadership: Define end-to-end architecture for data platforms, str...

Location

United States, Charlotte, NC / New York, NY / Dallas, TX / Jersey City, NJ

Salary:

160000.00 USD / Year

Realign

Expiration Date

Until further notice

Requirements

Minimum 9 years of experience
Strong experience with Hadoop ecosystem (HDFS, Hive, Spark)
Proficiency in PySpark for distributed data processing
Advanced programming skills in Python
Hands-on experience with Apache Kafka for real-time streaming
Frontend development using Angular (TypeScript, HTML, CSS)
Expertise in designing scalable, secure, and high-performance systems
Familiarity with microservices, API design, and cloud-native architectures
Knowledge of CI/CD pipelines, containerization (Docker/Kubernetes)
Exposure to cloud platforms (AWS, Azure, GCP)

Job Responsibility

Define end-to-end architecture for data platforms, streaming systems, and web applications
Ensure alignment with enterprise standards, security, and compliance requirements
Evaluate emerging technologies and recommend adoption strategies
Design and implement data ingestion, transformation, and processing pipelines using Hadoop, PySpark, and related tools
Optimize ETL workflows for large-scale datasets and real-time streaming
Integrate Apache Kafka for event-driven architectures and messaging
Build and maintain backend services using Python and microservices architecture
Develop responsive, dynamic front-end applications using Angular
Implement RESTful APIs and ensure seamless integration between components
Work closely with product owners, business analysts, and DevOps teams

Fulltime

Hadoop PySpark, Python, Apache Kafka

Location

United States, Charlotte; New York; Dallas; Jersey City

Salary:

160000.00 USD / Year

Realign

Expiration Date

Until further notice

Requirements

Primary Skill: Hadoop ecosystem (HDFS, Hive, Spark), PySpark, Python, Apache Kafka
Secondary: UI – Angular
Experience: Minimum 9 years
Technical Expertise: Strong experience with Hadoop ecosystem (HDFS, Hive, Spark)
Proficiency in PySpark for distributed data processing
Advanced programming skills in Python
Hands-on experience with Apache Kafka for real-time streaming
Frontend development using Angular (TypeScript, HTML, CSS)
Architectural Skills: Expertise in designing scalable, secure, and high-performance systems
Familiarity with microservices, API design, and cloud-native architectures

Job Responsibility

Architectural Leadership: Define end-to-end architecture for data platforms, streaming systems, and web applications
Ensure alignment with enterprise standards, security, and compliance requirements
Evaluate emerging technologies and recommend adoption strategies
Data Engineering: Design and implement data ingestion, transformation, and processing pipelines using Hadoop, PySpark, and related tools
Optimize ETL workflows for large-scale datasets and real-time streaming
Integrate Apache Kafka for event-driven architectures and messaging
Application Development: Build and maintain backend services using Python and microservices architecture
Develop responsive, dynamic front-end applications using Angular
Implement RESTful APIs and ensure seamless integration between components
Collaboration & Leadership: Work closely with product owners, business analysts, and DevOps teams

Fulltime

Data Engineer

We are looking for an experienced Data Engineer to join a team delivering modern...

Location

United States, Poughkeepsie

Salary:

Not provided

Robert Half

Expiration Date

Until further notice

Requirements

Hands-on experience building data engineering solutions with Databricks and Apache Spark
Strong programming ability in Python, including development of ETL and data transformation workflows
Knowledge of lakehouse and big data technologies such as Delta Lake, Apache Hadoop, and Apache Kafka
Experience working with Azure Data Lake Storage Gen2 or comparable cloud-based data storage platforms
Ability to optimize distributed data processing jobs and troubleshoot performance issues in Spark environments
Familiarity with data governance, data quality, and security practices for enterprise data platforms
Comfortable working independently and collaborating with cross-functional teams in an agile delivery model
Proven ability to analyze technical problems, break them into manageable components, and implement effective solutions

Job Responsibility

Create and support scalable data pipelines in Databricks using Spark technologies such as PySpark or Scala to process and deliver high-quality data
Develop lakehouse architectures on Azure Data Lake Storage Gen2 and ensure strong integration with Databricks for efficient data management
Establish and monitor data quality controls and governance practices within the platform using validation methods and Delta Lake capabilities
Investigate pipeline and application inefficiencies, then implement tuning strategies to improve Spark and Databricks performance
Work closely with analysts and other stakeholders to translate business data needs into refined, analytics-ready datasets
Automate ingestion, transformation, testing, and release processes, including integration with CI/CD workflows where appropriate
Provide guidance to less experienced engineers by sharing best practices for Databricks development, optimization, and support
Maintain clear technical documentation for notebooks, workflows, data models, configurations, and operational procedures
Protect data assets by applying security controls and compliance standards across the Databricks environment
Contribute to design sessions, solve complex data issues, and uphold change management and data integrity standards while delivering large assignments on schedule

What we offer

Medical, vision, dental, and life and disability insurance
Enrollment in company 401(k) plan

Fulltime

Senior Data Analytics Engineer

We are looking for a Senior Data Analytics Engineer to help shape and scale mode...

Location

United States, Reston

Salary:

Not provided

Robert Half

Expiration Date

Until further notice

Requirements

At least 5 years of experience in data engineering, analytics engineering, or a comparable data-centric role
Demonstrated expertise in dimensional modeling, including the design of fact tables, dimension tables, and slowly changing dimensions
Strong hands-on capability with Python, PySpark, SparkSQL, and notebook-based data development
Experience delivering end-to-end data solutions built on medallion architecture principles
Working knowledge of modern big data technologies such as Apache Spark, Hadoop, Kafka, and ETL frameworks
Familiarity with machine learning and AI concepts within contemporary analytics environments
Experience using Git within team-based software or data development workflows
Strong communication skills with the ability to align technical delivery with business objectives

Job Responsibility

Design, build, and maintain scalable data pipelines that ingest, transform, and deliver trusted datasets for analytics and operational use
Develop curated data layers using medallion-style architecture to improve data quality, accessibility, and consistency across the platform
Create and optimize dimensional models, including fact and dimension structures, to support reporting, trend analysis, and business intelligence needs
Use Python, PySpark, SparkSQL, and notebook-based development environments to engineer efficient data processing workflows
Partner with business and technical stakeholders to gather requirements and translate them into practical data products and analytics solutions
Apply data governance, security, and enterprise data management standards to protect information and support compliant data usage
Contribute to collaborative development practices through version control, code review, and shared engineering standards using Git
Support advanced analytics initiatives by preparing data foundations that can be used for machine learning and AI-driven use cases

What we offer

medical
vision
dental
life and disability insurance
401(k) plan

Fulltime

PySpark Data Engineer

The PySpark Data Engineer is responsible for participation in the establishment ...

Location

Canada, Mississauga

Salary:

79320.00 - 110680.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

2-5 years of relevant experience in the financial services industry
Intermediate-level experience in an Applications Development role
Consistently demonstrates clear and concise written and verbal communication
Demonstrated problem-solving and decision-making skills
Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
Big Data Infrastructure: Develop and manage large-scale data processing systems using frameworks like Apache Spark, Hadoop, and Kafka
Proficiency in Python programming
Expertise in data processing frameworks such as Apache Spark, Hadoop
Expertise in Data Lakehouse technologies (Apache Iceberg, Trino, Delta Lake)
Expertise in SQL and database technologies (e.g., Oracle, PostgreSQL, etc.)

Job Responsibility

Utilize knowledge of applications development procedures and concepts, and basic knowledge of other technical areas to identify and define necessary system enhancements, including using script tools and analyzing/interpreting code
Consult with users, clients, and other technology groups on issues, and recommend programming solutions, install, and support customer exposure systems
Apply fundamental knowledge of programming languages for design specifications
Analyze applications to identify vulnerabilities and security issues, as well as conduct testing and debugging
Serve as advisor or coach to new or lower-level analysts
Identify problems, analyze information, and make evaluative judgements to recommend and implement solutions
Resolve issues by identifying and selecting solutions through the application of acquired technical experience, guided by precedents
Operate with a limited level of direct supervision
Exercise independence of judgement and autonomy
Act as SME to senior stakeholders and/or other team members

Fulltime

Lead Python Full Stack Data Engineer

We are assembling an A-team of highly skilled, autonomous, and visionary enginee...

Location

Canada, Mississauga

Salary:

120800.00 - 170800.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

6+ years of progressive, hands-on experience as a Senior/Lead Data Engineer
Expert-level proficiency in Python
Deep expertise in developing highly optimized, scalable, and production-grade PySpark applications
Deep architectural understanding and extensive hands-on experience with the entire Apache Spark ecosystem (Spark Core, Spark SQL, Spark Streaming, Spark MLlib)
Advanced proficiency with Hive for enterprise data warehousing
Expert knowledge of distributed computing fundamentals, HDFS, and other components of the Hadoop ecosystem
Master-level proficiency in SQL, complex query optimization, and advanced data warehousing concepts
Extensive experience with various data storage formats (e.g., Parquet, ORC, Avro) and leading data lake solutions (e.g., Delta Lake, Iceberg)
Proven experience with enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB, HBase)
Expert-level experience with Apache Kafka

Job Responsibility

Lead and architect end-to-end data solutions
Drive strategic initiatives within small, co-located squads
Act as a player/coach
Design, develop, and optimize highly efficient and resilient data ingestion, processing, and transformation pipelines using advanced Python and PySpark techniques
Architect and implement sophisticated data storage solutions leveraging a diverse set of big data technologies
Champion data modeling and governance
Strategically engage with data consumers, data scientists, and business stakeholders
Lead the implementation of real-time data streaming and complex event-driven architectures
Enforce and evolve best practices in data engineering and software development
Exhibit high autonomy and agency

Fulltime
