Hadoop PySpark, Python, Apache Kafka

Location: United States, Charlotte, NC / New York, NY / Dallas, TX / Jersey City, NJ
Salary: 160000.00 USD / Year
Job Posted: March 21, 2026

Job Description

Role: Hadoop PySpark, Python, Apache Kafka. Full-time employment (FTE) only. The role spans architectural leadership, data engineering, application development, and collaboration and leadership.

Job Responsibility

  • Define end-to-end architecture for data platforms, streaming systems, and web applications
  • Ensure alignment with enterprise standards, security, and compliance requirements
  • Evaluate emerging technologies and recommend adoption strategies
  • Design and implement data ingestion, transformation, and processing pipelines using Hadoop, PySpark, and related tools
  • Optimize ETL workflows for large-scale datasets and real-time streaming
  • Integrate Apache Kafka for event-driven architectures and messaging
  • Build and maintain backend services using Python and microservices architecture
  • Develop responsive, dynamic front-end applications using Angular
  • Implement RESTful APIs and ensure seamless integration between components
  • Work closely with product owners, business analysts, and DevOps teams
  • Mentor junior developers and data engineers
  • Participate in agile ceremonies, code reviews, and design discussions

Requirements

  • Minimum 9 years of experience in software development
  • Strong experience with Hadoop ecosystem (HDFS, Hive, Spark)
  • Proficiency in PySpark for distributed data processing
  • Advanced programming skills in Python
  • Hands-on experience with Apache Kafka for real-time streaming
  • Frontend development using Angular (TypeScript, HTML, CSS)
  • Expertise in designing scalable, secure, and high-performance systems
  • Familiarity with microservices, API design, and cloud-native architectures
  • Knowledge of CI/CD pipelines and containerization (Docker/Kubernetes)
  • Exposure to cloud platforms (AWS, Azure, GCP)
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
  • 9+ years in software development, including at least 4 years in architecture and Big Data technologies

Nice to have

  • Experience in the BFSI (banking, financial services, and insurance) domain or with large-scale enterprise systems
  • Understanding of data governance, security, and compliance standards

Similar Jobs for Hadoop PySpark, Python, Apache Kafka

8 matching positions

Hadoop PySpark, Python, Apache Kafka

Architectural Leadership: Define end-to-end architecture for data platforms, str...
Location: United States, Charlotte, NC / New York, NY / Dallas, TX / Jersey City, NJ
Salary: 160000.00 USD / Year
Company: Realign (realign-llc.com)
Expiration Date: Until further notice
Requirements
  • Minimum 9 years of experience
  • Strong experience with Hadoop ecosystem (HDFS, Hive, Spark)
  • Proficiency in PySpark for distributed data processing
  • Advanced programming skills in Python
  • Hands-on experience with Apache Kafka for real-time streaming
  • Frontend development using Angular (TypeScript, HTML, CSS)
  • Expertise in designing scalable, secure, and high-performance systems
  • Familiarity with microservices, API design, and cloud-native architectures
  • Knowledge of CI/CD pipelines and containerization (Docker/Kubernetes)
  • Exposure to cloud platforms (AWS, Azure, GCP)
Job Responsibility
  • Define end-to-end architecture for data platforms, streaming systems, and web applications
  • Ensure alignment with enterprise standards, security, and compliance requirements
  • Evaluate emerging technologies and recommend adoption strategies
  • Design and implement data ingestion, transformation, and processing pipelines using Hadoop, PySpark, and related tools
  • Optimize ETL workflows for large-scale datasets and real-time streaming
  • Integrate Apache Kafka for event-driven architectures and messaging
  • Build and maintain backend services using Python and microservices architecture
  • Develop responsive, dynamic front-end applications using Angular
  • Implement RESTful APIs and ensure seamless integration between components
  • Work closely with product owners, business analysts, and DevOps teams
Employment type: Full-time

Hadoop PySpark, Python, Apache Kafka

Location: United States, Charlotte; New York; Dallas; Jersey City
Salary: 160000.00 USD / Year
Company: Realign (realign-llc.com)
Expiration Date: Until further notice
Requirements
  • Primary Skill: Hadoop ecosystem (HDFS, Hive, Spark), PySpark, Python, Apache Kafka
  • Secondary: UI – Angular
  • Experience: Minimum 9 years
  • Technical Expertise: Strong experience with Hadoop ecosystem (HDFS, Hive, Spark)
  • Proficiency in PySpark for distributed data processing
  • Advanced programming skills in Python
  • Hands-on experience with Apache Kafka for real-time streaming
  • Frontend development using Angular (TypeScript, HTML, CSS)
  • Architectural Skills: Expertise in designing scalable, secure, and high-performance systems
  • Familiarity with microservices, API design, and cloud-native architectures
Job Responsibility
  • Architectural Leadership: Define end-to-end architecture for data platforms, streaming systems, and web applications
  • Ensure alignment with enterprise standards, security, and compliance requirements
  • Evaluate emerging technologies and recommend adoption strategies
  • Data Engineering: Design and implement data ingestion, transformation, and processing pipelines using Hadoop, PySpark, and related tools
  • Optimize ETL workflows for large-scale datasets and real-time streaming
  • Integrate Apache Kafka for event-driven architectures and messaging
  • Application Development: Build and maintain backend services using Python and microservices architecture
  • Develop responsive, dynamic front-end applications using Angular
  • Implement RESTful APIs and ensure seamless integration between components
  • Collaboration & Leadership: Work closely with product owners, business analysts, and DevOps teams
Employment type: Full-time

Data Engineer

We are looking for an experienced Data Engineer to join a team delivering modern...
Location: United States, Poughkeepsie
Salary: Not provided
Company: Robert Half (https://www.roberthalf.com)
Expiration Date: Until further notice
Requirements
  • Hands-on experience building data engineering solutions with Databricks and Apache Spark
  • Strong programming ability in Python, including development of ETL and data transformation workflows
  • Knowledge of lakehouse and big data technologies such as Delta Lake, Apache Hadoop, and Apache Kafka
  • Experience working with Azure Data Lake Storage Gen2 or comparable cloud-based data storage platforms
  • Ability to optimize distributed data processing jobs and troubleshoot performance issues in Spark environments
  • Familiarity with data governance, data quality, and security practices for enterprise data platforms
  • Comfortable working independently and collaborating with cross-functional teams in an agile delivery model
  • Proven ability to analyze technical problems, break them into manageable components, and implement effective solutions
Job Responsibility
  • Create and support scalable data pipelines in Databricks using Spark technologies such as PySpark or Scala to process and deliver high-quality data
  • Develop lakehouse architectures on Azure Data Lake Storage Gen2 and ensure strong integration with Databricks for efficient data management
  • Establish and monitor data quality controls and governance practices within the platform using validation methods and Delta Lake capabilities
  • Investigate pipeline and application inefficiencies, then implement tuning strategies to improve Spark and Databricks performance
  • Work closely with analysts and other stakeholders to translate business data needs into refined, analytics-ready datasets
  • Automate ingestion, transformation, testing, and release processes, including integration with CI/CD workflows where appropriate
  • Provide guidance to less experienced engineers by sharing best practices for Databricks development, optimization, and support
  • Maintain clear technical documentation for notebooks, workflows, data models, configurations, and operational procedures
  • Protect data assets by applying security controls and compliance standards across the Databricks environment
  • Contribute to design sessions, solve complex data issues, and uphold change management and data integrity standards while delivering large assignments on schedule
What we offer
  • Medical, vision, dental, and life and disability insurance
  • Enrollment in the company 401(k) plan
Employment type: Full-time

Senior Data Analytics Engineer

We are looking for a Senior Data Analytics Engineer to help shape and scale mode...
Location: United States, Reston
Salary: Not provided
Company: Robert Half (https://www.roberthalf.com)
Expiration Date: Until further notice
Requirements
  • At least 5 years of experience in data engineering, analytics engineering, or a comparable data-centric role
  • Demonstrated expertise in dimensional modeling, including the design of fact tables, dimension tables, and slowly changing dimensions
  • Strong hands-on capability with Python, PySpark, SparkSQL, and notebook-based data development
  • Experience delivering end-to-end data solutions built on medallion architecture principles
  • Working knowledge of modern big data technologies such as Apache Spark, Hadoop, Kafka, and ETL frameworks
  • Familiarity with machine learning and AI concepts within contemporary analytics environments
  • Experience using Git within team-based software or data development workflows
  • Strong communication skills with the ability to align technical delivery with business objectives
Job Responsibility
  • Design, build, and maintain scalable data pipelines that ingest, transform, and deliver trusted datasets for analytics and operational use
  • Develop curated data layers using medallion-style architecture to improve data quality, accessibility, and consistency across the platform
  • Create and optimize dimensional models, including fact and dimension structures, to support reporting, trend analysis, and business intelligence needs
  • Use Python, PySpark, SparkSQL, and notebook-based development environments to engineer efficient data processing workflows
  • Partner with business and technical stakeholders to gather requirements and translate them into practical data products and analytics solutions
  • Apply data governance, security, and enterprise data management standards to protect information and support compliant data usage
  • Contribute to collaborative development practices through version control, code review, and shared engineering standards using Git
  • Support advanced analytics initiatives by preparing data foundations that can be used for machine learning and AI-driven use cases
What we offer
  • Medical
  • Vision
  • Dental
  • Life and disability insurance
  • 401(k) plan
Employment type: Full-time

Pyspark Data Engineer

The Pyspark Data Engineer is responsible for participation in the establishment ...
Location: Canada, Mississauga
Salary: 79320.00 - 110680.00 USD / Year
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 2-5 years of relevant experience in the Financial Service industry
  • Intermediate level experience in Applications Development role
  • Consistently demonstrates clear and concise written and verbal communication
  • Demonstrated problem-solving and decision-making skills
  • Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
  • Big Data Infrastructure: Develop and manage large-scale data processing systems using frameworks like Apache Spark, Hadoop, and Kafka
  • Proficiency in Python programming
  • Expertise in data processing frameworks such as Apache Spark, Hadoop
  • Expertise in Data Lakehouse technologies (Apache Iceberg, Trino, Delta Lake)
  • Expertise in SQL and database technologies (e.g., Oracle, PostgreSQL)
Job Responsibility
  • Utilize knowledge of applications development procedures and concepts, and basic knowledge of other technical areas to identify and define necessary system enhancements, including using script tools and analyzing/interpreting code
  • Consult with users, clients, and other technology groups on issues, and recommend programming solutions, install, and support customer exposure systems
  • Apply fundamental knowledge of programming languages for design specifications
  • Analyze applications to identify vulnerabilities and security issues, as well as conduct testing and debugging
  • Serve as advisor or coach to new or lower level analysts
  • Identify problems, analyze information, and make evaluative judgements to recommend and implement solutions
  • Resolve issues by identifying and selecting solutions through the application of acquired technical experience, guided by precedents
  • Operate with a limited level of direct supervision
  • Exercise independence of judgement and autonomy
  • Act as SME to senior stakeholders and/or other team members
Employment type: Full-time

Lead Python Full Stack Data Engineer

We are assembling an A-team of highly skilled, autonomous, and visionary enginee...
Location: Canada, Mississauga
Salary: 120800.00 - 170800.00 USD / Year
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • 6+ years of progressive, hands-on experience as a Senior/Lead Data Engineer
  • Expert-level proficiency in Python
  • Deep expertise in developing highly optimized, scalable, and production-grade PySpark applications
  • Deep architectural understanding and extensive hands-on experience with the entire Apache Spark ecosystem (Spark Core, Spark SQL, Spark Streaming, Spark MLlib)
  • Advanced proficiency with Hive for enterprise data warehousing
  • Expert knowledge of distributed computing fundamentals, HDFS, and other components of the Hadoop ecosystem
  • Master-level proficiency in SQL, complex query optimization, and advanced data warehousing concepts
  • Extensive experience with various data storage formats (e.g., Parquet, ORC, Avro) and leading data lake solutions (e.g., Delta Lake, Iceberg)
  • Proven experience with enterprise-grade NoSQL databases (e.g., Cassandra, MongoDB, HBase)
  • Expert-level experience with Apache Kafka
Job Responsibility
  • Lead and architect end-to-end data solutions
  • Drive strategic initiatives within small, co-located squads
  • Act as a player/coach
  • Design, develop, and optimize highly efficient and resilient data ingestion, processing, and transformation pipelines using advanced Python and PySpark techniques
  • Architect and implement sophisticated data storage solutions leveraging a diverse set of big data technologies
  • Champion data modeling and governance
  • Strategically engage with data consumers, data scientists, and business stakeholders
  • Lead the implementation of real-time data streaming and complex event-driven architectures
  • Enforce and evolve best practices in data engineering and software development
  • Exhibit high autonomy and agency
Employment type: Full-time