Data Engineer – Java & Spark

India, Bangalore South · Job Posted March 05, 2026

Job Description

We are looking for a skilled Data Engineer with strong expertise in Java and Apache Spark, specializing in data ingestion and large-scale data processing. The ideal candidate will design and build scalable, high-performance data pipelines and contribute to modern analytics platforms in a fast-paced Agile environment. This role requires hands-on experience in building ingestion frameworks, optimizing Spark workloads, and working with cloud-based data ecosystems.

Job Responsibility

  • Design, develop, and maintain scalable data ingestion pipelines using Java and Apache Spark
  • Build and optimize Spark jobs (Spark Core, Spark SQL, DataFrames, Streaming) for large-scale batch and real-time processing
  • Develop reusable ingestion frameworks for structured and semi-structured data from multiple sources (APIs, databases, files, streaming systems)
  • Implement high-performance ETL/ELT solutions with strong focus on data quality, reliability, and scalability
  • Collaborate with data architects, analysts, and cross-functional teams to design robust data workflows
  • Optimize Spark performance (partitioning, caching, tuning, memory management) for production environments
  • Contribute to CI/CD pipelines, code reviews, and best practices in data engineering
  • Troubleshoot data pipeline failures and implement monitoring and alerting mechanisms
  • Document technical designs and mentor junior engineers

Requirements

  • 4–7 years of strong hands-on experience in Data Engineering and Java development
  • Strong expertise in Apache Spark (Spark Core, Spark SQL, DataFrames, Structured Streaming)
  • Solid experience in data ingestion, ETL/ELT, and building data pipelines
  • Working knowledge of Java
  • Experience handling large-scale data processing and distributed systems
  • Familiarity with Maven/Gradle, Git, and CI/CD practices
  • Strong SQL skills and understanding of data modeling concepts
  • Excellent problem-solving and communication skills
  • Must be open to working from the Bangalore location

Nice to have

  • Experience with Databricks (AWS preferred) for Spark-based data engineering
  • Hands-on experience with Snowflake for cloud data warehousing
  • Working knowledge of DBT (Data Build Tool) for analytics engineering and transformations
  • Exposure to Azure cloud services (Databricks)
  • Experience with Kafka, Airflow, or orchestration tools
  • Familiarity with Docker/Kubernetes
  • Basic Python scripting for automation and data manipulation

Similar Jobs for Data Engineer – Java & Spark

Lead Data Engineer Spark and SQL – Vice President

The Lead Data Engineer Spark and SQL – Vice President is responsible for establi...
Location: Canada, Mississauga
Salary: 120800.00 - 170800.00 USD / Year
Company: Citi
Expiration Date: Until further notice
Requirements
  • 6-10 years of relevant experience in an Apps Development or systems analysis role (Java)
  • Experience with Spark and Scala
  • Experience with Ab Initio
  • Experience with ETL and SQL
  • Extensive experience in system analysis and in programming of software applications
  • Experience in managing and implementing successful projects
  • Subject Matter Expert (SME) in at least one area of Applications Development
  • Ability to adjust priorities quickly as circumstances dictate
  • Demonstrated leadership and project management skills
  • Consistently demonstrates clear and concise written and verbal communication
Job Responsibility
  • Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements
  • Resolve a variety of high-impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards
  • Provide expertise in area and advanced knowledge of applications programming and ensure application design adheres to the overall architecture blueprint
  • Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
  • Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
  • Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
  • Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
Employment type: Full-time

Principal Java Data Engineer

Location: United States
Salary: Not provided
Company: PointClickCare
Expiration Date: Until further notice
Requirements
  • 10+ total years of professional experience in software/data engineering
  • 4+ years focused on building/operating data pipelines (batch + streaming)
  • Experience with Apache Kafka (producer/consumer, schema registry, partitioning, exactly-once semantics)
  • Experience with Apache Flink (stateful stream processing, checkpoints, event-time windows)
  • Experience with Spark Streaming/Structured Streaming (micro-batch, watermarking)
  • Experience with lakehouse architectures and query/storage layers (Apache Hudi, Azure ADLS Gen2, Trino/Presto, Databricks/Spark, HDFS, other big data tech)
  • Experience in the design and implementation of scalable distributed systems based on Java microservices
  • Legally authorized to work in the US for our company
  • Do not require sponsorship for employment visa status (e.g., H-1B visa status, etc.) to work legally for our Company in the United States
Employment type: Full-time

Principal Java Data Engineer

Location: Canada, Mississauga
Salary: Not provided
Company: PointClickCare
Expiration Date: Until further notice
Requirements
  • 10+ total years of professional experience in software/data engineering
  • 4+ years focused on data pipelines
  • Experience with Apache Kafka (producer/consumer, schema registry, partitioning, exactly-once semantics)
  • Experience with Apache Flink (stateful stream processing, checkpoints, event-time windows)
  • Experience with Spark Streaming/Structured Streaming (micro-batch, watermarking)
  • Experience with lakehouse architectures and query/storage layers (Apache Hudi, Azure ADLS Gen2, Trino/Presto, Databricks/Spark, HDFS, other big data tech)
  • Experience in design and implementation of scalable distributed systems based on Java microservices
  • Legally authorized to work in Canada
Employment type: Full-time

Staff Software Engineer, Spark (Java)

At Cloudera, we empower people to transform complex data into clear and actionab...
Location: Hungary, Budapest; Szeged
Salary: Not provided
Company: Cloudera
Expiration Date: Until further notice
Requirements
  • 6+ years professional software development
  • Experience leading and delivering complex product enhancements
  • Strong understanding of at least one of the following languages: Java, Scala, Python
  • Experience with systems design and development
  • Passionate about programming, clean coding habits, attention to detail, and focus on quality
  • Strong oral and written communication skills
  • Strong ability to research and solve problems independently without constant supervision
  • Open-minded, desire to learn new things and build great products
  • Experience with distributed systems
Job Responsibility
  • Design new features for Cloudera’s data engineering experience, and take them from prototypes to leading a team to deliver the feature in production at scale
  • Contribute to Apache Spark, Livy
  • Develop new features in Scala/Java/Python on modern platforms
  • Gain expertise in distributed data processing, from SQL planners and optimizers, to data layout and table formats like Apache Parquet and Iceberg, to fault tolerance in distributed systems
  • Gain a solid understanding and deep technical knowledge of components across the Cloudera Data Engineering Experience stack, with a focus on Iceberg and Spark
  • Get to work on large scale distributed systems, from 100s to 1000s of nodes, in production clusters
  • Debug system-level deployment issues, perform root cause analysis and system test analysis, and resolve failures
  • Work on improving internal infrastructure
  • Collaborate with other team members and stakeholders
What we offer
  • Generous PTO Policy
  • Support work life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups
Employment type: Full-time

Principal Java Data Engineer

Contribute to all phases of the software development life cycle; Play a crucial ...
Location: United States
Salary: 183000.00 - 203000.00 USD / Year
Company: PointClickCare
Expiration Date: Until further notice
Requirements
  • Principal Software Data Engineer with at least 10 years of professional experience in software or data engineering
  • Minimum of 4 years focused on data pipelines (batch and streaming)
  • Proven experience driving technical direction and mentoring engineers while delivering complex, high-scale solutions as a hands-on contributor
  • Strong understanding of event-driven architectures and distributed systems, with hands-on experience implementing resilient, low-latency pipelines
  • Practical experience with cloud platforms (AWS, Azure, or GCP) and containerized deployments for data workloads
  • Fluency in data quality practices and CI/CD integration, including schema management, automated testing, and validation frameworks (e.g., dbt, Great Expectations)
  • Operational excellence in observability, with experience implementing metrics, logging, tracing, and alerting for data pipelines using modern tools
  • Solid foundation in data governance and performance optimization, ensuring reliability and scalability across batch and streaming environments
  • Proven experience with Lakehouse architectures and related technologies, including Apache Hudi, Azure ADLS Gen2, HDFS, and other big data technologies (Trino, Databricks, Spark)
  • Strong collaboration and communication skills, with the ability to influence stakeholders and evangelize modern data practices within your team and organization
Job Responsibility
  • Lead and guide the design and implementation of scalable distributed systems based on Java microservices
  • Engineer and optimize data pipelines using solutions like Apache Hudi, Apache Trino, Azure ADLS
  • Collaborate cross-functionally with product, analytics, and AI teams to ensure data is a strategic asset
  • Advance ongoing modernization efforts, deepening adoption of event-driven architectures and cloud-native technologies
  • Drive adoption of best practices in data governance, observability, and performance tuning for data workloads
  • Embed data quality in processing pipelines by defining schema contracts, implementing transformation tests and data assertions, enforcing backward-compatible schema evolution, and automating checks for freshness, completeness, and accuracy across batch and streaming paths before production deployment
  • Establish robust observability for data pipelines by implementing metrics, logging, and distributed tracing for streaming jobs, defining SLAs and SLOs for latency and throughput, and integrating alerting and dashboards to enable proactive monitoring and rapid incident response
  • Foster a culture of quality through peer reviews, providing constructive feedback and seeking input on your own work
What we offer
  • Benefits starting from Day 1
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition
Employment type: Full-time

Principal Java Data Engineer

PointClickCare is searching for a Principal Software Data Engineer who will cont...
Location: Canada, Mississauga
Salary: 156000.00 - 174000.00 CAD / Year
Company: PointClickCare
Expiration Date: Until further notice
Requirements
  • Principal Software Data Engineer with at least 10 years of professional experience in software or data engineering, including a minimum of 4 years focused on data pipelines (batch and streaming)
  • Proven experience driving technical direction and mentoring engineers while delivering complex, high-scale solutions as a hands-on contributor
  • Strong understanding of event-driven architectures and distributed systems, with hands-on experience implementing resilient, low-latency pipelines
  • Practical experience with cloud platforms (AWS, Azure, or GCP) and containerized deployments for data workloads
  • Fluency in data quality practices and CI/CD integration, including schema management, automated testing, and validation frameworks (e.g., dbt, Great Expectations)
  • Operational excellence in observability, with experience implementing metrics, logging, tracing, and alerting for data pipelines using modern tools
  • Solid foundation in data governance and performance optimization, ensuring reliability and scalability across batch and streaming environments
  • Proven experience with Lakehouse architectures and related technologies, including Apache Hudi, Azure ADLS Gen2, HDFS, and other big data technologies (Trino, Databricks, Spark)
  • Strong collaboration and communication skills, with the ability to influence stakeholders and evangelize modern data practices within your team and organization.
Job Responsibility
  • Lead and guide the design and implementation of scalable distributed systems based on Java microservices
  • Engineer and optimize data pipelines using solutions like Apache Hudi, Apache Trino, Azure ADLS
  • Collaborate cross-functionally with product, analytics, and AI teams to ensure data is a strategic asset
  • Advance ongoing modernization efforts, deepening adoption of event-driven architectures and cloud-native technologies
  • Drive adoption of best practices in data governance, observability, and performance tuning for data workloads
  • Embed data quality in processing pipelines by defining schema contracts, implementing transformation tests and data assertions, enforcing backward-compatible schema evolution, and automating checks for freshness, completeness, and accuracy across batch and streaming paths before production deployment
  • Establish robust observability for data pipelines by implementing metrics, logging, and distributed tracing for streaming jobs, defining SLAs and SLOs for latency and throughput, and integrating alerting and dashboards to enable proactive monitoring and rapid incident response
  • Foster a culture of quality through peer reviews, providing constructive feedback and seeking input on your own work.
What we offer
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
Employment type: Full-time

Lead Java Big Data Engineer Vice President

At Citi, we are at the forefront of financial technology, driven by a belief in ...
Location: India, Pune
Salary: Not provided
Company: Citi
Expiration Date: Until further notice
Requirements
  • 10+ years of progressive experience in professional software engineering, with at least 3 years in a technical leadership or architect role
  • Proven track record of designing and building complex, high-performance, scalable server-side applications using Java
  • Deep, hands-on experience with the Big Data ecosystem, including mastery of Apache Spark, Hadoop (HDFS), and real-time data streaming with Kafka
  • Extensive experience with relational databases, data modeling, and data warehousing concepts
  • Demonstrated experience leading and mentoring technical teams and successfully delivering complex, large-scale data projects from concept to production
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field
Job Responsibility
  • Define the end-to-end architectural vision and technical roadmap for migrating from Sybase IQ to a modern Big Data platform, ensuring solutions are scalable, resilient, and secure
  • Lead the design, development, and deployment of robust, large-scale data processing pipelines using technologies like Apache Spark, Kafka, and distributed data stores
  • Develop and execute a comprehensive, phased strategy for migrating petabytes of historical and transactional data from Sybase IQ, ensuring data integrity, minimal downtime, and zero business disruption
  • Oversee the design and development of Java-based microservices that interact with the new data platform, ensuring seamless integration with the broader Oasys application ecosystem
  • Lead, inspire, and mentor a high-performing team of Java and Big Data engineers. Foster a culture of engineering excellence, innovation, and accountability
  • Partner with global business leaders, product owners, and other senior technology managers to define requirements, manage expectations, and deliver solutions that drive significant business value
  • Remain deeply technical and contribute to coding, design, and architectural decisions, leading by example
Employment type: Full-time