
Data Engineer - Streaming


NTT DATA

Location: India, Bangalore

Contract Type: Not provided

Salary: Not provided

Job Description:

The Data Engineer - Streaming role involves designing and implementing PySpark Structured Streaming pipelines for data ingestion into Apache Iceberg tables. Candidates should have a strong background in Apache Kafka and PySpark, with at least 4 years of experience. Responsibilities include ensuring compliance with technical constraints and writing comprehensive tests for the streaming application. This position offers an opportunity to work in a dynamic environment with innovative data solutions.
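The description above names specific Kafka source parameters (startingOffsets, failOnDataLoss, checkpoint paths, trigger intervals). As a minimal sketch, they might be collected like this; every concrete value below is an illustrative placeholder, not something taken from the posting:

```python
# Kafka source options for a Structured Streaming reader, using the option
# names from Spark's Kafka integration. All concrete values are placeholders.
kafka_source_options = {
    "kafka.bootstrap.servers": "broker1:9092",  # placeholder broker address
    "subscribe": "events-topic",                # placeholder topic name
    "kafka.group.id": "iceberg-ingest",         # placeholder consumer group id
    "startingOffsets": "earliest",              # or "latest" / per-partition JSON
    "failOnDataLoss": "false",                  # tolerate expired/compacted offsets
}

stream_options = {
    "checkpointLocation": "s3://bucket/checkpoints/events",  # placeholder path
    "trigger_processing_time": "1 minute",      # micro-batch trigger interval
}

# In the actual job these would be applied to a PySpark reader, e.g.:
#   reader = spark.readStream.format("kafka")
#   for key, value in kafka_source_options.items():
#       reader = reader.option(key, value)
#   df = reader.load()

print(kafka_source_options["startingOffsets"])
```

Keeping the options in a plain dict like this also makes them easy to assert on in unit tests, separate from any running Spark session.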

Job Responsibility:

  • Design and implement a PySpark Structured Streaming application that reads from Confluent Kafka topics, parses JSON and Avro payloads, applies schema mappings, and writes atomically to Iceberg tables using the Iceberg Spark runtime and foreachBatch micro-batch pattern
  • Ensure all functionality relies exclusively on public Apache-supported APIs — Apache Spark, Apache Kafka, and Apache Iceberg — with no unsupported Confluent connectors or proprietary sinks
  • Configure Kafka source parameters: bootstrap servers, consumer group IDs, offset management (startingOffsets, failOnDataLoss), checkpoint paths, and trigger intervals
  • Implement PII detection and Protegrity tokenization hooks within the ingestion pipeline before data lands in the Iceberg Bronze layer
  • Write comprehensive unit and integration tests: row count validation, schema conformance checks, Kafka offset commit verification, and data comparison against the source topic
  • Support PNC UAT: walk PNC engineers through the code, demonstrate that no unsupported connectors are used, and address review findings
  • Own the two streaming ingestion workstreams of the PNC Bank Hadoop-to-Iceberg POC
  • Design and deliver production-grade PySpark Structured Streaming pipelines that ingest data into Apache Iceberg tables — operating under specific technical constraints
  • Work closely with GitHub Copilot to scaffold, iterate, test, and document the streaming application code, acting as the technical reviewer and subject matter expert who ensures AI-generated pipelines are production-ready, PNC-compliant, and correctly integrated with the Iceberg catalog and Protegrity tokenization layer
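The ingestion responsibilities above center on a foreachBatch micro-batch that parses JSON payloads and applies schema mappings before the Iceberg append. A minimal, Spark-free sketch of that parse-and-map step follows; the field names, casts, and the table/topic names in the comments are hypothetical, not from the posting:

```python
import json

# Illustrative schema mapping: source field -> (target column, type caster).
# These field and column names are hypothetical examples.
SCHEMA_MAP = {
    "evt_ts": ("event_time", str),
    "acct": ("account_id", str),
    "amt": ("amount", float),
}

def map_record(raw: bytes) -> dict:
    """Parse one Kafka JSON payload and apply the schema mapping.

    Unknown fields are dropped; a missing required field raises KeyError
    so the caller can route the record to a dead-letter path.
    """
    payload = json.loads(raw)
    return {target: cast(payload[src]) for src, (target, cast) in SCHEMA_MAP.items()}

# Inside the streaming job this logic would run per micro-batch, e.g.:
#   (df.writeStream
#        .foreachBatch(lambda batch_df, epoch_id:
#            batch_df.writeTo("catalog.db.bronze_events").append())
#        .option("checkpointLocation", checkpoint_path)
#        .start())

row = map_record(b'{"evt_ts": "2026-01-01T00:00:00Z", "acct": "A1", "amt": "9.50", "x": 1}')
print(row)
```

In the real pipeline the mapping would be expressed as Spark column transformations rather than a Python dict, but the contract (map, cast, drop unknowns, dead-letter on parse failure) is the same.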

Requirements:

  • Apache Kafka – Producer & Consumer
  • 4+ years of hands-on experience with Apache Kafka, including both producer and consumer development in PySpark, Java, or Scala
  • Deep understanding of Kafka internals: topics, partitions, consumer groups, offsets, rebalancing, and exactly-once delivery semantics
  • Experience with Confluent Kafka: schema registry, Avro/JSON serialisation, and Confluent Cloud or on-prem cluster configuration
  • Proven ability to build ingestion pipelines without relying on unsupported or third-party sink connectors — using only native Kafka consumer APIs and Spark integration
  • Familiarity with Kafka Connect architecture to evaluate trade-offs and articulate why application-level ingestion is preferred in constrained environments
  • PySpark Structured Streaming
  • Strong practical experience with PySpark Structured Streaming: Kafka source, file source, foreachBatch, output modes (append/update/complete), and checkpoint management
  • Experience tuning streaming micro-batch trigger intervals, watermarking, and late data handling for production workloads
  • Hands-on experience writing streaming data directly to Apache Iceberg tables using the Iceberg Spark runtime
  • Ability to implement robust error handling: dead-letter queues, parse error isolation, and recovery from checkpoint failures
  • Data Engineering & Iceberg
  • Working knowledge of Apache Iceberg: catalog configuration, schema definition, append writes, and partition strategy for event and log data
  • Familiarity with S3-compatible object storage as an Iceberg warehouse destination
  • Understanding of medallion architecture — ability to correctly land streaming data in the Bronze layer with appropriate schema governance
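The testing expectations above (row-count validation, schema conformance) can be sketched as a plain-Python validation helper. The column names here are hypothetical; a real suite would compare Spark DataFrames against the source Kafka topic rather than lists of dicts:

```python
# Hypothetical Bronze-layer schema: column name -> expected Python type.
EXPECTED_COLUMNS = {"event_time": str, "account_id": str, "amount": float}

def validate_batch(rows: list, expected_count: int) -> list:
    """Return human-readable failure messages; an empty list means the batch passes."""
    failures = []
    if len(rows) != expected_count:
        failures.append(f"row count {len(rows)} != expected {expected_count}")
    for i, row in enumerate(rows):
        if set(row) != set(EXPECTED_COLUMNS):
            failures.append(f"row {i}: columns {sorted(row)} do not match schema")
            continue  # skip type checks when columns already mismatch
        for col, typ in EXPECTED_COLUMNS.items():
            if not isinstance(row[col], typ):
                failures.append(
                    f"row {i}: {col} is {type(row[col]).__name__}, expected {typ.__name__}"
                )
    return failures

print(validate_batch([{"event_time": "t", "account_id": "a", "amount": 9.5}], 1))
```

Returning a list of failures instead of raising on the first one lets a test report every schema and count problem in a batch at once.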

Additional Information:

Job Posted: April 23, 2026


Similar Jobs for Data Engineer - Streaming

Principal Data Engineer

PointClickCare is searching for a Principal Data Engineer who will contribute to...
Location: United States
Salary: 183200.00 - 203500.00 USD / Year
PointClickCare
Expiration Date: Until further notice
Requirements
  • Principal Data Engineer with at least 10 years of professional experience in software or data engineering, including a minimum of 4 years focused on streaming and real-time data systems
  • Proven experience driving technical direction and mentoring engineers while delivering complex, high-scale solutions as a hands-on contributor
  • Deep expertise in streaming and real-time data technologies, including frameworks such as Apache Kafka, Flink, and Spark Streaming
  • Strong understanding of event-driven architectures and distributed systems, with hands-on experience implementing resilient, low-latency pipelines
  • Practical experience with cloud platforms (AWS, Azure, or GCP) and containerized deployments for data workloads
  • Fluency in data quality practices and CI/CD integration, including schema management, automated testing, and validation frameworks (e.g., dbt, Great Expectations)
  • Operational excellence in observability, with experience implementing metrics, logging, tracing, and alerting for data pipelines using modern tools
  • Solid foundation in data governance and performance optimization, ensuring reliability and scalability across batch and streaming environments
  • Experience with Lakehouse architectures and related technologies, including Databricks, Azure ADLS Gen2, and Apache Hudi
  • Strong collaboration and communication skills, with the ability to influence stakeholders and evangelize modern data practices within your team and across the organization
Job Responsibility
  • Lead and guide the design and implementation of scalable streaming data pipelines
  • Engineer and optimize real-time data solutions using frameworks like Apache Kafka, Flink, Spark Streaming
  • Collaborate cross-functionally with product, analytics, and AI teams to ensure data is a strategic asset
  • Advance ongoing modernization efforts, deepening adoption of event-driven architectures and cloud-native technologies
  • Drive adoption of best practices in data governance, observability, and performance tuning for streaming workloads
  • Embed data quality in processing pipelines by defining schema contracts, implementing transformation tests and data assertions, enforcing backward-compatible schema evolution, and automating checks for freshness, completeness, and accuracy across batch and streaming paths before production deployment
  • Establish robust observability for data pipelines by implementing metrics, logging, and distributed tracing for streaming jobs, defining SLAs and SLOs for latency and throughput, and integrating alerting and dashboards to enable proactive monitoring and rapid incident response
  • Foster a culture of quality through peer reviews, providing constructive feedback and seeking input on your own work
What we offer
  • Benefits starting from Day 1!
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more!
  • Fulltime

Senior Back End Engineer for Streaming Data Platform

Do you want to build a high-quality data platform that will innovate financial m...
Location: Not provided
Salary: Not provided
KOR Financial
Expiration Date: Until further notice
Requirements
  • 8+ years of experience as a Back End Engineer
  • Experience with Java and Spring Boot Framework
  • Experience with building and running applications on public cloud vendors like AWS
  • Working experience with Kafka, Databricks, and streaming data solutions
  • Experience profiling, debugging, and performance tuning complex distributed systems
  • A firm reliance on unit testing and mocking frameworks with a TDD (Test Driven Development) mindset
  • Knowledge of OOP principles and modern development practices
Job Responsibility
  • Designing and implementing the streaming data platform engine and SDK
  • Implementing new features for our range of web and streaming applications and data reporting capabilities
  • Be an active voice in the platform's build-out with regard to technical choices and implementations
  • Working closely with the broader team to embrace new challenges and adapt requirements as we continue to grow and adjust priorities
  • Paired programming with a growing team of Back-end, Data, and Front-end Engineers
What we offer
  • Culture of trust, empowerment, and constructive feedback
  • Competitive salary, great IT equipment, and expense allowance
  • Flexible working times
  • A span of control that matches your ambitions and skills
  • Commitment to a genuine, balanced relationship

Software Engineer - Data Engineering

Akuna Capital is a leading proprietary trading firm specializing in options mark...
Location: United States, Chicago
Salary: 130000.00 USD / Year
AKUNA CAPITAL
Expiration Date: Until further notice
Requirements
  • BS/MS/PhD in Computer Science, Engineering, Physics, Math, or equivalent technical field
  • 5+ years of professional experience developing software applications
  • Java/Scala experience required
  • Highly motivated and willing to take ownership of high-impact projects upon arrival
  • Prior hands-on experience with data platforms and technologies such as Delta Lake, Spark, Kubernetes, Kafka, ClickHouse, and/or Presto/Trino
  • Experience building large-scale batch and streaming pipelines with strict SLA and data quality requirements
  • Must possess excellent communication, analytical, and problem-solving skills
  • Recent hands-on experience with AWS Cloud development, deployment and monitoring necessary
  • Demonstrated experience working on an Agile team employing software engineering best practices, such as GitOps and CI/CD, to deliver complex software projects
  • The ability to react quickly and accurately to rapidly changing market conditions, including quickly and accurately solving math and coding problems, is an essential function of the role
Job Responsibility
  • Work within a growing Data Engineering division supporting the strategic role of data at Akuna
  • Drive the ongoing design and expansion of our data platform across a wide variety of data sources, supporting an array of streaming, operational and research workflows
  • Work closely with Trading, Quant, Technology & Business Operations teams throughout the firm to identify how data is produced and consumed, helping to define and deliver high impact projects
  • Build and deploy batch and streaming pipelines to collect and transform our rapidly growing Big Data set within our hybrid cloud architecture utilizing Kubernetes/EKS, Kafka/MSK and Databricks/Spark
  • Mentor junior engineers in software and data engineering best practices
  • Produce clean, well-tested, and documented code with a clear design to support mission critical applications
  • Build automated data validation test suites that ensure data is processed and published in accordance with well-defined Service Level Agreements (SLAs) for data quality, data availability, and data correctness
  • Challenge the status quo and help push our organization forward, as we grow beyond the limits of our current tech stack
What we offer
  • Discretionary performance bonus
  • Comprehensive benefits package that may encompass employer-paid medical, dental, vision, retirement contributions, paid time off, and other benefits
  • Fulltime

Data Engineer, Enterprise Data, Analytics and Innovation

Are you passionate about building robust data infrastructure and enabling innova...
Location: United States
Salary: 110000.00 - 125000.00 USD / Year
Vaniam Group
Expiration Date: Until further notice
Requirements
  • 5+ years of professional experience in data engineering, ETL, or related roles
  • Strong proficiency in Python and SQL for data engineering
  • Hands-on experience building and maintaining pipelines in a lakehouse or modern data platform
  • Practical understanding of Medallion architectures and layered data design
  • Familiarity with modern data stack tools, including: Spark or PySpark
  • Workflow orchestration (Airflow, dbt, or similar)
  • Testing and observability frameworks
  • Containers (Docker) and Git-based version control
  • Excellent communication skills, problem-solving mindset, and a collaborative approach
Job Responsibility
  • Design, build, and operate reliable ETL and ELT pipelines in Python and SQL
  • Manage ingestion into Bronze, standardization and quality in Silver, and curated serving in Gold layers of our Medallion architecture
  • Maintain ingestion from transactional MySQL systems into Vaniam Core to keep production data flows seamless
  • Implement observability, data quality checks, and lineage tracking to ensure trust in all downstream datasets
  • Develop schemas, tables, and views optimized for analytics, APIs, and product use cases
  • Apply and enforce best practices for security, privacy, compliance, and access control, ensuring data integrity across sensitive healthcare domains
  • Maintain clear and consistent documentation for datasets, pipelines, and operating procedures
  • Lead the integration of third-party datasets, client-provided sources, and new product-generated data into Vaniam Core
  • Partner with product and innovation teams to build repeatable processes for onboarding new data streams
  • Ensure harmonization, normalization, and governance across varied data types (scientific, engagement, operational)
What we offer
  • 100% remote environment with opportunities for local meet-ups
  • Positive, diverse, and supportive culture
  • Passionate about serving clients focused on Cancer and Blood diseases
  • Investment in you with opportunities for professional growth and personal development through Vaniam Group University
  • Health benefits – medical, dental, vision
  • Generous parental leave benefit
  • Focused on your financial future with a 401(k) Plan and company match
  • Work-Life Balance and Flexibility
  • Flexible Time Off policy for rest and relaxation
  • Volunteer Time Off for community involvement
  • Fulltime

Principal Data Engineer

Atlassian is looking for a Principal Data Engineer to join our Data Engineering ...
Location: United States, San Francisco
Salary: 168700.00 - 271100.00 USD / Year
Atlassian
Expiration Date: Until further notice
Requirements
  • You have 12+ years of experience in a Data Engineer role as an individual contributor
  • You have at least 2 years of experience as a tech lead for a Data Engineering team
  • You are an engineer with a track record of driving and delivering large (multi-person or multi-team) and complex efforts
  • You are a great communicator and maintain many of the essential cross-team and cross-functional relationships necessary for the team's success
  • Experience with building streaming pipelines with a micro-services architecture for low-latency analytics
  • Experience working with varied forms of data infrastructure, including relational databases (e.g. SQL), Spark, and column stores (e.g. Redshift)
  • Experience building scalable data pipelines using Spark using Airflow scheduler/executor framework or similar scheduling tools
  • Experience working in a technical environment with the latest technologies like AWS data services (Redshift, Athena, EMR) or similar Apache projects (Spark, Flink, Hive, or Kafka)
  • Understanding of Data Engineering tools/frameworks and standards to improve the productivity and quality of output for Data Engineers across the team
  • Industry experience working with large-scale, high-performance data processing systems (batch and streaming) with a 'Streaming First' mindset to drive Atlassian's business growth and improve the product experience
Job Responsibility
  • Own the technical evolution of the data engineering capabilities and be responsible for ensuring solutions are being delivered incrementally, meeting outcomes, and promptly escalating risks and issues
  • Establish a deep understanding of how things work in data engineering, use this to direct and coordinate the technical aspects of work across data engineering, and systematically improve productivity across the teams
  • Maintain a high bar for operational data quality and proactively address performance, scale, complexity and security considerations
  • Drive complex decisions that can impact the work in data engineering. Set the technical direction and balance customer and business needs with long-term maintainability & scale
  • Understand and define the problem space, and architect solutions. Coordinate a team of engineers towards implementing them, unblocking them along the way if necessary
  • Lead a team of data engineers through mentoring and coaching, work closely with the engineering manager, and provide consistent feedback to help them manage and grow the team
  • Work with close counterparts in other departments as part of a multi-functional team, and build this culture in your team
What we offer
  • health coverage
  • paid volunteer days
  • wellness resources
  • Fulltime

Senior Data Engineer

At Ingka Investments (Part of Ingka Group – the largest owner and operator of IK...
Location: Netherlands, Leiden
Salary: Not provided
IKEA
Expiration Date: Until further notice
Requirements
  • Formal qualifications (BSc, MSc, PhD) in computer science, software engineering, informatics or equivalent
  • Minimum 3 years of professional experience as a (Junior) Data Engineer
  • Strong knowledge in designing efficient, robust and automated data pipelines, ETL workflows, data warehousing and Big Data processing
  • Hands-on experience with Azure data services like Azure Databricks, Unity Catalog, Azure Data Lake Storage, Azure Data Factory, DBT and Power BI
  • Hands-on experience with data modeling for BI & ML for performance and efficiency
  • The ability to apply such methods to solve business problems using one or more Azure Data and Analytics services in combination with building data pipelines, data streams, and system integration
  • Experience in driving new data engineering developments (e.g. applying cutting-edge data engineering methods to improve the performance of data integration, or using new tools to improve data quality)
  • Knowledge of DevOps practices and tools including CI/CD pipelines and version control systems (e.g., Git)
  • Proficiency in programming languages such as Python, SQL, PySpark and others relevant to data engineering
  • Hands-on experience to deploy code artifacts into production
Job Responsibility
  • Contribute to the development of D&A platform and analytical tools, ensuring easy and standardized access and sharing of data
  • Act as subject matter expert for Azure Databricks, Azure Data Factory, and ADLS
  • Help design, build and maintain data pipelines (accelerators)
  • Document and make the relevant know-how & standard available
  • Ensure pipelines are consistent with relevant digital frameworks, principles, guidelines, and standards
  • Support in understanding the needs of Data Product Teams and other stakeholders
  • Explore ways to create better visibility into data quality and data assets on the D&A platform
  • Identify opportunities for data assets and D&A platform toolchain
  • Work closely together with partners, peers and other relevant roles like data engineers, analysts or architects across IKEA as well as in your team
What we offer
  • Opportunity to develop on a cutting-edge Data & Analytics platform
  • Opportunities to have a global impact on your work
  • A team of great colleagues to learn together with
  • An environment focused on driving business and personal growth together, with focus on continuous learning
  • Fulltime

Senior Data Engineer

As a Senior Software Engineer, you will play a key role in designing and buildin...
Location: United States
Salary: 156000.00 - 195000.00 USD / Year
Apollo.io
Expiration Date: Until further notice
Requirements
  • 5+ years experience in platform engineering, data engineering or in a data facing role
  • Experience in building data applications
  • Deep knowledge of the data ecosystem with an ability to collaborate cross-functionally
  • Bachelor's degree in a quantitative field (Physical / Computer Science, Engineering or Mathematics / Statistics)
  • Excellent communication skills
  • Self-motivated and self-directed
  • Inquisitive, able to ask questions and dig deeper
  • Organized, diligent, and great attention to detail
  • Acts with the utmost integrity
  • Genuinely curious and open
Job Responsibility
  • Architect and build robust, scalable data pipelines (batch and streaming) to support a variety of internal and external use cases
  • Develop and maintain high-performance APIs using FastAPI to expose data services and automate data workflows
  • Design and manage cloud-based data infrastructure, optimizing for cost, performance, and reliability
  • Collaborate closely with software engineers, data scientists, analysts, and product teams to translate requirements into engineering solutions
  • Monitor and ensure the health, quality, and reliability of data flows and platform services
  • Implement observability and alerting for data services and APIs (think logs, metrics, dashboards)
  • Continuously evaluate and integrate new tools and technologies to improve platform capabilities
  • Contribute to architectural discussions, code reviews, and cross-functional projects
  • Document your work, champion best practices, and help level up the team through knowledge sharing
What we offer
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits
  • Fulltime

Principal Data Engineer

We are on the lookout for a Principal Data Engineer to help define and lead the ...
Location: United Kingdom
Salary: Not provided
Dotdigital
Expiration Date: Until further notice
Requirements
  • Extensive experience delivering Python-based projects in the data engineering space
  • Extensive experience working with SQL and NoSQL database technologies (e.g. SQL Server, MongoDB & Cassandra)
  • Proven experience with modern data warehousing and large-scale data processing tools (e.g. Snowflake, DBT, BigQuery, ClickHouse)
  • Hands on experience with data orchestration tools like Airflow, Dagster or Prefect
  • Experience using cloud environments (e.g. Azure, AWS, GCP) to process, store and surface large scale data
  • Experience using Kafka or similar event-based architectures (e.g. Pub/Sub via AWS SQS, Azure Event Hubs, AWS Kinesis)
  • Strong grasp of data architecture and data modelling principles for both OLAP and OLTP workloads
  • Capable in the wider software development lifecycle in terms of agile ways of working and continuous integration/deployment of data solutions
  • Experience as a lead or Principal Engineer on large-scale data initiatives or product builds
  • Demonstrated ability to architect data systems and data structures for high volume, high throughput systems
Job Responsibility
  • Lead the design and implementation of scalable, secure and resilient data systems across streaming, batch and real-time use cases
  • Architect data pipelines, models, and storage solutions that power analytical and product use cases, using primarily Python and SQL via orchestration tooling that runs workloads in the cloud
  • Leverage AI to automate both data processing and engineering processes
  • Assure and drive best practices relating to data infrastructure, governance, security and observability
  • Work with technologists across multiple teams to deliver coherent features and data outcomes
  • Support the data team to help adopt data engineering principles
  • Identify, validate and promote new tools and technologies that improve the performance and stability of data services
What we offer
  • Parental leave
  • Medical benefits
  • Paid sick leave
  • Dotdigital day
  • Share reward
  • Wellbeing reward
  • Wellbeing Days
  • Loyalty reward
  • Fulltime