We are currently seeking Offshore Pod Leads to join our team in TBD, Andaman and Nicobar Islands (IN-AN), India (IN).

JOB DESCRIPTION
Data Engineering Pod Lead, Databricks Lakehouse Migration Program
Two Roles: Informatica Pod Lead | AWS Glue Pod Lead

Engagement Type: Contract / Staff Augmentation or Full-Time Employee (FTE), open
Seniority Level: Lead / Architect, 12+ years of relevant experience
Number of Openings: 2 (one per pod)
Team Size: 4–6 Data Engineers per pod lead
Cloud Platform: AWS (Glue, Redshift, S3, Kinesis Streams, IAM, CloudWatch)
Target Platform: Databricks Lakehouse (Unity Catalog, Delta Lake, Workflows)
Program Type: Client-facing migration engagement (ETL modernization)

Program Context & Opportunity
Our client is undertaking a large-scale data platform modernization initiative: migrating from a legacy ETL ecosystem (Informatica PowerCenter, AWS Glue, and Amazon Kinesis Streams) feeding Amazon Redshift into a unified Databricks Lakehouse architecture built on Delta Lake. This is a high-impact, high-visibility program requiring experienced technical leaders who can navigate complex legacy systems, architect modern solutions, and lead skilled engineering teams through the full migration lifecycle. We are hiring two dedicated Pod Leads, one for each legacy source domain, who will be jointly accountable for technical excellence, delivery velocity, and team development throughout the engagement.
Job Responsibilities
Own the end-to-end technical design and implementation of the migration from the respective source platform to Databricks Lakehouse (Delta Lake, Unity Catalog, Databricks Workflows)
Conduct thorough assessments of existing ETL jobs — analyzing lineage, dependencies, transformation logic, scheduling, and data quality rules — prior to migration planning
Define migration patterns, reusable frameworks, and coding standards adopted across the pod
Architect scalable, cost-efficient pipelines using Databricks PySpark, Spark SQL, and Delta Live Tables (DLT) as appropriate
Make key architectural decisions and document them as architecture decision records (ADRs) with clear rationale and trade-off analysis
Drive adoption of software engineering best practices: version control (Git), CI/CD, unit testing, and code review within the pod
Directly lead a pod of 4–6 Data Engineers, providing technical mentorship, task assignment, code reviews, and unblocking day-to-day impediments
Manage sprint planning, backlog refinement, and progress tracking against migration milestones in close coordination with the Program Manager
Hold the team accountable for quality and velocity — proactively flag risks, scope changes, and dependencies before they become blockers
Conduct regular 1:1s and technical feedback sessions to support the professional growth of pod members
Foster a culture of ownership, collaboration, and continuous improvement within the pod
Serve as the primary technical point of contact for your pod's workstream with the client
Translate complex technical concepts and migration trade-offs into clear, concise communications for both technical and non-technical stakeholders
Participate in program-level status reviews, architecture governance meetings, and client steering committees as required
Manage expectations around scope, timelines, and quality, escalating issues appropriately
Ensure all migrated pipelines meet data quality, SLA, and observability requirements defined by the client
Champion data governance best practices including lineage tracking, catalog registration in Databricks Unity Catalog, and access control alignment
Produce and maintain clear technical documentation: architecture diagrams, runbooks, migration playbooks, and handover materials
Coordinate with QA/testing resources to validate migrated pipelines against source-system outputs
Informatica Pod Lead
Analyze and decompose Informatica PowerCenter mappings, sessions, workflows, and worklets to understand full transformation logic, source/target connectivity, and scheduling dependencies
Define and execute a structured migration methodology — assess, convert, validate — for translating Informatica logic into equivalent PySpark/Spark SQL code on Databricks
Identify opportunities to simplify or consolidate legacy transformations during migration rather than performing a lift-and-shift
Manage Informatica repository metadata, mapping exports (XML), and PowerCenter Designer artifacts as inputs to the migration pipeline
Coordinate with source system owners (databases, flat files, legacy APIs) to ensure source connectivity is preserved or rerouted through AWS S3/Glue Catalog during migration
Validate migrated pipelines against Informatica source outputs using row-count reconciliation, checksum comparisons, and business rule validation
AWS Glue Pod Lead
Audit and catalog all existing AWS Glue jobs — including PySpark and Python shell scripts, Glue Crawlers, Glue Data Catalog configurations, triggers, and job bookmarks
Assess Redshift loading patterns (COPY commands, stored procedures, views, materialized views) and define equivalent target-state patterns in Databricks using Delta Lake MERGE, upsert, and partition strategies
Evaluate and migrate Glue Crawlers and Glue Data Catalog schemas to Databricks Unity Catalog, ensuring metadata consistency and lineage continuity
Redesign Glue workflows and triggers as Databricks Workflow DAGs, preserving scheduling intent while improving observability and retry logic
Collaborate with AWS and cloud infrastructure teams to manage IAM role transitions, S3 access patterns, and network configurations during and after migration
Validate migrated pipelines against Glue/Redshift source outputs, including Redshift audit tables, row counts, and business-critical KPI reconciliation
Assess and migrate Amazon Kinesis Streams-based ingestion pipelines — analyzing stream consumers, shard configurations, and downstream processing logic — and re-architect them using Databricks Structured Streaming with Delta Lake as the target sink
Design low-latency streaming pipeline patterns on Databricks (Auto Loader, Structured Streaming) to replace Kinesis consumer applications, ensuring at-least-once or exactly-once delivery semantics are preserved
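For candidates gauging the validation work, the row-count reconciliation and checksum comparison named in the responsibilities above might look roughly like the following. This is a minimal sketch in plain Python, not the program's actual framework: the function names `table_fingerprint` and `reconcile` are hypothetical, rows are assumed to arrive as in-memory tuples, and a real implementation would run on Spark and first normalize types such as decimals and timestamps.

```python
import hashlib
from typing import Dict, Iterable, Sequence, Tuple

NULL_TOKEN = "\x00NULL"  # assumed marker so that None and "" do not collide
FIELD_SEP = "\x1f"       # unit separator, unlikely to occur in field values


def table_fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, str]:
    """Return (row_count, order-independent checksum) for a collection of rows."""
    count = 0
    combined = 0
    for row in rows:
        # Serialize the row deterministically, then hash it.
        text = FIELD_SEP.join(NULL_TOKEN if v is None else str(v) for v in row)
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Adding digests modulo 2**256 ignores row order but still
        # reflects duplicate rows, unlike an XOR of digests.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, format(combined, "064x")


def reconcile(source_rows: Iterable[Sequence[object]],
              target_rows: Iterable[Sequence[object]]) -> Dict[str, object]:
    """Compare a legacy output with its migrated counterpart."""
    src_count, src_sum = table_fingerprint(source_rows)
    tgt_count, tgt_sum = table_fingerprint(target_rows)
    return {
        "source_rows": src_count,
        "target_rows": tgt_count,
        "row_count_match": src_count == tgt_count,
        "checksum_match": src_sum == tgt_sum,
    }
```

Calling `reconcile(legacy_rows, migrated_rows)` returns a small report; a matching row count with a mismatched checksum points at a transformation difference rather than missing data, which is where business rule validation would take over.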
Requirements
Informatica Pod Lead
12+ years in data engineering, including at least 5 years of hands-on Informatica PowerCenter experience (mappings, sessions, workflows, transformations, parameter files, workflow monitor)
Strong proficiency in PySpark and Spark SQL for building production-grade ETL/ELT pipelines
Solid understanding of AWS data services: S3, Redshift, Glue Data Catalog, IAM, CloudWatch
Experience migrating or re-platforming Informatica workloads to a modern data platform (Databricks, Spark, or cloud-native ETL)
Proficiency in SQL and familiarity with Redshift-specific SQL dialects and optimization patterns
Familiarity with Unity Catalog, Delta Live Tables, or similar data governance/pipeline orchestration frameworks is a strong plus
Experience with CI/CD tooling (Git, GitHub Actions, Jenkins, or similar) applied to data pipeline development
Proven track record leading a team of 4+ data engineers in a delivery-focused engagement or program
Strong analytical and problem-solving skills with the ability to work through ambiguous, undocumented legacy systems
Excellent written and verbal communication; able to present technical findings and migration plans to client stakeholders
Experience working in Agile/Scrum delivery environments
Consulting or client-engagement experience is a significant advantage
AWS Glue Pod Lead
12+ years in data engineering, including at least 5 years of hands-on AWS Glue experience (PySpark ETL scripts, Python shell jobs, Glue Studio, Crawlers, Data Catalog, job bookmarks, triggers)
Deep expertise in Amazon Redshift — data modeling, distribution/sort keys, COPY/UNLOAD operations, stored procedures, performance tuning, and Redshift Spectrum
Hands-on experience with Amazon Kinesis Streams — stream consumers (KCL/Lambda/Glue Streaming), shard management, retention policies, and integration with downstream AWS services
Strong AWS platform proficiency: S3, IAM, CloudWatch, Kinesis Streams, AWS Secrets Manager, Lake Formation — AWS Solutions Architect or Data Analytics certification preferred
Strong proficiency in PySpark and Spark SQL for building and optimizing production pipelines on Databricks
Experience migrating Glue-based workloads to Databricks or equivalent Spark-based platforms
Familiarity with Unity Catalog, Delta Live Tables, and Databricks Asset Bundles is a strong plus
Experience with CI/CD tooling (Git, GitHub Actions, AWS CodePipeline, or similar) applied to data pipeline development
Proven track record leading a team of 4+ data engineers in a delivery-focused engagement or program
Strong ability to navigate AWS service interdependencies and translate cloud infrastructure nuances into migration decisions
Excellent written and verbal communication; able to present technical migration plans and risk assessments to client stakeholders
Experience working in Agile/Scrum delivery environments
Consulting or client-engagement experience is a significant advantage
Nice to have
Experience with real-time or near-real-time streaming pipelines using Databricks Structured Streaming, Delta Live Tables, or Apache Kafka — particularly migrating from Amazon Kinesis-based architectures
Databricks Certified Data Engineer Associate or Professional certification
AWS Certified Data Analytics – Specialty or AWS Certified Solutions Architect
Prior experience on a large-scale ETL migration or data platform modernization program
Familiarity with data observability tools (Monte Carlo, Great Expectations, Deequ) or Databricks built-in data quality frameworks
Experience with infrastructure-as-code tools such as Terraform or AWS CDK for managing Databricks workspace configurations
Knowledge of data mesh principles, medallion architecture (Bronze/Silver/Gold), and lakehouse design patterns
Prior consulting, systems integration, or professional services delivery experience