Job Description
Design, develop, and maintain scalable data pipelines, ETL/ELT processes, and data integration solutions using Databricks, PySpark, and Delta Lake.
Build and optimize cloud-native lakehouse architectures leveraging Azure Databricks, Azure Data Lake Storage (ADLS Gen2), and modern data engineering best practices.
Develop robust data ingestion frameworks supporting structured, semi-structured, and unstructured data sources.
Create high-performance data transformation workflows using Python, SQL, Spark, and Databricks notebooks.
Design, implement, and optimize data models, schemas, partitioning strategies, and storage structures for large-scale analytics workloads.
Integrate Databricks environments with downstream platforms including Snowflake, business intelligence tools, APIs, enterprise applications, and reporting solutions.
Utilize Azure Data Factory (ADF), Azure Functions, and Azure ecosystem services to support end-to-end data platform operations.
Implement and manage Databricks Unity Catalog, data governance frameworks, security controls, data lineage, and access management processes.
Monitor, troubleshoot, tune, and optimize data pipeline performance, reliability, scalability, and operational efficiency.
Develop and support CI/CD pipelines using Azure DevOps, GitHub Actions, Infrastructure-as-Code (Terraform), and modern DevOps practices.
Collaborate closely with data engineers, data architects, analysts, software developers, business stakeholders, and client teams to deliver data-driven solutions.
Support real-time and batch data processing initiatives using Spark, Kafka, Event Hubs, streaming frameworks, and event-driven architectures.
Leverage orchestration and workflow automation tools including Airflow, dbt, and Azure Data Factory pipelines to manage complex data workloads.
Participate in client-facing engagements, technical workshops, requirements gathering sessions, solution discussions, and consulting activities.
Apply Databricks platform expertise, lakehouse architecture principles, data engineering best practices, and performance optimization techniques to deliver scalable enterprise data solutions.