Job Description:
- Design and develop large-scale data systems, including databases, data warehouses, and big data platforms.
- Build robust and scalable software solutions using modern software engineering practices.
- Design and build scalable data pipelines for both batch and real-time processing: leverage Apache Airflow, Spark, and SQL for ETL workflows across diverse data sources (e.g., relational databases, APIs, logs), and use tools such as Apache Kafka, Flink, and Spark Structured Streaming to enable near-real-time data processing for analytics and monitoring use cases.
- Collaborate with stakeholders to understand business needs and translate them into scalable, reliable data systems and tools, while ensuring data quality, privacy, and compliance.
- Champion and enforce data governance practices, including data lineage, metadata management, data quality controls, and compliance with privacy regulations.
- Drive automation initiatives by developing scripts, utilities, and frameworks that streamline data processes, improve efficiency, and enforce data governance practices.
- Collaborate with cross-functional teams and mentor junior engineers.
- Stay current with the latest industry trends and technologies in data engineering, Gen AI, and cloud solutions, and create innovative solutions for complex challenges.
- Maintain comprehensive documentation of data solutions, processes, and best practices, and actively share knowledge with the team.
- May telecommute.
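For candidates unfamiliar with the near-real-time processing mentioned above: streaming engines such as Flink and Spark Structured Streaming typically group incoming events into time windows and aggregate within each window. A minimal pure-Python sketch of a tumbling-window count, with the function name and sample events entirely hypothetical (no specific engine API is shown):

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds):
    """Assign each (timestamp, key) event to a fixed-size tumbling
    window and count occurrences of each key per window."""
    counts = {}
    for ts, key in events:
        # Tumbling windows are non-overlapping, aligned intervals:
        # an event at time ts belongs to the window starting at
        # floor(ts / window_seconds) * window_seconds.
        window_start = (ts // window_seconds) * window_seconds
        counts.setdefault(window_start, Counter())[key] += 1
    return counts

# Hypothetical click events: (epoch seconds, page)
events = [(0, "home"), (3, "home"), (7, "cart"), (12, "home")]
windows = tumbling_window_counts(events, window_seconds=10)
# Events at t=0, 3, 7 land in the window starting at 0;
# the event at t=12 lands in the window starting at 10.
```

In a production pipeline the events would arrive from a source like a Kafka topic and the aggregation would run inside the streaming engine, which also handles out-of-order data and fault tolerance; this sketch only shows the windowing idea itself.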