Job Description:
Understand and prioritize business problems and identify ways to leverage data to recommend solutions. Organize and synthesize data into actionable business decisions, with a focus on insights. Provide insight into trends in financial and business operations through data analysis and the development of business intelligence visuals. Work with advanced business intelligence tools to complete complex calculations, table calculations, geographic mapping, data blending, and optimization of data extracts.

- Apply all phases of the Software Development Life Cycle (analysis, design, development, testing, and maintenance) using Waterfall and Agile methodologies
- Proficient in Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Spark, Flume, HBase, and Oozie, running on AWS EC2/Azure VM cloud infrastructure
- Expertise in using Hive to create tables and distribute data through partitioning and bucketing; capable of developing, tuning, and optimizing HQL queries (see the first Scala sketch after this list)
- Proficient in importing and exporting data with Sqoop between HDFS and relational database systems (second sketch)
- Expert in Spark SQL and Spark DataFrames using Scala for distributed data processing
- Develop DataFrame and RDD (Resilient Distributed Dataset) code to apply unified transformations on the data load (third sketch)
- Expertise in scripting languages such as Linux/Unix shell and Python
- Develop, schedule, and monitor Oozie workflows for parallel execution of jobs
- Experience working with cloud environments, including AWS EMR, EC2, S3, and Athena, and GCP BigQuery
- Transfer data from different platforms into the AWS platform
- Diverse experience working with a variety of databases, including SQL Server, MySQL, IBM DB2, and Netezza
- Manage source code in GitHub
- Track and deliver requirements in Jira
- Expertise with IDEs and tools such as Eclipse, GitHub, Jenkins, Maven, and IntelliJ
- Optimize Spark applications to improve performance and reduce run time on the Hadoop cluster (fourth sketch)
- Proficient in executing Hive queries through the Hive CLI, the Hue web GUI, and Impala to read, write, and query data
- Build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time
- Create metrics and apply business logic using Spark, Scala, R, Python, and/or Java
- Model, design, develop, code, test, debug, document, and deploy applications to production through standard processes, and build business models using data science skills
- Harmonize, transform, and move data from a raw format to consumable, curated views
- Apply strong data governance principles, standards, and frameworks to promote data consistency and quality while effectively managing and protecting the integrity of corporate data
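As an illustration of the Hive partitioning and bucketing work listed above, here is a minimal Scala sketch that issues HiveQL DDL through a Hive-enabled SparkSession. The `sales_curated` table, its columns, the `load_date` partition key, and the bucket count are all hypothetical, chosen only to show the pattern; the master URL is assumed to come from spark-submit.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets spark.sql() run HiveQL DDL directly.
    val spark = SparkSession.builder()
      .appName("hive-partition-bucket-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table and columns, for illustration only.
    // Partitioning splits the data into one directory per load_date;
    // bucketing hashes customer_id into a fixed number of files,
    // which speeds up joins and sampling on that key.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales_curated (
        order_id     BIGINT,
        customer_id  BIGINT,
        amount       DECIMAL(12,2)
      )
      PARTITIONED BY (load_date STRING)
      CLUSTERED BY (customer_id) INTO 32 BUCKETS
      STORED AS ORC
    """)

    // A typical tuned query: the load_date predicate enables
    // partition pruning, so only one directory is scanned.
    spark.sql("""
      SELECT customer_id, SUM(amount) AS total
      FROM sales_curated
      WHERE load_date = '2024-01-15'
      GROUP BY customer_id
    """).show()

    spark.stop()
  }
}
```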
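Sqoop itself is driven from the command line, so to keep all sketches in one language this second sketch shows the same RDBMS-to-HDFS movement expressed with Spark's JDBC source, which the listing also covers; it is not Sqoop code. The MySQL endpoint, table name, user, and HDFS path are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession

object JdbcIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-ingest-sketch")
      .getOrCreate()

    // Hypothetical MySQL endpoint and table; the credential is read
    // from the environment rather than hard-coded.
    val customers = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "customers")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Land the table in HDFS as Parquet: the HDFS side of the
    // import/export round trip the listing describes.
    customers.write.mode("overwrite").parquet("hdfs:///raw/customers/")

    spark.stop()
  }
}
```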
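The DataFrame and RDD transformation work might look like this third sketch: the same per-customer aggregation expressed once through the DataFrame API and once as an explicit RDD `reduceByKey`. The `customer_id`/`amount` schema and the inline sample rows are invented for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataFrameRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-rdd-sketch")
      .master("local[*]") // local master so the sketch runs standalone
      .getOrCreate()
    import spark.implicits._

    // Hypothetical raw feed: (customer_id, amount).
    val raw = Seq((1L, 40.0), (1L, 60.0), (2L, 25.0))
      .toDF("customer_id", "amount")

    // DataFrame path: declarative aggregation, optimized by Catalyst.
    val totals = raw.groupBy("customer_id")
      .agg(sum("amount").as("total_amount"))

    // Equivalent RDD path: explicit key/value transformation.
    val totalsRdd = raw.rdd
      .map(row => (row.getLong(0), row.getDouble(1)))
      .reduceByKey(_ + _)

    totals.show()
    totalsRdd.collect().foreach(println)
    spark.stop()
  }
}
```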
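For the Spark optimization item, the fourth sketch shows three common tuning moves: broadcasting a small dimension table to avoid a shuffle-heavy join, repartitioning on the hot key so work spreads evenly across executors, and caching a result that downstream actions reuse. The S3 paths, column names, and the partition count of 200 are placeholders, not recommendations.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object SparkTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-tuning-sketch")
      .getOrCreate()

    // Hypothetical inputs; adjust to the actual data layout.
    val events = spark.read.parquet("s3://example-bucket/events/")
    val dims   = spark.read.parquet("s3://example-bucket/dim_customer/")

    // 1. Broadcast the small dimension table so the join happens
    //    map-side instead of shuffling the large events table.
    val joined = events.join(broadcast(dims), Seq("customer_id"))

    // 2. Repartition on the aggregation key to balance tasks.
    val balanced = joined.repartition(200, joined("customer_id"))

    // 3. Cache a result reused by several actions, so it is
    //    computed once rather than once per action.
    balanced.cache()

    balanced.groupBy("region").count().show()
    spark.stop()
  }
}
```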