Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed with Apache Airflow in a Docker container.
Updated Mar 17, 2023 · Python
In this project I built a batch ETL pipeline that reads transactional data from Amazon RDS, transforms it into a usable format, and loads it into an Amazon S3 bucket. The data is then loaded into Redshift tables, after which I run analytical queries on the loaded data to gain insights.
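The final S3-to-Redshift load described above is typically done with a Redshift COPY statement. A minimal sketch of assembling one follows; the table name, bucket path, and IAM role ARN are hypothetical placeholders, not values from the project.

```python
def build_copy_statement(table, s3_path, iam_role):
    """Assemble a Redshift COPY statement that bulk-loads Parquet files from S3.

    `table`, `s3_path`, and `iam_role` are hypothetical placeholders --
    substitute the real Redshift table, bucket prefix, and IAM role ARN.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET;"
    )

stmt = build_copy_statement(
    "orders_fact",
    "s3://example-etl-bucket/orders/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

Executing the resulting statement against the cluster (for example via psycopg2 or the Redshift Data API) performs the bulk load; `COPY ... FORMAT AS PARQUET` is Redshift's native syntax for columnar input stored in S3.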
Cloud Data Warehouse of Sparkify Data using Redshift
Developed a batch ETL pipeline to extract, transform, and load transactional data from RDS to Redshift. Used Sqoop to ingest data from RDS into HDFS, PySpark to transform it and load it to S3, and Redshift to create and query dimension and fact tables. Ran analytical queries to identify ATMs with inactive transactions, ATM failures by weather condition, and similar insights.
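The Sqoop ingest step above is usually run as a daily incremental import, so a helper that assembles the command can make the watermark handling explicit. This is a sketch under assumptions: the JDBC URL, table, target directory, and check column are hypothetical placeholders.

```python
def sqoop_import_cmd(jdbc_url, table, target_dir, last_value):
    """Assemble a Sqoop incremental-append import command (RDS -> HDFS).

    All arguments are hypothetical placeholders. `--incremental append`
    with `--check-column` / `--last-value` makes Sqoop pull only rows
    whose check-column value is greater than the previous run's watermark.
    """
    return (
        "sqoop import"
        f" --connect {jdbc_url}"
        f" --table {table}"
        f" --target-dir {target_dir}"
        " --incremental append"
        " --check-column transaction_id"
        f" --last-value {last_value}"
    )

cmd = sqoop_import_cmd(
    "jdbc:mysql://example-rds-host:3306/atm_db",
    "transactions",
    "/user/hadoop/transactions",
    104500,
)
```

In a scheduled run, the `--last-value` watermark printed by the previous Sqoop job would be persisted (e.g. in the Airflow metadata or a Sqoop saved job) and fed back in on the next day's import.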
Database creation and queries using AWS (S3, Redshift, and IAM)