- Create MySQL database instance on RDS as transactional database
- Connect DB client DBeaver to the transactional database on RDS
- Create tables and load data using DBeaver
- Data source: https://www.kaggle.com/olistbr/brazilian-ecommerce
- Connect DB client DBeaver to the transactional database on RDS
- Create a Data Warehouse in Redshift, and connect it to DBeaver
- AWS DataPipeline: one-time load historical data into Redshift tables using copy command
- AWS Glue: incremental data loads into Redshift
- use AWS Secret Managers to store credentials
- create a schema for staging tables in Redshift, and create staging tables
- copy current data from S3 and store into a staging table
- delete from the main table in the Data Warehouse using the staging tables
- insert into the main table in the Data Warehouse by selecting everything from the staging table, and truncate the staging table at the end of this process.
- use Lambda function to trigger the Glue job
- use AWS Secret Managers to store credentials
- Data Enrichment & Centralisation
- User behaviour/Funnel data from external sources that is semi-structured can be stored in S3 (data lake)
- use AWs Glue Crawler and Athena to read and query the data
- the AWS Glue Crawler read the timestamp column into string format so a Glue job in pySpark is used to convert it into timestamp and also partition the timestamp into year and month
- Add a Lambda function to trigger the Glue job so when the data arrives in S3 the ETL process can be triggered.
- Imported the external data into Redshift
- method1: create external schema from data catalog
- method2: create external schema directly from s3
- External data can be joined with internal data in Redshift
- Connect Redshift to Tableau for data visualization and BI
-
Notifications
You must be signed in to change notification settings - Fork 3
This is an ETL project - extracting data from an ecommerce transactional database on RDS, transforming the data using AWS glue job, and loading it to a Redshift data warehouse, and connected it to Tableau for BI
Yan-Luo-AU/Data_Engineer_Project_ETL_BI
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
This is an ETL project - extracting data from an ecommerce transactional database on RDS, transforming the data using AWS glue job, and loading it to a Redshift data warehouse, and connected it to Tableau for BI
Topics
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published