In this project, I developed a complete data pipeline, guiding data from collection to insights through a series of structured stages.
- Extracted the data, cleaned it, and loaded it into Google Cloud Storage for further processing (a minimal upload sketch follows this list).
- Transformed and modeled the data in Python using fact and dimensional data modeling concepts.
- Orchestrated the ETL pipeline on Mage AI and loaded the transformed data into Google BigQuery.
- Developed a dashboard on Looker Studio.
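As an illustration of the load step above, here is a minimal sketch of uploading the cleaned CSV to Google Cloud Storage with the google-cloud-storage client; the bucket and file names are placeholders, not the project's actual values.

```python
from google.cloud import storage  # pip install google-cloud-storage


def upload_to_gcs(bucket_name: str, source_file: str, destination_blob: str) -> None:
    """Upload a local file to a Google Cloud Storage bucket."""
    client = storage.Client()  # uses application-default credentials
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(destination_blob)
    blob.upload_from_filename(source_file)


# Placeholder names -- substitute the actual bucket and file used in the project.
upload_to_gcs("supply-chain-data-bucket", "cleaned_data.csv", "raw/cleaned_data.csv")
```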
I used the dataset from DataCo Global. The data description is [here](data\Data Dictionary.pdf). It consists of 52 features in areas of provisioning, production, sales, and commercial distribution.
- Language: Python, SQL
- Extraction and transformation: Jupyter Notebook, Google BigQuery
- Storage: Google Cloud Storage
- Orchestration: Mage AI
- Dashboard: Looker Studio
- Step 1: Cleaning and transformation - sc_data.ipynb
- Step 2: Storage
- Step 3: ETL, Orchestration - Mage: Extract, Transform, Load
- Step 4: Analytics - SQL script
- Step 5: Dashboard
The datasets are designed using fact and dimension data modeling principles.
In this step, I loaded the CSV file in VS Code and carried out data cleaning and transformation before organizing the data into fact and dimension tables.
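A minimal sketch of the kind of fact/dimension split done in the notebook, assuming pandas and illustrative column names; the actual schema is defined in the data dictionary.

```python
import pandas as pd

# File and column names are illustrative; see the data dictionary for the actual schema.
df = pd.read_csv("DataCoSupplyChainDataset.csv", encoding="latin-1")

# One row per customer forms a dimension table.
dim_customer = (
    df[["Customer Id", "Customer Segment", "Customer City", "Customer Country"]]
    .drop_duplicates(subset="Customer Id")
    .reset_index(drop=True)
)

# One row per order item forms the fact table, keyed to the dimension tables.
fact_orders = df[["Order Id", "Customer Id", "Order Item Quantity", "Sales"]].copy()
```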
- Launched the VM instance, connected over SSH, and installed the required libraries, including Mage AI.
- Carried out the ETL by accessing the Mage AI UI at the instance's external IP address and the Mage AI port.
- Created a new pipeline with extract, transform, and load stages (one block is sketched below).
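For illustration, a minimal sketch of one transformer block in the pipeline, following Mage AI's standard block template; the transformation body and column names are assumptions, not the project's actual code.

```python
import pandas as pd

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(data, *args, **kwargs):
    # 'data' is the output of the upstream loader block (assumed here to be tabular).
    df = pd.DataFrame(data)

    # Illustrative split into one dimension and one fact table; column names are assumptions.
    dim_customer = df[["Customer Id", "Customer Segment"]].drop_duplicates().reset_index(drop=True)
    fact_orders = df[["Order Id", "Customer Id", "Sales"]]

    # Return a dict so the downstream exporter can write each table separately.
    return {
        "dim_customer": dim_customer.to_dict(orient="records"),
        "fact_orders": fact_orders.to_dict(orient="records"),
    }
```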
After running the pipeline in Mage, the fact and dimension tables were generated in Google BigQuery. I performed a few queries and created a sales analytics table. SQL queries here.
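A minimal sketch, assuming the google-cloud-bigquery client and placeholder project, dataset, and table names, of running one such analytics query from Python:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# Illustrative query: total sales by market, joining the fact table to a dimension.
# Project, dataset, and table names are placeholders, not the project's actual identifiers.
query = """
    SELECT d.market, SUM(f.sales) AS total_sales
    FROM `my-project.supply_chain.fact_orders` AS f
    JOIN `my-project.supply_chain.dim_market` AS d
      ON f.market_id = d.market_id
    GROUP BY d.market
    ORDER BY total_sales DESC
"""

results = client.query(query).to_dataframe()
print(results.head())
```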
After completing the analysis, I loaded the relevant tables into Looker Studio and created a dashboard. Dashboard here.