Covid19-ETL-Datapipeline

This repository contains a data engineering project that implements an ETL (Extract, Transform, Load) data pipeline using Dagster, Spark, Plotly, and Dash. The pipeline extracts Covid-19 data from various sources, transforms it into a standard format, and loads it into a database. The transformed data then feeds interactive dashboards built with Plotly and Dash.
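The extract, transform, and load steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's actual code: the standard-library csv module stands in for the Spark transformation, an in-memory SQLite database stands in for the project's database, and the column names, the table name covid_daily, and the sample rows (fictional countries, made-up counts) are all hypothetical.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Illustrative sample rows only (fictional countries, made-up counts).
# The column names are an assumption about the raw dataset layout.
RAW_CSV = """Country/Region,Date,Confirmed,Deaths,Recovered
Atlantis,1/22/20,10,1,2
Atlantis,1/23/20,15,1,4
Elbonia,1/22/20,,0,0
"""


def extract(text):
    """Extract: read CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: rename columns, convert dates to ISO format, cast counts to int."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "country": row["Country/Region"].strip(),
            "report_date": datetime.strptime(row["Date"], "%m/%d/%y").date().isoformat(),
            # Missing counts are treated as 0 in this sketch.
            "confirmed": int(row["Confirmed"] or 0),
            "deaths": int(row["Deaths"] or 0),
            "recovered": int(row["Recovered"] or 0),
        })
    return cleaned


def load(rows, conn):
    """Load: insert the cleaned rows into a table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS covid_daily ("
        "country TEXT, report_date TEXT, "
        "confirmed INTEGER, deaths INTEGER, recovered INTEGER)"
    )
    conn.executemany(
        "INSERT INTO covid_daily VALUES "
        "(:country, :report_date, :confirmed, :deaths, :recovered)",
        rows,
    )
    conn.commit()
    return len(rows)


def run_pipeline(text, conn):
    """Chain the three steps: extract, then transform, then load."""
    return load(transform(extract(text)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection), "rows loaded")
```

Keeping each step a separate function mirrors how an orchestrator such as Dagster would typically schedule them as individual units; in the real project the transform step is where Spark would take over.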

Project Structure

The project is structured as follows:

dagster, dagster_home, etl_pipeline: Contain the Dagster pipeline code, including the Spark data transformations
spark: Contains the Spark initialization
notebooks: Contains the test code and the interactive dashboards (Plotly + Dash)
covid-19-dataset: Contains the raw and transformed data

Technologies Used

The following technologies have been used in this project:

Dagster: A data orchestrator for machine learning, analytics, and ETL.
Spark: An open-source distributed computing system used for big data processing.
MySQL: An open-source relational database management system.
PostgreSQL: An open-source relational database management system.
MinIO: An open-source object storage server.
Plotly: An open-source data visualization library.
Dash: An open-source Python framework for building analytical web applications.

Dataset credit: https://www.kaggle.com/datasets/imdevskp/corona-virus-report

Repository: thangbuiq/covid19-etl-pipeline
