- This project aims to show the positivity rate and death rate of COVID-19. The raw data was preprocessed in Excel and then queried with SQL in the BigQuery workspace.
- It also demonstrates three BigQuery fundamentals: a) storage of big data, b) data ingestion, and c) querying.
- The goal is to analyse the number of deaths globally.
The following tools were used in this project:
- Excel
- BigQuery
- SQL
- Tableau
- The COVID-19 dataset includes raw data on confirmed cases and deaths from Johns Hopkins University (JHU) and is publicly available from Our World in Data (https://ourworldindata.org/).
- The dataset is quite large and could not be uploaded to GitHub.
- The dataset contains many fields that are not needed for this analysis. This project focuses on:
  - The total number of deaths per continent.
  - The percentage of the population infected per country.
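As a rough illustration of how the two measures above could be computed, the sketch below runs the aggregations against an in-memory SQLite table using Python's built-in `sqlite3` module instead of BigQuery. The table name `covid_deaths`, its columns (borrowed from the Our World in Data CSV header), and the sample rows are assumptions made up for this example; they are not the project's actual queries or figures. Because the OWID case and death columns are cumulative, each country's maximum value is treated as its running total, and rows with no continent (assumed here to be regional aggregates such as "World") are excluded so they are not double-counted.

```python
import sqlite3

# Made-up sample rows for illustration only (NOT real COVID-19 figures).
# Columns are assumed to follow the Our World in Data CSV header:
# (continent, location, date, total_cases, total_deaths, population)
ROWS = [
    ("Africa", "Kenya", "2022-07-01", 300, 5, 1000),
    ("Africa", "Kenya", "2022-07-02", 320, 6, 1000),
    ("Africa", "Ghana", "2022-07-02", 150, 2, 500),
    ("Europe", "France", "2022-07-02", 900, 20, 3000),
    (None, "World", "2022-07-02", 1370, 28, 4500),  # aggregate row, no continent
]

# Percentage of each country's population infected, using the latest
# (maximum) cumulative case count per country.
PERCENT_INFECTED_SQL = """
SELECT location,
       population,
       MAX(total_cases) * 100.0 / population AS percent_population_infected
FROM covid_deaths
WHERE continent IS NOT NULL
GROUP BY location, population
ORDER BY percent_population_infected DESC
"""

# Total deaths per continent: take each country's maximum cumulative deaths,
# then sum those per-country totals within each continent.
DEATHS_PER_CONTINENT_SQL = """
SELECT continent, SUM(country_deaths) AS total_deaths
FROM (
    SELECT continent, location, MAX(total_deaths) AS country_deaths
    FROM covid_deaths
    WHERE continent IS NOT NULL
    GROUP BY continent, location
) AS per_country
GROUP BY continent
ORDER BY total_deaths DESC
"""


def build_db(rows):
    """Load the sample rows into an in-memory SQLite table named covid_deaths."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE covid_deaths ("
        "continent TEXT, location TEXT, date TEXT, "
        "total_cases INTEGER, total_deaths INTEGER, population INTEGER)"
    )
    conn.executemany("INSERT INTO covid_deaths VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_db(ROWS)
    print(list(conn.execute(PERCENT_INFECTED_SQL)))
    print(list(conn.execute(DEATHS_PER_CONTINENT_SQL)))
```

The aggregation logic is plain SQL, so it should carry over to BigQuery once the table is referenced by its full `project.dataset.table` name, though that has not been tested here.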
- The dataset was loaded into Excel and lightly reformatted.
- Unnecessary columns were deleted. The main variables that guided this project were: i) total number of deaths, ii) population, and iii) total number of vaccinated persons.
- To avoid joining many tables in SQL for every variable, only two tables were extracted from the main .csv file.
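The column pruning and split into two tables was done in Excel; for reference, the same step can be sketched in Python with the standard `csv` module. The README does not say exactly which columns went into each table, so the `DEATH_COLUMNS` and `VACCINATION_COLUMNS` lists (named after OWID CSV headers) and the function `split_csv` are hypothetical. Both tables keep the same key columns so they can still be joined later in SQL.

```python
import csv
import io

# Hypothetical split: the exact columns of the two extracted tables are not
# documented. Names below follow the Our World in Data CSV header.
KEY_COLUMNS = ["iso_code", "continent", "location", "date"]
DEATH_COLUMNS = KEY_COLUMNS + [
    "population", "total_cases", "new_cases", "total_deaths", "new_deaths",
]
VACCINATION_COLUMNS = KEY_COLUMNS + [
    "new_vaccinations", "people_vaccinated", "total_vaccinations",
]


def split_csv(source, deaths_out, vaccinations_out):
    """Copy selected columns of one wide CSV into two narrower CSVs.

    All three arguments are open text streams. Columns not listed above are
    dropped; listed columns missing from the source are written as "".
    """
    reader = csv.DictReader(source)
    deaths = csv.DictWriter(deaths_out, fieldnames=DEATH_COLUMNS, extrasaction="ignore")
    vaccinations = csv.DictWriter(
        vaccinations_out, fieldnames=VACCINATION_COLUMNS, extrasaction="ignore"
    )
    deaths.writeheader()
    vaccinations.writeheader()
    for row in reader:
        # Both outputs keep the key columns so they can be joined later.
        deaths.writerow(row)
        vaccinations.writerow(row)


if __name__ == "__main__":
    # Tiny made-up input with one extra column (stringency_index) to drop.
    sample = io.StringIO(
        "iso_code,continent,location,date,population,total_cases,new_cases,"
        "total_deaths,new_deaths,new_vaccinations,people_vaccinated,"
        "total_vaccinations,stringency_index\n"
        "KEN,Africa,Kenya,2022-07-01,1000,300,10,5,0,4,50,60,12.5\n"
    )
    deaths_buf, vacc_buf = io.StringIO(), io.StringIO()
    split_csv(sample, deaths_buf, vacc_buf)
    print(deaths_buf.getvalue())
    print(vacc_buf.getvalue())
```

With real files, open the input and both outputs with `newline=""`, as the `csv` module documentation recommends.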
- The two tables were then loaded into BigQuery and analysed with a series of SQL queries.
- A sample of the queries written to explore the data and join the two tables is available in the BigQuery console (https://console.cloud.google.com/bigquery?sq=931240212867:ebf14e8c0104404a9b37b5b6d63f2a89) or in the file PortfolioSeries.sql (https://github.com/stacie-kipruto/CovidDeathsSQL/blob/main/PortfolioSeries.sql).
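The linked queries are the reference for how the two tables were actually combined. As a hedged illustration only, the sketch below shows one way such a join could look, again using an in-memory SQLite database from Python's `sqlite3` module in place of BigQuery. The table names `covid_deaths` and `covid_vaccinations`, their columns, the helper `build_sample_tables`, and the sample rows are all assumptions for this example.

```python
import sqlite3

# Join the two (hypothetical) tables row by row and derive the share of the
# population vaccinated on each day.
JOIN_SQL = """
SELECT d.location,
       d.date,
       d.population,
       d.total_deaths,
       v.people_vaccinated,
       v.people_vaccinated * 100.0 / d.population AS percent_vaccinated
FROM covid_deaths AS d
JOIN covid_vaccinations AS v
  ON d.location = v.location
 AND d.date = v.date
ORDER BY d.location, d.date
"""


def build_sample_tables():
    """Create both tables in memory and fill them with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE covid_deaths (
            location TEXT, date TEXT, population INTEGER, total_deaths INTEGER
        );
        CREATE TABLE covid_vaccinations (
            location TEXT, date TEXT, people_vaccinated INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO covid_deaths VALUES (?, ?, ?, ?)",
        [
            ("Kenya", "2022-07-01", 1000, 5),
            ("Kenya", "2022-07-02", 1000, 6),
            ("Ghana", "2022-07-02", 500, 2),
        ],
    )
    conn.executemany(
        "INSERT INTO covid_vaccinations VALUES (?, ?, ?)",
        [
            ("Kenya", "2022-07-01", 250),
            ("Kenya", "2022-07-02", 260),
            ("Ghana", "2022-07-02", None),  # missing value, stored as NULL
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_sample_tables()
    for row in conn.execute(JOIN_SQL):
        print(row)
```

The join matches on both `location` and `date`: each table holds one row per country per day, so joining on `location` alone would pair every day with every other day for that country. A missing vaccination count stays `NULL` and carries through to `percent_vaccinated` rather than raising an error.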
- Once the queries had narrowed down the right questions, the results were exported into four .csv files, which were then visualized in Tableau to build an interactive dashboard.
- The public dashboard is available at https://public.tableau.com/app/profile/stacey.kipruto/viz/CovidDashboardPortfolio_16573154928880/Dashboard1
- The dashboard is built from the exported .csv files rather than a live data connection, so the visuals show the figures as of the date the project was posted and do not update automatically.