DataAnalytics_Prep

This repository focuses on data understanding and preparation for COVID-19 pandemic data. The data comes from the Centers for Disease Control and Prevention (CDC), a US health protection agency in charge of collecting data about the COVID-19 pandemic, in particular tracking cases, deaths, and trends of COVID-19 in the United States. The CDC collects and publishes de-identified individual-case data daily, submitted using standardized case reporting forms. The data covers demographic characteristics, exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and comorbidities, and it records whether the individual survived. In this analysis, we use the CDC data to build a data analytics solution for death risk prediction.

We carry out the following tasks:

  1. Prepare a data quality report for the CSV dataset.
  2. Prepare a data quality plan for the cleaned CSV file.
  3. Explore relationships between feature pairs.
  4. Transform the existing features to create new features that better capture the problem domain and the target outcome.
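
As a rough illustration of the first task, a data quality report can be sketched with the Python standard library alone. This is a minimal sketch, not the code used in this repository: the column names (`age_group`, `sex`, `death_yn`), the sample rows, and the set of tokens treated as missing are all assumptions made up for the example.

```python
import csv
import io

# Values treated as missing; this set is an assumption for illustration.
MISSING_TOKENS = {"", "NA", "Missing", "Unknown"}


def quality_report(rows):
    """Return per-column row count, missing percentage, and cardinality."""
    report = {}
    if not rows:
        return report
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [v for v in values if v not in MISSING_TOKENS]
        report[column] = {
            "count": len(values),
            "missing_pct": round(100 * (len(values) - len(present)) / len(values), 1),
            "cardinality": len(set(present)),  # distinct non-missing values
        }
    return report


# Tiny made-up sample standing in for the real CSV file (hypothetical columns).
SAMPLE_CSV = """age_group,sex,death_yn
0 - 9 Years,Male,No
80+ Years,Female,Yes
,Male,No
50 - 59 Years,Unknown,No
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = quality_report(rows)
for column, stats in report.items():
    print(column, stats)
```

Columns with a high missing percentage or a cardinality of 1 are the usual candidates to flag in the data quality plan. To run the same function on a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle.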