Data Analysis Project: No-Show Medical Appointments in Brazil.
Note: Findings are tentative. They have not been verified with formal statistical or machine learning methods.
Dataset: https://www.kaggle.com/joniarroba/noshowappointments
Install dependencies:
- conda install pandas
- conda install pip
- conda install numpy
- conda install matplotlib
- conda install seaborn
- python -m pip install requests
Gather data:
- The data file is downloaded programmatically from Kaggle.
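A programmatic download with requests could look roughly like the sketch below. The helper name download_file and the example URL are hypothetical and not taken from the project. Kaggle downloads generally require authentication, so the real request would likely need credentials that are not shown here.

```python
import requests


def download_file(url, dest_path, timeout=30):
    """Stream the content at `url` into `dest_path`; return bytes written.

    Hypothetical helper for illustration, not the project's actual code.
    """
    response = requests.get(url, stream=True, timeout=timeout)
    # Fail loudly on HTTP errors instead of saving an error page as data.
    response.raise_for_status()
    written = 0
    with open(dest_path, "wb") as fh:
        for chunk in response.iter_content(chunk_size=8192):
            fh.write(chunk)
            written += len(chunk)
    return written


# Example call (placeholder URL, replace with a real, authorized one):
# download_file("https://example.com/placeholder.csv", "noshowappointments.csv")
```

Streaming in chunks keeps memory use flat for large files, and raise_for_status() avoids silently writing an HTML error page to disk.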
Assess data for:
- Quality: inconsistent data, inaccurate data, non-descriptive headers, missing values (NaN)
- Tidiness: structural issues that prevent easy analysis. Tidy data requirements: each variable forms a column, each observation forms a row, and each type of observational unit forms a table.
Types of assessment:
- Visual assessment
- Programmatic assessment (using pandas)
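As an illustration of programmatic assessment, the sketch below runs a few common pandas checks for the quality issues listed above. The toy data is made up, and the column names and the assess helper are assumptions for illustration, not the project's actual code.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; column names are illustrative assumptions.
df = pd.DataFrame({
    "PatientId": [1, 2, 2, 3, 4],
    "Age": [34, -1, -1, 58, np.nan],
    "No-show": ["No", "Yes", "Yes", "No", "Yes"],
})


def assess(frame):
    """Collect a few programmatic quality checks into one dictionary."""
    return {
        # Missing values (NaN) per column.
        "missing_per_column": frame.isna().sum().to_dict(),
        # Fully duplicated rows, a common sign of inconsistent data.
        "duplicate_rows": int(frame.duplicated().sum()),
        # Column types, useful for spotting numbers or dates stored as text.
        "dtypes": frame.dtypes.astype(str).to_dict(),
        # Inaccurate data: ages below zero cannot be valid.
        "negative_ages": int((frame["Age"] < 0).sum()),
    }


report = assess(df)
for name, value in report.items():
    print(name, value)
```

Visual assessment (scrolling through the table) complements these checks, since some problems, such as non-descriptive headers, are easier to spot by eye.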
Programmatic data cleaning process:
- Define: convert the assessments into defined cleaning tasks.
- Code: convert those definitions to code and run that code.
- Test: test your dataset, visually or with code, to make sure cleaning operations worked.
Analysis:
- Carried out a descriptive analysis of the cleaned data.
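The Define, Code, Test cycle and a simple descriptive step can be sketched as follows. The data is made up, and the column names, the cleaning tasks chosen, and the clean helper are assumptions for illustration, not the project's actual cleaning code.

```python
import pandas as pd

# Toy data with made-up values; column names are illustrative assumptions.
raw = pd.DataFrame({
    "Age": [34, -1, 58, 7],
    "AppointmentDay": ["2016-04-29", "2016-04-29", "2016-05-03", "2016-05-04"],
    "No-show": ["No", "Yes", "No", "Yes"],
})


def clean(frame):
    """Apply the defined cleaning tasks and return a new DataFrame."""
    # Define: headers are inconsistent. Code: rename them to snake_case.
    cleaned = frame.rename(columns={
        "Age": "age",
        "AppointmentDay": "appointment_day",
        "No-show": "no_show",
    })
    # Define: negative ages are inaccurate. Code: drop those rows.
    cleaned = cleaned[cleaned["age"] >= 0].reset_index(drop=True)
    # Define: dates are stored as text. Code: parse them as datetimes.
    cleaned["appointment_day"] = pd.to_datetime(cleaned["appointment_day"])
    # Define: Yes/No text is hard to aggregate. Code: convert to a boolean
    # that is True when the appointment was missed.
    cleaned["no_show"] = cleaned["no_show"] == "Yes"
    return cleaned


cleaned = clean(raw)

# Test: confirm with code that each cleaning operation worked.
assert list(cleaned.columns) == ["age", "appointment_day", "no_show"]
assert (cleaned["age"] >= 0).all()
assert pd.api.types.is_datetime64_any_dtype(cleaned["appointment_day"])
assert cleaned["no_show"].dtype == bool

# Descriptive step: share of missed appointments in the cleaned toy data.
no_show_rate = cleaned["no_show"].mean()
print(f"No-show rate: {no_show_rate:.2f}")
```

Working on a renamed copy leaves the raw data untouched, so the cleaning can be rerun from scratch if a test fails.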