Author: Andy Peng
This repository contains an analysis for a heart disease classification project. The analysis is documented in detail in the hope of making the work accessible and replicable.
The task is to investigate heart disease for a hospital. The project uses a data set of hospital patients, some with heart disease and some without. The main goal is to identify the features that help us classify a patient as having heart disease.
The data records whether each patient has heart disease, along with patient features such as age, sex, chest pain type, resting blood pressure, and maximum heart rate.
- Descriptive Analysis
- Modeling
- Choices made
- Key relevant findings from exploratory data analysis
- AUC - Random Forest with GridSearchCV
- Accuracy / F1 Score / Recall - XGBoost
- Precision - Decision Tree with GridSearchCV
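The models above win on different metrics, so it helps to see exactly what each metric measures. The sketch below is a minimal, dependency-free illustration, not the notebook's code (which likely relies on scikit-learn's metric functions). It assumes labels are 1 (heart disease) / 0 (no heart disease) and that a model outputs a probability-like score for class 1; the helper names `confusion_counts`, `classification_metrics`, and `roc_auc` are hypothetical, and the toy labels and scores are made up for illustration.

```python
# Dependency-free sketch of the evaluation metrics listed above.
# Assumption: 1 = heart disease (positive class), 0 = no heart disease.


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 computed from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive scores higher than a
    random negative (ties count as half), which equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores for illustration only (not project data).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Note that AUC is computed from the raw scores, while accuracy, precision, recall, and F1 depend on the chosen decision threshold, which is one reason a single model rarely wins on every metric.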
In summary, to correctly classify a patient as having heart disease, we need to consider the following features:
- Gender of the individual - Males have a higher chance of having heart disease than females.
- Asymptomatic chest pain - Individuals with this type of chest pain have a high chance of having heart disease.
- Reversible defect - If the thallium stress test result is a reversible defect, the individual has a high chance of having heart disease.
- Age and maximum heart rate - Maximum heart rate tends to decline with age. Individuals with heart disease tend to be older and to have a lower maximum heart rate.
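Each finding above compares the share of patients with heart disease across groups (for example, males versus females). A minimal sketch of that comparison is shown below, assuming each patient is a dict with a 0/1 `target` column; the column names `sex`, `cp`, and `target`, the function `disease_rate_by`, and the toy rows are all hypothetical and are not the project's data or code.

```python
from collections import defaultdict


def disease_rate_by(records, feature):
    """Share of patients with heart disease (target == 1) for each value of `feature`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        value = row[feature]
        totals[value] += 1
        positives[value] += row["target"]  # assumes target is 0 or 1
    return {value: positives[value] / totals[value] for value in totals}


if __name__ == "__main__":
    # Hypothetical rows for illustration only; column names are assumptions.
    patients = [
        {"sex": "male", "cp": "asymptomatic", "target": 1},
        {"sex": "male", "cp": "asymptomatic", "target": 1},
        {"sex": "male", "cp": "typical angina", "target": 0},
        {"sex": "female", "cp": "asymptomatic", "target": 1},
        {"sex": "female", "cp": "non-anginal", "target": 0},
        {"sex": "female", "cp": "typical angina", "target": 0},
    ]
    print(disease_rate_by(patients, "sex"))
    print(disease_rate_by(patients, "cp"))
```

On real data the same idea is usually a one-line group-by in pandas; the plain-Python version just makes the computation explicit.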
Our modeling shows that a regular XGBoost model is the best fit for our problem, because we want a model with high recall in order to minimize false negatives. Heart disease is extremely serious, so wrongly classifying a patient who has heart disease as healthy (a false negative) can be very dangerous.
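Recall is TP / (TP + FN). On a fixed test set, TP + FN is simply the number of patients who actually have heart disease, so choosing the model with the highest recall is the same as choosing the model with the fewest false negatives. The sketch below illustrates that selection rule; the function names and the two hypothetical models' predictions are made up for illustration and are not results from this project.

```python
def recall_and_false_negatives(y_true, y_pred):
    """Recall = TP / (TP + FN); also return FN, the missed heart disease cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return recall, fn


def pick_by_recall(y_true, predictions_by_model):
    """Return the name of the model with the highest recall (fewest missed positives)."""
    return max(
        predictions_by_model,
        key=lambda name: recall_and_false_negatives(y_true, predictions_by_model[name])[0],
    )


if __name__ == "__main__":
    # Hypothetical labels and predictions for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    predictions = {
        "model_a": [1, 1, 0, 0, 0, 0, 0, 0],  # no false alarms, but misses two patients
        "model_b": [1, 1, 1, 1, 1, 0, 0, 0],  # one false alarm, no missed patients
    }
    for name, y_pred in predictions.items():
        print(name, recall_and_false_negatives(y_true, y_pred))
    print(pick_by_recall(y_true, predictions))
```

Here `model_a` has perfect precision but a recall of 0.5, while `model_b` trades one false alarm for a recall of 1.0, which is the trade-off the recall-first choice accepts.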
There are many features we have not considered, for example whether the individual's family has a history of genetic disorders, body fat percentage, and diet. The model could be further improved by gathering more data.
Please review the narrative of our analysis in our Jupyter notebook, or review our presentation.
For any additional questions, please contact andypeng93@gmail.com
The repository is structured as follows:
├── README.md <- The top-level README for reviewers of this project.
├── Heart Disease.ipynb <- narrative documentation of the analysis in a Jupyter notebook
├── presentation.pdf <- pdf version of project presentation
└── Visualizations
    └── images <- both sourced externally and generated from code