Modeling Heart Diseases Classification

Author: Andy Peng

This repository contains an analysis for the heart disease classification project. The analysis is documented in detail to make the work accessible and replicable.

Business problem:

The task is to investigate heart disease for a hospital. This project examines a data set of hospital patients, some with heart disease and some without. The main goal is to identify the features that allow us to classify a patient as having heart disease.

Data

The data records whether each patient has heart disease, along with patient features such as age, sex, chest pain type, resting blood pressure, maximum heart rate, etc.

Methods

Descriptive Analysis
Modeling
Choices made
Key relevant findings from exploratory data analysis

Results

Visual 1

> Gender VS Heart Disease

Visual 2

> Chest Pain Types VS Heart Disease

Visual 3

> Thallium Stress Result VS Heart Disease

Visual 4

> Age of Individuals VS Heart Disease

Visual 5

> Age of Individuals VS Maximum Heart Rate

Models

> ROC Curve of the different Models

> Model Results of Base Model and Model Tuning

Best-performing model for each metric:

AUC - Random Forest with GridSearchCV
Accuracy / F1 Score / Recall - XGBoost
Precision - Decision Tree with GridSearchCV
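
As a reminder of what the AUC metric measures, here is a minimal sketch in plain Python. It is not the code used in the notebook; the function name roc_auc and the toy labels and scores are hypothetical and chosen only for illustration. AUC is computed as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    y_true holds 0/1 labels and scores holds the model's predicted scores.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration (not project results).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 1.0 means every patient with heart disease is scored above every patient without it, while 0.5 is no better than random ranking.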

Recommendations:

In summary, to correctly classify a patient as having heart disease, we need to consider the following features.

Gender of the individual - Males have a higher chance of having heart disease than females.
Asymptomatic Chest Pain - Individuals with this type of chest pain have a high chance of having heart disease.
Reversible Defect - If the thallium stress result is a reversible defect, the individual has a high chance of having heart disease.
Age & Maximum Heart Rate - Maximum heart rate goes down with age. Individuals who have heart disease tend to be older and have a lower maximum heart rate.
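
Comparisons like the ones above come down to computing the share of patients with heart disease within each group. The following is a minimal sketch in plain Python, not the notebook's code. The helper name disease_rate_by, the field names "sex" and "target", and the toy rows are all assumptions made for illustration; the real data set may use different column names and encodings.

```python
from collections import defaultdict


def disease_rate_by(records, feature, target="target"):
    """Return {feature value: share of records where target == 1}."""
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for row in records:
        counts[row[feature]][0] += row[target]
        counts[row[feature]][1] += 1
    return {value: pos / total for value, (pos, total) in counts.items()}


# Made-up toy rows, for illustration only (not taken from the project data).
toy_records = [
    {"sex": "male", "target": 1},
    {"sex": "male", "target": 1},
    {"sex": "male", "target": 0},
    {"sex": "female", "target": 1},
    {"sex": "female", "target": 0},
    {"sex": "female", "target": 0},
]

rates = disease_rate_by(toy_records, "sex")
print(rates)
```

The same helper can be pointed at any categorical field, for example a chest pain type or thallium stress result column, assuming the rows carry such a field.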

Our modeling shows that a regular XGBoost model is the best model for our problem. We want a model with a high recall value in order to minimize the number of false negatives. Because heart disease is extremely serious, a false negative (classifying a patient who has heart disease as healthy) can be very dangerous.
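
To make the link between recall and false negatives concrete, here is a minimal sketch in plain Python. It is not the notebook's code; the function names and the toy labels are hypothetical. Recall is TP / (TP + FN), so every missed heart disease case lowers it directly, whereas precision is TP / (TP + FP) and is only affected by false alarms.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1  # truth == 1, pred == 0: a missed heart disease case
    return tp, fp, tn, fn


def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy labels, made up for illustration (1 = heart disease, 0 = no heart disease).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

print(confusion_counts(y_true, y_pred))
print(recall(y_true, y_pred), precision(y_true, y_pred))
```

In this toy example one of the four patients with heart disease is missed, so recall is 3 / 4.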

Limitations & Next Steps

There are many features that we haven't considered, for example whether the family has a genetic disorder, body fat percentage, and the individual's diet. The model can be further improved by gathering more data.

For further information

Please review the narrative of our analysis in our Jupyter notebook or review our presentation.

For any additional questions, please contact andypeng93@gmail.com

Repository Structure:

The repository is organized as follows:


├── README.md                       <- The top-level README for reviewers of this project.
├── Heart Disease.ipynb             <- narrative documentation of analysis in jupyter notebook
├── presentation.pdf                <- pdf version of project presentation
└── Visualizations
    └── images                          <- both sourced externally and generated from code
