Classification Model (End to End Classification of Heart Disease - UCI Data Set)
Create a Machine Learning Model capable of Predicting the Presence of Heart Disease in a Patient based on their Medical Attributes.
Problem Definition : Based on Medical Features, Predict whether a Patient has Heart Disease or not.
Data : Heart Disease UCI (The Original Data Set | Kaggle Data Set)
- Age
- Sex : 1 - Male; 0 - Female
- Chest Pain Type (4 values)
- Resting Blood Pressure
- Serum Cholesterol in mg/dl
- Fasting Blood Sugar > 120 mg/dl : 1 - True; 0 - False
- Resting Electrocardiographic Results (values 0,1,2)
- Maximum Heart Rate Achieved : Thalach
- Exercise Induced Angina
- Oldpeak : ST Depression induced by Exercise Relative to Rest
- The Slope of the Peak Exercise ST Segment
- Number of Major Vessels (0-3) Colored by Fluoroscopy
- Thalassemia : 3 - Normal; 6 - Fixed Defect; 7 - Reversible Defect.
- Target : 1 - Heart Disease; 0 - No Heart Disease.
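A minimal sketch of turning the attribute list above into a feature matrix `X` and a target list `y`, using only the Python standard library. It assumes the Kaggle CSV layout with the column names `age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal, target`; the helper `load_heart_data` is hypothetical, and the two sample rows are invented values for illustration, not real patients.

```python
import csv
import io

# Assumed column names, following the Kaggle CSV layout; adjust if your file differs.
FEATURE_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]
TARGET_COLUMN = "target"


def load_heart_data(text_stream):
    """Hypothetical helper: read CSV rows into a feature matrix X and target list y."""
    X, y = [], []
    for row in csv.DictReader(text_stream):
        # All attributes are numeric; oldpeak is fractional, so parse everything as float.
        X.append([float(row[col]) for col in FEATURE_COLUMNS])
        y.append(int(row[TARGET_COLUMN]))
    return X, y


# Usage with two illustrative rows (invented values, not real patients).
sample = io.StringIO(
    "age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target\n"
    "50,1,2,130,240,0,1,150,0,1.0,1,0,3,1\n"
    "60,0,0,140,280,1,0,120,1,2.5,1,2,7,0\n"
)
X, y = load_heart_data(sample)
print(len(X), len(X[0]), y)  # 2 13 [1, 0]
```

With a real file, the same helper would be called on `open(path, newline="")`, where `path` points at your downloaded copy of the data set.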
Models :
- Logistic Regression
- K Nearest Neighbors
- Random Forest Classifier
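To make the K Nearest Neighbors idea concrete, here is a from-scratch sketch in plain Python (not scikit-learn's `KNeighborsClassifier`): a new sample is classified by majority vote among the `k` training rows closest to it in Euclidean distance. The function name `knn_predict` and the toy two-feature data are illustrative assumptions. On the real attributes, features should be scaled first, since a column such as cholesterol would otherwise dominate the distance.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=5):
    """Predict the label of sample x by majority vote among its k nearest training rows."""
    # Pair each training row's Euclidean distance to x with that row's label, nearest first.
    neighbors = sorted(
        (math.dist(row, x), label) for row, label in zip(X_train, y_train)
    )
    # Count labels among the k closest rows; an odd k avoids ties with two classes.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data (not heart-disease rows): two well-separated groups.
X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_train = [0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [1.1, 1.0], k=3))  # 0
print(knn_predict(X_train, y_train, [5.0, 5.1], k=3))  # 1
```

Unlike Logistic Regression or a Random Forest, this model has no training step: all the work happens at prediction time, which is why it becomes slow on large data sets.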
Model Selection :
- Train Test Split
- Cross Validation
- Randomized Search Cross Validation
- Grid Search Cross Validation
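The four model-selection steps above can be sketched in plain Python. These functions are simplified stand-ins that mirror the ideas behind scikit-learn's `train_test_split`, `KFold`, `GridSearchCV` and `RandomizedSearchCV`, not those APIs themselves; the search space below is hypothetical. Grid search tries every combination, while randomized search evaluates only `n_iter` distinct combinations drawn from the same grid (scikit-learn's version can also sample from continuous distributions, which this sketch omits).

```python
import itertools
import random


def _take(rows, indices):
    """Return rows at the given indices, in order."""
    return [rows[i] for i in indices]


def train_test_split(X, y, test_size=0.2, seed=42):
    """Shuffle row indices once, then hold out a test_size fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return _take(X, train_idx), _take(X, test_idx), _take(y, train_idx), _take(y, test_idx)


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    start = 0
    for fold in range(k):
        # The first n_samples % k folds take one extra sample.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def grid_candidates(param_grid):
    """Grid search: every combination of the listed hyperparameter values."""
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[key] for key in keys)):
        yield dict(zip(keys, values))


def random_candidates(param_grid, n_iter, seed=42):
    """Randomized search: n_iter distinct combinations drawn from the same grid."""
    combos = list(grid_candidates(param_grid))
    return random.Random(seed).sample(combos, min(n_iter, len(combos)))


# Usage on toy data (10 rows, one feature each).
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(len(X_train), len(X_test))  # 8 2

folds = list(k_fold_indices(len(X_train), k=4))
print([len(val) for _, val in folds])  # [2, 2, 2, 2]

# Hypothetical search space for illustration.
grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}
print(len(list(grid_candidates(grid))))  # 6
print(len(random_candidates(grid, n_iter=4)))  # 4
```

In a full search, each candidate would be trained on every `train_idx` split and scored on the matching `val_idx`, keeping the candidate with the best mean validation score; the held-out test set is used only once, at the end.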
Classification Evaluation Metrics :
- Accuracy Score
- Precision Score
- Recall Score
- F1 Score
- Receiver Operating Characteristic (ROC) Curve
- Area Under the ROC Curve (AUC) Score
- Classification Report
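A small sketch of how Accuracy, the ROC Curve and the AUC Score are computed, written in plain Python rather than with scikit-learn's `accuracy_score`, `roc_curve` and `roc_auc_score`. It assumes binary labels `0`/`1`, that the model outputs a probability-like score for class 1, and a 0.5 decision threshold for Accuracy; the labels and scores are illustrative. The ROC Curve sweeps the threshold over every distinct score, and the AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_points(y_true, scores):
    """(false positive rate, true positive rate) for every distinct score used as threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict class 1 whenever the score is at or above the threshold.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
print(accuracy(y_true, y_pred))  # 0.75
print(roc_points(y_true, scores))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc(y_true, scores))  # 0.75
```

Integrating the `roc_points` curve with the trapezoid rule gives the same 0.75; the pairwise form is used here because it states directly what the AUC Score means.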
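- Precision = True Positives / (True Positives + False Positives), so a Model with Zero False Positives has a Precision Score of 100%.
- Recall = True Positives / (True Positives + False Negatives), so a Model with Zero False Negatives has a Recall Score of 100%.
- F1 Score is the Harmonic Mean of Precision and Recall, so a Model with Zero False Positives and Zero False Negatives has an F1 Score of 100%.
- Macro Average : Unweighted Average of the Per-Class Precision, Recall and F1 Scores.
- Macro Average does not take Class Imbalance into account, since every Class counts equally.
- Weighted Average : Average of the Per-Class Scores Weighted by each Class's Support (Number of Samples), so it is Biased toward the Class with More Samples.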
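The notes above can be checked with a plain-Python sketch of the per-class rows and the two averages that a classification report (such as scikit-learn's `classification_report`) prints. The helper names `per_class_scores` and `averaged_scores` are hypothetical, and returning 0.0 when a ratio is undefined (for example, no predicted positives) is an assumed convention. The toy labels are deliberately imbalanced so that the Macro and Weighted Averages differ.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    # Assumed convention: 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0  # 1.0 whenever fp == 0 and tp > 0
    recall = tp / (tp + fn) if tp + fn else 0.0     # 1.0 whenever fn == 0 and tp > 0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def averaged_scores(y_true, y_pred):
    """Macro (unweighted) and weighted (by support) averages of (precision, recall, F1)."""
    labels = sorted(set(y_true))
    support = {lab: y_true.count(lab) for lab in labels}
    scores = {lab: per_class_scores(y_true, y_pred, lab) for lab in labels}
    macro = tuple(sum(scores[lab][i] for lab in labels) / len(labels) for i in range(3))
    weighted = tuple(
        sum(scores[lab][i] * support[lab] for lab in labels) / len(y_true)
        for i in range(3)
    )
    return macro, weighted


y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # imbalanced: eight 1s, two 0s
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
macro, weighted = averaged_scores(y_true, y_pred)
print(round(macro[1], 2), round(weighted[1], 2))  # 0.75 0.9  (recall)
print(per_class_scores(y_true, y_true, 1))  # (1.0, 1.0, 1.0): no FP, no FN
```

Class 0 has a Recall of only 0.5 here, yet the Weighted Average Recall is 0.9 because the eight samples of class 1 dominate it; the Macro Average of 0.75 exposes the weak minority class.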
If You have not Reached your Expected Evaluation Metric :
- Collect More Data if Possible.
- Try Other Machine Learning Models.
- Improve the Current Model by Experimenting with its Hyperparameters.