This repository contains a machine learning project aimed at predicting the likelihood of a heart attack based on a set of medical attributes. The dataset includes various patient features such as age, cholesterol levels, and exercise-induced angina, among others. The goal of this project is to develop a robust predictive model that can assist in early diagnosis and prevention of heart-related conditions.
- Overview
- Dataset
- Features
- Exploratory Data Analysis (EDA)
- Data Preprocessing
- Model Building
- Model Evaluation
- Hyperparameter Tuning
- Results
- Installation
- Usage
The dataset includes the following features:
- Age: Age of the patient (years).
- Sex: Gender of the patient (1 = Male, 0 = Female).
- cp (Chest Pain Type):
  - 0: Typical angina
  - 1: Atypical angina
  - 2: Non-anginal pain
  - 3: Asymptomatic
- trestbps: Resting blood pressure (in mm Hg).
- chol: Serum cholesterol in mg/dl.
- fbs: Fasting blood sugar > 120 mg/dl (1 = True; 0 = False).
- restecg: Resting electrocardiographic results.
- thalach: Maximum heart rate achieved.
- exang: Exercise-induced angina (1 = Yes; 0 = No).
- oldpeak: ST depression induced by exercise relative to rest.
- slope: Slope of the peak exercise ST segment.
- ca: Number of major vessels (0-3) colored by fluoroscopy.
- thal: Thalassemia (0 = Normal; 1 = Fixed defect; 2 = Reversible defect).
- target: Heart attack occurrence (1 = Yes, 0 = No).
Extensive EDA was performed to understand the data distribution, identify correlations, and uncover hidden patterns. Key steps included:
- Univariate Analysis: Histograms, box plots, and density plots were created to inspect the distribution of individual features.
- Bivariate Analysis: Pair plots and correlation heatmaps were used to explore relationships between features and the target variable.
- Outlier Detection: Outliers were identified and analyzed using statistical methods.
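The correlation and outlier checks above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the text only says "statistical methods", so the interquartile-range (IQR) rule is an assumption, as are the helper name `iqr_outlier_mask` and the synthetic data standing in for the real dataset.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Synthetic stand-in for the real data (hypothetical values, real column names).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "chol": np.append(rng.normal(245, 20, size=99), 600.0),  # one planted outlier
    "thalach": rng.normal(150, 20, size=100),
    "target": rng.integers(0, 2, size=100),
})

# Boolean mask of rows whose cholesterol value is an IQR outlier.
chol_outliers = iqr_outlier_mask(df["chol"])

# Pearson correlation of each feature with the target (the heatmap's target row).
target_corr = df.corr()["target"].drop("target")

print(df.loc[chol_outliers, "chol"])
print(target_corr)
```

Flagged rows are worth inspecting before any removal, since extreme clinical values can be genuine measurements rather than errors.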
To ensure the data was suitable for model building, the following preprocessing steps were taken:
- Handling Missing Values: No missing values were found in the dataset.
- Feature Scaling: Continuous features were standardized using z-score normalization.
- Encoding Categorical Variables: Categorical features were converted into numerical values using one-hot encoding.
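The scaling and encoding steps above could look like the sketch below. It assumes scikit-learn's `StandardScaler` and `OneHotEncoder` combined in a `ColumnTransformer`; the README does not say which tools were used, and the split into continuous and categorical columns, as well as the sample values, are illustrative assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative frame; the values are made up, the column names follow the feature list.
df = pd.DataFrame({
    "chol": [200.0, 240.0, 280.0, 320.0],
    "thalach": [170.0, 150.0, 130.0, 110.0],
    "cp": [0, 1, 2, 3],
})

continuous_cols = ["chol", "thalach"]  # assumed continuous features
categorical_cols = ["cp"]              # assumed categorical feature

preprocessor = ColumnTransformer(
    transformers=[
        # z-score normalization: subtract the mean, divide by the standard deviation
        ("scale", StandardScaler(), continuous_cols),
        # one binary column per category level
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

When a train/test split is used, fitting the transformer on the training split only and then calling `transform` on the test split keeps test-set statistics from leaking into training.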
Multiple machine learning models were tested to find the best-performing one. The models included:
- Logistic Regression
- Random Forest
- XGBoost
- Neural Networks (Keras)
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))  # first hidden layer; input size = number of features
model.add(Dense(32, activation='relu'))  # second hidden layer
model.add(Dense(1, activation='sigmoid'))  # outputs the probability of the positive class
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=10)
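For the classical models in the list, a side-by-side comparison might be sketched as below. This is an assumption about how such a comparison could be run, not the notebook's procedure: the data comes from `make_classification` as a stand-in, and 5-fold cross-validated accuracy is just one reasonable yardstick.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data with 13 features (stand-in for the real dataset).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Mean 5-fold cross-validated accuracy for each candidate model.
cv_accuracy = {
    name: cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    for name, clf in candidates.items()
}

for name, score in cv_accuracy.items():
    print(f"{name}: {score:.3f}")
```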
The models were evaluated based on the following metrics:
- Accuracy
- Precision
- Recall
- F1-Score
- ROC-AUC Score
The best model achieved the following scores:
- Training Accuracy: 99.59%
- Test Accuracy: 83.61%
- ROC-AUC Score: 0.91
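To make the metric definitions concrete, the sketch below computes them from scratch on a small made-up example. The helper names `classification_metrics` and `roc_auc` and the sample labels are hypothetical; in practice, `sklearn.metrics` provides equivalent functions such as `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score`.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # threshold at 0.5

metrics = classification_metrics(y_true, y_pred)
metrics["roc_auc"] = roc_auc(y_true, y_prob)
print(metrics)
```

Accuracy, precision, recall, and F1 depend on the 0.5 threshold, whereas ROC-AUC is computed from the raw probabilities and is threshold-independent.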
Hyperparameter tuning was performed using GridSearchCV and RandomizedSearchCV to optimize the model's performance.
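from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {  # XGBoost hyperparameters to search
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2]
}
xgb_model = XGBClassifier()  # these parameters belong to XGBoost, not to the Keras network
grid_search = GridSearchCV(estimator=xgb_model, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
print(grid_search.best_params_, grid_search.best_score_)  # best combination and its mean CV accuracy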
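The text also mentions RandomizedSearchCV, which samples a fixed number of parameter combinations instead of trying every one. A minimal sketch follows, with two loudly stated assumptions: scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in that accepts the same three parameter names, and the data is synthetic. The sampling ranges simply mirror the grid values and are not taken from the notebook.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data with 13 features.
X, y = make_classification(n_samples=200, n_features=13, random_state=0)

# Distributions to sample from, rather than fixed lists of values.
param_distributions = {
    "n_estimators": randint(50, 201),      # integers in [50, 200]
    "max_depth": randint(3, 8),            # integers in [3, 7]
    "learning_rate": uniform(0.01, 0.19),  # floats in [0.01, 0.20]
}

random_search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),  # stand-in estimator
    param_distributions=param_distributions,
    n_iter=5,            # number of sampled combinations
    cv=3,
    scoring="accuracy",
    random_state=0,
)
random_search.fit(X, y)

print(random_search.best_params_)
print(round(random_search.best_score_, 3))
```

With `n_iter=5` and `cv=3` this fits 15 models, compared with 135 for the 27-combination grid under 5-fold cross-validation, which is the usual reason to prefer a randomized search when the grid grows.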
The final model generalizes reasonably well to held-out data, although the gap between training and test accuracy points to some overfitting. Key results:
- Best Model: XGBoost Classifier
- Training Accuracy: 99.59%
- Test Accuracy: 83.61%
- Key Insights:
  - Higher cholesterol levels and exercise-induced angina are significant predictors of heart attacks in this dataset.
  - With a ROC-AUC score of 0.91, the model distinguishes well between patients with and without heart attacks.
Clone the repository and install the required dependencies:
git clone https://github.com/TravelXML/ML-HEART-ATTACK-EDA-PREDICTION-WITH-KERAS.git
cd ML-HEART-ATTACK-EDA-PREDICTION-WITH-KERAS
pip install -r requirements.txt
To run the model and reproduce the results, follow these steps:
- Prepare the Data: Ensure the dataset is available in the correct directory.
- Run the Jupyter Notebook: Open the provided Jupyter notebook and execute the cells.
- Model Evaluation: Evaluate the model's performance on your data.
jupyter notebook heart-attack-eda-prediction.ipynb
Happy Coding!