Deploying Machine Learning Algorithms to predict the occurrence of stroke in a person.
Stroke is the second leading cause of death worldwide and remains an enormous health burden for individuals. Hypertension, heart disease, diabetes and dysregulated glucose metabolism, atrial fibrillation, and lifestyle factors are some of the modifiable risk factors for stroke. The objective of our project is to predict a person’s likelihood of suffering a stroke from potentially modifiable risk variables by applying machine learning methods to patient data. We do this by training machine learning models on medical records to find patterns associated with stroke risk.
The dataset we use consists of 5111 rows and 12 columns.
● Id : Unique Identifier.
● Gender : "Male", "Female" or "Other".
● Age : Age of the patient.
● Hypertension : 1 if the patient has hypertension or 0 if not.
● Heart_disease : 1 if the patient has a heart disease or 0 if not.
● Ever_married : "No" or "Yes".
● Work_type : "Never_worked", "Children", "Govt_job", "Private", "Self_emp".
● Residence_type : "Rural" or "Urban".
● Avg_glucose_level : Average glucose level in blood.
● Bmi : Body mass index.
● Smoking_status : "Formerly smoked", "Never smoked", "Smokes" or "Unknown".
● Stroke : 1 if the patient had a stroke or 0 if not.
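As a rough sketch of how the columns listed above could be turned into model inputs, the snippet below drops the identifier, fills missing Bmi values, and one-hot encodes the text columns. Several things here are assumptions rather than the notebook's actual steps: the column names follow the capitalization used in the list (the CSV header may differ), the sample rows are invented, and both the median imputation and the helper name prepare_features are hypothetical.

```python
import numpy as np
import pandas as pd

# Invented rows that mimic the column layout described above.
# These values are for illustration only and are NOT taken from the dataset.
sample = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Other"],
    "Age": [67.0, 45.0, 80.0, 12.0],
    "Hypertension": [0, 1, 0, 0],
    "Heart_disease": [1, 0, 0, 0],
    "Ever_married": ["Yes", "Yes", "No", "No"],
    "Work_type": ["Private", "Govt_job", "Private", "Children"],
    "Residence_type": ["Urban", "Rural", "Rural", "Urban"],
    "Avg_glucose_level": [228.7, 95.1, 105.9, 88.0],
    "Bmi": [36.6, np.nan, 32.5, 18.0],
    "Smoking_status": ["Formerly smoked", "Never smoked", "Smokes", "Unknown"],
    "Stroke": [1, 0, 1, 0],
})


def prepare_features(df):
    """Return (X, y): a numeric feature table and the Stroke label (hypothetical helper)."""
    df = df.drop(columns=["Id"])                      # the identifier carries no predictive signal
    df["Bmi"] = df["Bmi"].fillna(df["Bmi"].median())  # assumed strategy: median imputation
    y = df.pop("Stroke")                              # separate the target from the features
    X = pd.get_dummies(df)                            # one-hot encode the text columns
    return X, y


X, y = prepare_features(sample)
print(X.columns.tolist())
```

One-hot encoding is used here because most of the text columns have no natural order; a different encoding may suit tree-based models equally well.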
Models used:
Decision Trees
Random Forest
Logistic Regression
Naive Bayes
SVM
KNN
Neural Network (MLP)
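The classifiers listed above all have scikit-learn implementations, so they can be trained and compared in one loop. The following is a minimal sketch under stated assumptions, not the notebook's code: it uses synthetic data from make_classification in place of the stroke dataset, default or near-default hyperparameters, and plain accuracy as the metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared stroke features (assumption: not the real data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Naive Bayes": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0)),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

If stroke cases are rare in the real data, accuracy alone can be misleading, and metrics such as recall or F1 from sklearn.metrics may be worth reporting alongside it.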
Libraries used:
scikit-learn
numpy
matplotlib
seaborn
1. Download the repository using the "git clone" command.
2. Upload the "stroke_prediction.ipynb" file to Google Colab.
3. Upload the "healthcare-dataset-stroke-data.csv" file to the Colab runtime.
4. Copy the path of the uploaded dataset and paste it into the read_csv call: stroke_data=pd.read_csv('/healthcare-dataset-stroke-data.csv')
5. Run the notebook in Colab.
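A wrong dataset path is the most likely thing to go wrong in the steps above, so it can help to check the path before reading the file. The sketch below is one possible way to do that; load_stroke_data is a hypothetical helper that is not part of the notebook, and the demonstration writes a tiny stand-in CSV with made-up rows instead of using the real dataset.

```python
import os
import tempfile

import pandas as pd


def load_stroke_data(csv_path):
    """Read the stroke CSV, failing early with a clear message if the path is wrong."""
    if not os.path.isfile(csv_path):
        raise FileNotFoundError(
            f"Dataset not found at {csv_path!r}; check the path of the uploaded file."
        )
    return pd.read_csv(csv_path)


# Demonstration with a small stand-in file (made-up rows, not the real data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "healthcare-dataset-stroke-data.csv")
    with open(demo_path, "w") as f:
        f.write("id,age,stroke\n1,67,1\n2,45,0\n")
    stroke_data = load_stroke_data(demo_path)

print(stroke_data.shape)  # (2, 3)
```

In the notebook itself, the same function would be called with the path copied in step 4.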