
Diabetes prediction from established health characteristics. The objective of this exercise is to showcase the effectiveness of machine learning. The dataset comprises health-related attributes collected to support predictive models that detect potential diabetes risk.


JonathanPollyn/Diabetic-Prediction


# Diabetes Prediction Using Machine Learning

## Project Overview
This project aims to predict diabetes in patients using machine learning techniques. The dataset is sourced from Kaggle and contains several medical predictor variables and one target variable indicating the presence of diabetes. The project was inspired by my passion for working in healthcare and my background in data science.

## Dataset
**Source**: [Kaggle Healthcare Diabetes Dataset](https://www.kaggle.com/datasets/nanditapore/healthcare-diabetes)

**Features**:
1. Id: Unique identifier for each data entry.
2. Pregnancies: Number of times pregnant.
3. Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test.
4. BloodPressure: Diastolic blood pressure (mm Hg).
5. SkinThickness: Triceps skinfold thickness (mm).
6. Insulin: 2-hour serum insulin (μU/ml).
7. BMI: Body mass index (weight in kg / (height in m)^2).
8. DiabetesPedigreeFunction: A score of genetic predisposition to diabetes based on family history.
9. Age: Age in years.
10. Outcome: Binary target indicating the presence (1) or absence (0) of diabetes.

## Data Preprocessing
1. Handling missing values
2. Feature scaling
3. Data splitting
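The three preprocessing steps can be sketched in plain Python as below. This is a minimal, dependency-free sketch, not the project's code. It assumes that a value of 0 in Glucose, BloodPressure, SkinThickness, Insulin, or BMI marks a missing measurement (a common convention in Pima-style diabetes data, not confirmed for this dataset), drops Id as a non-predictor, and uses the hypothetical helpers `load_rows`, `stratified_split`, `fit_preprocessor`, and `transform`. It splits first and then learns the imputation medians and scaling statistics from the training split only, so no test-set information leaks into preprocessing.

```python
import csv
import random
import statistics

# Predictors from the feature list; Id (identifier) and Outcome (target) are excluded.
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]
# ASSUMPTION: a 0 in these columns means "not measured" rather than a real reading.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def load_rows(path):
    """Read the CSV into a list of dicts with float values (the path is up to you)."""
    with open(path, newline="") as f:
        return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]


def stratified_split(rows, test_size=0.2, seed=42):
    """Shuffle each Outcome class separately so train and test keep a similar class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r["Outcome"] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def impute(row, medians):
    """Replace assumed-missing zeros with the training median; keep other values as-is."""
    return {c: medians[c] if c in medians and row[c] == 0 else row[c] for c in FEATURES}


def fit_preprocessor(train_rows):
    """Learn imputation medians and scaling mean/std from the training rows only."""
    medians = {
        c: statistics.median([r[c] for r in train_rows if r[c] != 0])
        for c in ZERO_AS_MISSING
    }
    filled = [impute(r, medians) for r in train_rows]
    scale = {
        c: (statistics.mean([r[c] for r in filled]), statistics.pstdev([r[c] for r in filled]))
        for c in FEATURES
    }
    return medians, scale


def transform(rows, medians, scale):
    """Impute, then z-score each feature with the training statistics; returns (X, y)."""
    X = []
    for r in rows:
        filled = impute(r, medians)
        # A constant column has std 0; divide by 1 instead to avoid ZeroDivisionError.
        X.append([(filled[c] - scale[c][0]) / (scale[c][1] or 1.0) for c in FEATURES])
    y = [int(r["Outcome"]) for r in rows]
    return X, y
```

Typical use: `rows = load_rows(path)`, then `train, test = stratified_split(rows)`, `medians, scale = fit_preprocessor(train)`, and `transform(...)` on each split with the same `medians` and `scale`. With scikit-learn, `train_test_split(..., stratify=y)`, `SimpleImputer(missing_values=0, strategy="median")`, and `StandardScaler` cover the same ground.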

## Exploratory Data Analysis
- Descriptive statistics
- Data visualization (histograms, correlation matrix, pair plot)

## Model Building
- Model selection: Decision Tree, Random Forest, SVM, k-NN
- Model training
- Hyperparameter tuning
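Model selection, training, and tuning for the four model families could look roughly like the scikit-learn sketch below. It is not the project's code: the search grids, the use of F1 as the tuning score (a reasonable choice for an imbalanced target), and the names `CANDIDATES`, `tune_models`, and `demo` are assumptions, and `demo` uses synthetic `make_classification` data as a stand-in for the Kaggle CSV.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# The four model families, each with a small illustrative grid (values are assumptions).
CANDIDATES = {
    "Decision Tree": (DecisionTreeClassifier(random_state=0),
                      {"model__max_depth": [3, 5, None]}),
    "Random Forest": (RandomForestClassifier(n_estimators=100, random_state=0),
                      {"model__max_depth": [5, None], "model__min_samples_leaf": [1, 5]}),
    "SVM": (SVC(), {"model__C": [0.1, 1, 10], "model__kernel": ["linear", "rbf"]}),
    "k-NN": (KNeighborsClassifier(), {"model__n_neighbors": [5, 11, 21]}),
}


def tune_models(X_train, y_train, scoring="f1", cv=5):
    """Grid-search every candidate with cross-validation; returns {name: fitted GridSearchCV}."""
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        # Scaling sits inside the pipeline, so each CV fold is scaled using only its training part.
        pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
        search = GridSearchCV(pipe, grid, scoring=scoring, cv=cv)
        search.fit(X_train, y_train)
        results[name] = search
    return results


def demo(seed=0):
    """Run on synthetic data shaped like the dataset (8 features, imbalanced binary target)."""
    X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                               weights=[0.65], random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    results = tune_models(X_train, y_train)
    # Pick the model with the best mean cross-validated score, then check it on held-out data.
    best_name = max(results, key=lambda n: results[n].best_score_)
    test_f1 = f1_score(y_test, results[best_name].predict(X_test))
    return best_name, results, test_f1
```

Keeping `StandardScaler` inside the `Pipeline` matters most for SVM and k-NN, which are sensitive to feature scale; the tree-based models are unaffected by scaling, so the scaler is harmless for them.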

## Model Evaluation
- Evaluation metrics: accuracy, precision, recall, F1 score, ROC-AUC
- Confusion matrix
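Accuracy, precision, recall, and F1 all derive from the four confusion-matrix counts, as the standard-library sketch below shows. The function names are hypothetical, undefined ratios are returned as 0.0 (a convention chosen here), and `roc_auc` uses the pairwise definition of AUC, so it needs predicted scores or probabilities rather than hard 0/1 labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = diabetes."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the confusion-matrix counts (0.0 when undefined)."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is fine for a test set of a few hundred rows. scikit-learn's `sklearn.metrics` provides `confusion_matrix`, `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` for the same quantities; for a binary problem, `confusion_matrix(y_true, y_pred).ravel()` returns the counts in the same (tn, fp, fn, tp) order. In a screening setting, recall is often weighted heavily because a false negative is a missed diabetic patient.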

## Conclusion
- Summary of findings
- Limitations
- Future work

## Deployment
- Streamlit app: coming soon

## Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/JonathanPollyn/Diabetic-Prediction.git
   cd Diabetic-Prediction
   ```
2. Install the required packages:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the Streamlit app (coming soon):
   ```bash
   streamlit run streamlit_app/app.py
   ```

## License

This project is licensed under the MIT License.

## Acknowledgements

- Kaggle for providing the dataset
