# Diabetes Prediction Using Machine Learning
## Project Overview
This project predicts diabetes in patients using machine learning techniques. The dataset is sourced from Kaggle and contains eight medical predictor variables and one target variable indicating the presence of diabetes. The project grew out of my interest in healthcare and my background in data science.
## Dataset
**Source**: [Kaggle Healthcare Diabetes Dataset](https://www.kaggle.com/datasets/nanditapore/healthcare-diabetes)

**Features**:
1. Id: Unique identifier for each data entry.
2. Pregnancies: Number of times pregnant.
3. Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test.
4. BloodPressure: Diastolic blood pressure (mm Hg).
5. SkinThickness: Triceps skinfold thickness (mm).
6. Insulin: 2-Hour serum insulin (mu U/ml).
7. BMI: Body mass index (weight in kg / height in m^2).
8. DiabetesPedigreeFunction: Diabetes pedigree function, a score estimating genetic predisposition to diabetes based on family history.
9. Age: Age in years.
10. Outcome: Binary classification indicating the presence (1) or absence (0) of diabetes.
## Data Preprocessing
1. Handling missing values
2. Feature scaling
3. Data splitting
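
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual pipeline: it assumes missing values are encoded as `None` and imputed with the column median, that scaling means z-score standardization, and that the split is a shuffled 80/20 split with a fixed seed. The helper names (`impute_median`, `standardize`, `train_test_split`) and the toy values are hypothetical.

```python
import random
import statistics


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy values for illustration only (not taken from the dataset).
glucose = [148, 85, None, 89, 137]
bmi = [33.6, 26.6, 23.3, None, 43.1]
outcome = [1, 0, 1, 0, 1]

glucose_scaled = standardize(impute_median(glucose))
bmi_scaled = standardize(impute_median(bmi))

rows = list(zip(glucose_scaled, bmi_scaled, outcome))
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```

In a real pipeline it is generally safer to compute the median, mean, and standard deviation on the training split only and reuse them on the test split, so that no information leaks from test data. scikit-learn's `StandardScaler` and `train_test_split` cover the same ground.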
## Exploratory Data Analysis
- Descriptive statistics
- Data visualization (histograms, correlation matrix, pair plot)
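
As a rough sketch of the numbers behind these steps, the snippet below computes descriptive statistics and a Pearson correlation matrix using only the standard library. The function names (`describe`, `pearson`, `correlation_matrix`) and the toy values are hypothetical and not taken from the project; plotting is left out.

```python
import math
import statistics


def describe(column):
    """Return basic descriptive statistics for a numeric column."""
    return {
        "count": len(column),
        "mean": statistics.fmean(column),
        "std": statistics.stdev(column),  # sample standard deviation
        "min": min(column),
        "median": statistics.median(column),
        "max": max(column),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping name -> column."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy values for illustration only (not taken from the dataset).
columns = {
    "Glucose": [148, 85, 183, 89, 137],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
    "Age": [50, 31, 32, 21, 33],
}

print(describe(columns["Glucose"]))
for name, row in correlation_matrix(columns).items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

With pandas, `DataFrame.describe()` and `DataFrame.corr()` give the same kind of summary in one call each.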
## Model Building
1. Model selection: Decision Tree, Random Forest, SVM, k-NN
2. Model training
3. Hyperparameter tuning
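
To make the training and tuning steps concrete, here is a dependency-free sketch of just one of the listed models, k-NN, with the number of neighbors `k` chosen by validation accuracy. It is an illustration under assumptions, not the project's implementation: the function names, the candidate values for `k`, and the toy points are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    distances = sorted(
        (math.dist(features, point), label)
        for features, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y)
    )
    return hits / len(val_y)


def tune_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5)):
    """Return the candidate k with the highest validation accuracy."""
    return max(candidates, key=lambda k: accuracy(train_X, train_y, val_X, val_y, k))


# Toy, already-scaled points for illustration only (not taken from the dataset).
# The last training point is deliberately mislabeled to mimic noise.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (0.9, 1.1), (1.1, 0.9), (0.1, 0.1)]
train_y = [0, 0, 0, 1, 1, 1, 1]
val_X = [(0.05, 0.1), (1.0, 0.95)]
val_y = [0, 1]

best_k = tune_k(train_X, train_y, val_X, val_y)
print("best k:", best_k)
print("validation accuracy:", accuracy(train_X, train_y, val_X, val_y, best_k))
```

Odd values of `k` avoid tied votes with two classes. In scikit-learn, the corresponding pieces would be `KNeighborsClassifier` combined with `GridSearchCV`, which also applies to the other listed models.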
## Model Evaluation
- Evaluation metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC
- Confusion matrix
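
The listed metrics all derive from the confusion matrix counts, except ROC-AUC, which needs predicted scores rather than hard labels. The sketch below works them out in plain Python on made-up labels and scores; the function names are hypothetical, and ROC-AUC is computed as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half.

```python
def confusion_matrix(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = diabetes, 0 = no diabetes)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the confusion matrix counts."""
    tp, tn, fp, fn = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration only (not project results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(confusion_matrix(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

For a medical screening task like this one, recall is often watched closely, since a false negative means a missed diabetes case.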
## Conclusion
- Summary of findings
- Limitations
- Future work
## Deployment
Streamlit app: Coming soon
## Installation
1. Clone the repository
```bash
git clone https://github.com/yourusername/diabetes-prediction.git
cd diabetes-prediction
```
2. Install the required packages
```bash
pip install -r requirements.txt
```
3. Run the Streamlit app (coming soon)
```bash
streamlit run streamlit_app/app.py
```
## License
This project is licensed under the MIT License.
## Acknowledgments
- Kaggle for providing the dataset