Thyroid disease is very common in India: more than one crore (10 million) people suffer from it every year. A thyroid disorder can speed up or slow down the body's metabolism.
The main objective of this project is to predict, using machine learning, whether a person has compensated hypothyroid, primary hypothyroid, secondary hypothyroid, or is negative (no thyroid disease). Classification algorithms such as Random Forest, XGBoost and KNN were trained on the Thyroid Disease dataset from the UCI Machine Learning Repository. After hyperparameter tuning, the XGBoost model achieved better accuracy, precision and recall than the other models. The application is deployed on Azure using the Flask framework.
Microsoft Azure: http://tddbulkprediction-env.eba-uqgwbduj.us-east-2.elasticbeanstalk.com/
- Python 3.8 or later
- Important Libraries: sklearn, pandas, numpy, matplotlib & seaborn
- Front-end: HTML, CSS
- Back-end: Flask framework
- IDE: Jupyter Notebook, PyCharm & VS Code
- Database: Cassandra
- Deployment: Microsoft Azure
The code is written for Python 3.8 or later. If Python is not installed on your system, download it from https://www.python.org/downloads/.
- Create a virtual environment - conda create -n myenv python=3.8
- Activate the environment - conda activate myenv
- Install the packages - pip install -r requirements.txt
- Run the app - python app.py
Thyroid Disease Data Set from the UCI Machine Learning Repository.
Link: https://archive.ics.uci.edu/ml/datasets/thyroid+disease
- Missing values handled by imputation (KNN Imputer)
- Outliers detected and removed using boxplots and percentile methods
- Categorical features handled by ordinal encoding and label encoding
- Feature scaling done with StandardScaler
- Imbalanced dataset handled by SMOTE
- Unnecessary columns dropped
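The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on a tiny synthetic table, not the project's actual pipeline: the column names, the choice of 3 neighbours and the 95th-percentile cutoff are all assumptions made for the example. SMOTE is left out because it lives in the separate imbalanced-learn package.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Tiny synthetic frame; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [29.0, 41.0, 36.0, 60.0, 52.0, 47.0, 33.0, 455.0],  # 455 is an entry error
    "sex": ["F", "M", "F", "F", "M", "F", "M", "F"],
    "TSH": [1.3, 4.1, np.nan, 9.8, 2.2, 0.9, 3.5, 1.7],
    "TT4": [125.0, 102.0, 109.0, np.nan, 88.0, 131.0, 97.0, 120.0],
})
num_cols = ["age", "TSH", "TT4"]

# 1. Fill missing numeric values from the 3 most similar rows (assumed k).
df[num_cols] = KNNImputer(n_neighbors=3).fit_transform(df[num_cols])

# 2. Percentile rule: drop rows whose age lies above the 95th percentile (assumed cutoff).
upper = df["age"].quantile(0.95)
df = df[df["age"] <= upper].copy()

# 3. Encode the categorical column as numbers (F -> 0.0, M -> 1.0).
df["sex"] = OrdinalEncoder().fit_transform(df[["sex"]]).ravel()

# 4. Scale numeric features to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(df[num_cols])

print(df)
print(X_scaled.round(2))
```

In a real pipeline the imputer, encoder and scaler would be fitted on the training split only and then reused on new data.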
- Various classification algorithms, including Random Forest, XGBoost and KNN, were tested.
- Random Forest, XGBoost and KNN all performed well. XGBoost was chosen for the final model training and testing.
- Hyperparameter tuning was performed using RandomizedSearchCV.
- Model performance was evaluated using accuracy, the confusion matrix and the classification report.
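The tuning and evaluation steps above might look roughly like this. It is a sketch under stated assumptions: the data is synthetic (four classes, generated with `make_classification`), the search space is invented for the example, and `RandomForestClassifier` stands in for the final XGBoost model so that the snippet depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic 4-class problem standing in for the thyroid dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hypothetical search space; the project's real grid is not documented here.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 8, None],
    "min_samples_split": [2, 5, 10],
}

# Sample 5 random parameter combinations, each scored with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5, cv=3, scoring="accuracy", random_state=42,
)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test split.
y_pred = search.best_estimator_.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Best parameters:", search.best_params_)
print("Accuracy:", round(acc, 3))
print(cm)
print(classification_report(y_test, y_pred))
```

Because XGBoost's `XGBClassifier` follows the scikit-learn estimator interface, it can be passed to `RandomizedSearchCV` in the same way with its own parameter names.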
A Cassandra database is used for this project.
The final model is deployed on Microsoft Azure using the Flask framework.
The downloaded CSV file contains each record's index number along with the type of thyroid disease the patient is predicted to have.
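As a rough illustration of that output, the sketch below turns encoded predictions into a two-column CSV. The function name `predictions_to_csv`, the column names and the label-to-class mapping are hypothetical; the real application may encode the classes differently.

```python
import io

import pandas as pd

# Hypothetical mapping from encoded labels to the class names used in this project.
CLASS_NAMES = {
    0: "compensated_hypothyroid",
    1: "negative",
    2: "primary_hypothyroid",
    3: "secondary_hypothyroid",
}


def predictions_to_csv(predictions):
    """Return CSV text with one row per record: index number and predicted class."""
    out = pd.DataFrame({"prediction": [CLASS_NAMES[p] for p in predictions]})
    out.index.name = "index"
    buffer = io.StringIO()
    out.to_csv(buffer)
    return buffer.getvalue()


csv_text = predictions_to_csv([1, 1, 2, 0])
print(csv_text)
```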
- Architecture: https://github.com/AYUSHSURYAVANSHI/Thyroid-Disease-Detection-Project-/blob/main/Docs/TDD_Architecture_V1.0.pdf
- Detailed Project Report: https://github.com/AYUSHSURYAVANSHI/Thyroid-Disease-Detection-Project-/blob/main/Docs/Thyroid%20Disease%20Detection%20(1).pdf
Ayush Suryavanshi: https://www.linkedin.com/in/ayush-suryavanshi/
Hello reader, if you find any bugs, please consider raising an issue. I will address them as soon as possible.