Disease Prediction from Symptoms

A data mining application that predicts disease from symptom data (i.e., prognosis). To develop this application, we used the Columbia University dataset and built models using both the Multinomial Naive Bayes and Decision Tree algorithms to predict the disease given the symptoms observed in a person.

Columbia University Dataset

This dataset is a knowledge database of disease-symptom associations generated by an automated method based on information in textual discharge summaries of patients admitted to New York-Presbyterian Hospital during 2004. The dataset can be found here.
The first column shows the disease, the second the number of discharge summaries containing a positive and current mention of the disease, and the third the associated symptom.
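
To make the three-column layout concrete, here is a minimal sketch of reading such rows into a per-disease structure. The column names (`disease`, `count`, `symptom`), the helper `load_associations`, and the sample rows and counts are all hypothetical, chosen for illustration; they are not taken from the dataset or from the project notebook.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the three-column layout described above.
# The rows and counts are invented for illustration, not real dataset values.
SAMPLE = """disease,count,symptom
influenza,120,fever
influenza,120,cough
migraine,45,headache
migraine,45,nausea
"""


def load_associations(text):
    """Return (mention count per disease, symptom list per disease)."""
    counts = {}
    symptoms = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        disease = row["disease"].strip().lower()
        counts[disease] = int(row["count"])
        symptoms[disease].append(row["symptom"].strip().lower())
    return counts, dict(symptoms)


counts, symptoms = load_associations(SAMPLE)
print(symptoms["influenza"])
```

Grouping symptoms by disease at load time keeps the later step of building one row per disease straightforward.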

Tasks performed

Data extraction and cleaning : Basic cleaning, segmentation of columns and string formatting were performed in Excel.
Data preprocessing : The following preprocessing steps were performed:
- Spelling mistakes in the names of diseases and symptoms, and in their codes, were rectified
- The codes which were given to diseases and symptoms were removed as they were irrelevant for our task
- A cumulative list of all symptoms was made
- Each symptom was assigned a Boolean value of 0 or 1 for each disease, according to whether the symptom occurs with the disease or not
Data visualization : Built correlation heatmaps showing the relationships among the symptoms and among the diseases
Model Building : Trained two algorithms on this dataset, a Multinomial Naive Bayes classifier and a Decision Tree, and compared their results to see which performed better.
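
The preprocessing and model-building steps above can be sketched as follows, assuming scikit-learn and NumPy are available. The `associations` mapping is a hypothetical toy example, and the fit-and-score loop is only an illustration of the technique, not the notebook's actual code or evaluation protocol.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical disease -> symptoms mapping; the real one would come
# from the cleaned dataset.
associations = {
    "influenza": ["fever", "cough", "fatigue"],
    "migraine": ["headache", "nausea"],
    "gastroenteritis": ["nausea", "vomiting", "fever"],
}

# Cumulative list of all symptoms: one feature column per symptom.
all_symptoms = sorted({s for syms in associations.values() for s in syms})
diseases = sorted(associations)

# Binary matrix: X[i, j] is 1 if symptom j occurs with disease i, else 0.
X = np.array(
    [[int(s in associations[d]) for s in all_symptoms] for d in diseases]
)
y = np.array(diseases)

models = {
    "naive_bayes": MultinomialNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Fit both classifiers on the same matrix and record accuracy on the
# training rows (a sanity check only, not a held-out evaluation).
scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)
    print(name, scores[name])

# Predict a disease for a new symptom vector (here: headache and nausea).
query = np.array([[int(s in {"headache", "nausea"}) for s in all_symptoms]])
prediction = models["decision_tree"].predict(query)[0]
print(prediction)
```

With only one row per disease, training accuracy says little about generalization; a meaningful comparison of the two models would need a held-out split or cross-validation.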

Find the detailed documentation here.

Results

The results of all the tasks can be viewed by running this code in Google Colab, or in the detailed documentation above.

The full decision tree is too large to include here, so only part of it is shown. The full tree can be found here.

Contributors

Mihir Gandhi - mihir-m-gandhi

Jasdeep Singh Grover - jasdeep100

Hardik Chodvadiya - willyhardik

Amit Dave - amitdave1998

License

This project is licensed under the MIT License - see the LICENSE file for details.

