lung cancer model added
aindree-2005 committed Dec 26, 2023
1 parent 19a88ff commit d10cbde
Showing 11 changed files with 1,383 additions and 0 deletions.
2 changes: 2 additions & 0 deletions Lung Cancer Detection/Dataset/readme.md
@@ -0,0 +1,2 @@
https://www.kaggle.com/datasets/mysarahmadbhat/lung-cancer/data
LUNG CANCER DATA
5 changes: 5 additions & 0 deletions Lung Cancer Detection/Images/Readme.md
@@ -0,0 +1,5 @@
We have used EDA to display the following:
1. Distribution of lung cancer density by age
2. Count plots of "yes" and "no" cases by various parameters
3. Bar charts of the class counts to check whether SMOTE is needed
4. Correlation heatmaps to show the dependence between columns (the closer a value is to 1, the stronger the positive correlation)
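
The class-count and correlation checks listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's code: the column names (`AGE`, `SMOKING`, `COUGHING`, `LUNG_CANCER`) and the way the label is generated are assumptions made only for this example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset; the column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "AGE": rng.integers(30, 80, size=n),
    "SMOKING": rng.integers(0, 2, size=n),
    "COUGHING": rng.integers(0, 2, size=n),
})
# Make the label loosely depend on SMOKING so a correlation is visible.
noise = rng.random(n)
df["LUNG_CANCER"] = ((0.6 * df["SMOKING"] + noise) > 0.7).astype(int)

# Class balance: a strongly skewed count suggests resampling such as SMOTE.
class_counts = df["LUNG_CANCER"].value_counts()

# Pearson correlation matrix: the numbers a correlation heatmap displays.
corr = df.corr()

print(class_counts)
print(corr.round(2))
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` would draw the heatmap itself; plotting is left out here so the sketch runs without a display.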
Binary file added Lung Cancer Detection/Images/Screenshot (379).png
Binary file added Lung Cancer Detection/Images/Screenshot (381).png
Binary file added Lung Cancer Detection/Images/Screenshot (382).png
Binary file added Lung Cancer Detection/Images/Screenshot (383).png
55 changes: 55 additions & 0 deletions Lung Cancer Detection/Models/Readme.md
@@ -0,0 +1,55 @@
# Lung Cancer Detection Using Ten Models
## Goal
The goal is to compare the performance of nine standard machine learning models against a Keras Sequential model, ten models in total.
## Dataset
The dataset is available at: https://www.kaggle.com/datasets/mysarahmadbhat/lung-cancer/data
## Description
An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on that risk. The data was collected from the website of an online lung cancer prediction system.

## What I have done
1. Data cleaning and removal of duplicates.
2. EDA to examine the dependence between parameters and the data distribution.
3. SMOTE to balance the imbalanced classes.
4. Training 9 standard ML models and comparing their classification performance.
5. Designing a Keras Sequential model for lung cancer detection.
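
The duplicate-removal and SMOTE steps above can be illustrated with a small sketch. The data, the column names (`f1`, `f2`, `label`) and the helper `smote_sketch` are hypothetical. The helper is a hand-written simplification of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the routine the notebook uses, which this diff does not show.

```python
import numpy as np
import pandas as pd

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a chosen sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy data (hypothetical): 20 majority rows, 6 minority rows.
rng = np.random.default_rng(1)
majority = rng.normal(0.0, 1.0, size=(20, 2))
minority = rng.normal(3.0, 1.0, size=(6, 2))
df = pd.DataFrame(np.vstack([majority, minority]), columns=["f1", "f2"])
df["label"] = [0] * 20 + [1] * 6
df = pd.concat([df, df.iloc[[0]]], ignore_index=True)  # inject a duplicate row

# Step 1: remove exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# Step 3: oversample the minority class until both classes match in size.
X_min = df.loc[df["label"] == 1, ["f1", "f2"]].to_numpy()
n_needed = int((df["label"] == 0).sum() - (df["label"] == 1).sum())
synthetic = smote_sketch(X_min, n_needed)

balanced = pd.concat(
    [df, pd.DataFrame(synthetic, columns=["f1", "f2"]).assign(label=1)],
    ignore_index=True,
)
print(balanced["label"].value_counts())
```

In practice a library implementation such as `imblearn.over_sampling.SMOTE` would likely be preferred over a hand-written helper.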

## Libraries used
1. numpy
2. pandas
3. matplotlib
4. seaborn
5. tensorflow
6. keras
7. sklearn

## Visualization
![Alt text](<../Images/Screenshot (379).png>)
![Alt text](<../Images/Screenshot (381).png>)
![Alt text](<../Images/Screenshot (384).png>)

## Models Used
1. Logistic Regression
2. KNN
3. SVC
4. Decision Tree Classifier
5. Random Forest Classifier
6. CatBoost Classifier
7. XGBoost Classifier
8. LGBM Classifier
9. Gradient Boosting Classifier
10. Keras Sequential Model
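
A comparison of several classifiers like the one listed above is usually done with a shared train/test split and a loop over the estimators. The sketch below assumes synthetic data from `make_classification` and default-ish hyperparameters; none of these values come from the notebook. CatBoost, XGBoost and LightGBM are left out only to keep the dependencies to scikit-learn; their classifiers expose the same `fit`/`predict` interface and could be added to the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(
    n_samples=400, n_features=15, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit every model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from best to worst.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {acc:.2f}")
```

Using one fixed split for every model keeps the accuracies comparable with each other.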
## Accuracy
1. Logistic Regression - 0.95
2. KNN - 0.94
3. SVC - 0.95
4. Decision Tree Classifier - 0.94
5. Random Forest Classifier - 0.95
6. CatBoost Classifier - 0.96
7. XGBoost Classifier - 0.95
8. LGBM Classifier - 0.95
9. Gradient Boosting Classifier - 0.95
10. Keras Sequential Model - 0.98
## Conclusion
The project successfully produced machine learning models that can predict lung cancer.
The Keras Sequential model performed best with an accuracy of 0.98, with CatBoost a close second at 0.96. SMOTE is needed to balance the classes. The sequential model is effective for lung cancer detection because it builds the neural network layer by layer, allowing the model to learn hierarchical representations, which is crucial for capturing intricate patterns in medical data. CatBoost, a gradient boosting algorithm, complements the sequential model: it handles categorical features well, which is vital in medical datasets, and it mitigates overfitting. Combining a sequential model with CatBoost leverages their respective strengths, resulting in a robust and accurate system for lung cancer detection.
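
For illustration, a layer-by-layer Keras Sequential classifier of the kind described here might look like the sketch below. The layer sizes, the number of epochs, the assumed count of 15 input features and the synthetic data are all assumptions for this example; the architecture used in the notebook is not visible in this diff.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic data; 15 input features is an assumption for this sketch.
rng = np.random.default_rng(0)
n_features = 15
X = rng.random((256, n_features)).astype("float32")
y = (X[:, :3].sum(axis=1) > 1.5).astype("float32")

# Layers are stacked in order; the final sigmoid unit outputs P(class = 1).
model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Predicted probabilities for the first few rows.
probs = model.predict(X[:8], verbose=0)
print(probs.ravel().round(2))
```

Thresholding `probs` at 0.5 gives the predicted class labels.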

## Aindree Chatterjee
1 change: 1 addition & 0 deletions Lung Cancer Detection/Models/ml-all.ipynb
