Cancer Prediction using Machine Learning

Overview

The dataset was obtained from Kaggle and contains $569$ samples of cell growth data from different patients. Each cell growth is classified into one of the following two classes:

$B$: Benign
$M$: Malignant

The dataset has 31 columns in total: an id column, which is removed during preprocessing; 29 feature columns; and the diagnosis column, which is the target variable.
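The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the columns are literally named `id` and `diagnosis`, and that the labels are encoded as B -> 0 and M -> 1. The demo frame and its two feature columns are made up purely to show the shapes.

```python
import pandas as pd


def prepare_dataset(df: pd.DataFrame):
    """Split the raw table into a feature matrix X and a 0/1 target y."""
    # Assumption: the identifier and target columns are named "id" and "diagnosis".
    features = df.drop(columns=["id", "diagnosis"])
    # Assumption: benign maps to 0 and malignant to 1.
    target = df["diagnosis"].map({"B": 0, "M": 1})
    return features, target


# Tiny made-up frame for illustration only; the real data would be loaded
# with pd.read_csv("cancerdata.csv").
demo = pd.DataFrame({
    "id": [101, 102, 103],
    "diagnosis": ["B", "M", "B"],
    "radius_mean": [12.1, 20.3, 11.4],   # hypothetical feature column
    "texture_mean": [17.2, 25.0, 15.9],  # hypothetical feature column
})
X, y = prepare_dataset(demo)
print(X.shape, y.tolist())
```

Dropping the id column keeps an arbitrary identifier from being treated as a predictive feature.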

I used the TPOT package instead of using scikit-learn directly, because TPOT is a low-code ML training module that uses a genetic algorithm to optimize the pipeline.

Setting Up the Environment

Create a conda environment with Python 3.11 and run pip install -r requirements.txt to install the necessary packages. Then start JupyterLab by running the jupyter lab command.

TPOT Settings

The following settings were used:

{"generations": 50,
 "population_size": 50,
 "scoring": "f1_weighted",
 "cv": 5,
 "subsample": 0.5,
 "n_jobs": -1,
 "verbosity": 2,
 "random_state": 1337
}
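As a rough sketch of how these settings might be passed to TPOT, the snippet below assumes the classic TPOT API (`tpot.TPOTClassifier` accepting these keyword names); newer TPOT releases may use different parameter names. The helper `build_classifier` is hypothetical, and the search-size estimate assumes TPOT's default of producing as many offspring per generation as the population size.

```python
TPOT_SETTINGS = {
    "generations": 50,
    "population_size": 50,
    "scoring": "f1_weighted",
    "cv": 5,
    "subsample": 0.5,
    "n_jobs": -1,
    "verbosity": 2,
    "random_state": 1337,
}


def build_classifier(settings=None):
    """Hypothetical helper: create a TPOTClassifier from a settings dict."""
    # Imported lazily so the settings can be inspected without TPOT installed.
    from tpot import TPOTClassifier
    return TPOTClassifier(**(settings or TPOT_SETTINGS))


# Approximate search budget, assuming offspring_size equals population_size:
# the initial population plus one batch of offspring per generation.
n_pipelines = (
    TPOT_SETTINGS["population_size"]
    + TPOT_SETTINGS["generations"] * TPOT_SETTINGS["population_size"]
)
print(n_pipelines)

# Typical use, assuming X_train and y_train come from the preprocessing step:
# model = build_classifier()
# model.fit(X_train, y_train)
# model.export("tpot_pipeline.py")
```

Under that assumption the search evaluates on the order of a few thousand candidate pipelines, each scored with 5-fold cross-validation on half of the training samples (`subsample` of 0.5), so a full run can take a while even with `n_jobs` set to -1.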

Results

Running sklearn.metrics.classification_report on the validation dataset gives the following results:

              precision    recall  f1-score   support

           0       0.61      0.70      0.65        69
           1       0.40      0.31      0.35        45

    accuracy                           0.54       114
   macro avg       0.50      0.50      0.50       114
weighted avg       0.53      0.54      0.53       114
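To make the averaged rows of the report easier to read, the sketch below recomputes them in plain Python. The confusion counts are an assumption: they are inferred from the rounded precision, recall, and support values in the report, not taken from the notebook. Rows are true classes and columns are predicted classes.

```python
# Confusion counts inferred from the rounded report (an assumption).
confusion = [
    [48, 21],  # true class 0: 48 predicted as 0, 21 predicted as 1
    [31, 14],  # true class 1: 31 predicted as 0, 14 predicted as 1
]


def per_class_metrics(matrix, k):
    """Return (precision, recall, f1, support) for class k."""
    true_positive = matrix[k][k]
    predicted_as_k = sum(row[k] for row in matrix)
    support = sum(matrix[k])
    precision = true_positive / predicted_as_k
    recall = true_positive / support
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, support


rows = [per_class_metrics(confusion, k) for k in range(len(confusion))]
total = sum(r[3] for r in rows)

# Accuracy: correctly classified samples over all samples.
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total
# Macro average: unweighted mean of the per-class scores.
macro_f1 = sum(r[2] for r in rows) / len(rows)
# Weighted average: per-class scores weighted by support.
weighted_f1 = sum(r[2] * r[3] for r in rows) / total
# Reference point: accuracy of always predicting the larger class.
majority_baseline = max(r[3] for r in rows) / total

print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f} "
      f"weighted_f1={weighted_f1:.2f} baseline={majority_baseline:.2f}")
```

The weighted average is the quantity that the `f1_weighted` scoring setting optimizes during the search, and the majority-class baseline gives a simple reference point for reading the accuracy figure.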

arnabd64/CancerPrediction
