To run this code, first install Julia on your system, along with the libraries used in this project. You also need Jupyter Notebook to open the .ipynb file. The required packages are listed below.
- DataFrames
- CSV
- Plots
- Statistics
- StatsPlots
- Pkg
- Pandas
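
The packages above can be installed from the Julia REPL. The snippet below is a minimal sketch, not the exact setup used for this project. It assumes the registered package names match the list above. IJulia is an assumption on my part: it is not listed in this README, but it is the package that usually provides the Julia kernel for Jupyter.

```julia
using Pkg

# Statistics and Pkg ship with Julia's standard library,
# so they normally need no separate installation.
Pkg.add(["DataFrames", "CSV", "Plots", "StatsPlots", "Pandas"])

# Assumption: IJulia is not mentioned in this README. It is commonly
# used to make a Julia kernel available inside Jupyter Notebook.
Pkg.add("IJulia")
```

Note that Pandas.jl wraps the Python pandas library, so a working Python installation with pandas may also be required.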
The dataset used in this project comes from Kaggle, at https://www.kaggle.com/uciml/breast-cancer-wisconsin-data, and is also available from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29.
The model's accuracy varies between 70 and 90 percent from one training run to the next, because it is trained with only a RandomForestClassifier. For better accuracy, you can try other models, feature scaling, and/or feature engineering.
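
As one illustration of the feature scaling mentioned above, here is a minimal sketch of column-wise standardization (zero mean, unit variance) using only the Statistics standard library. The function name `standardize` and the toy matrix are hypothetical and are not taken from the notebook or the dataset.

```julia
using Statistics

# Hypothetical helper: standardize each column of a feature matrix
# to zero mean and unit variance. Rows are samples, columns are features.
function standardize(X::AbstractMatrix{<:Real})
    mu = mean(X, dims=1)      # 1 x n_features row of column means
    sigma = std(X, dims=1)    # 1 x n_features row of column standard deviations
    # Guard against constant columns, which would otherwise divide by zero.
    sigma = map(s -> s == 0 ? one(s) : s, sigma)
    return (X .- mu) ./ sigma
end

# Made-up toy values for illustration only (not rows from the dataset).
X = [1.0 200.0;
     2.0 400.0;
     3.0 600.0]

Z = standardize(X)
```

Scaling mostly helps distance-based or gradient-based models. Tree-based models such as random forests are largely insensitive to it, so expect a difference mainly when trying other model types.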
To learn more about how this model works, see the article I wrote for helloml, available at https://helloml.org/breast-cancer-prediction-using-julia/.
Clone this repo, then open the .ipynb file in Jupyter Notebook.