- Name: Zhaoyan LU
- Student ID: 23049710
This repo is for ELEC0134: Applied Machine Learning Systems 23/24.
It is using XGBoost to accomplish the binary classification task and the multi-class classificaion task.
$ tree
.
├── A
│ ├── README.md
│ ├── config.json
│ ├── solution.py # Solution for task A
│ ├── task_a_early_stopping_rounds_3_evals_result.json
│ ├── task_a_early_stopping_rounds_None_evals_result.json
│ └── task_a_training_model.json # Model saved
├── B
│ ├── README.md
│ ├── config.json
│ ├── solution.py # Solution for task B
│ ├── task_b_early_stopping_rounds_3_evals_result.json
│ ├── task_b_early_stopping_rounds_None_evals_result.json
│ └── task_b_training_model.json # Model saved
├── Datasets # An empty dir for Datasets
├── Makefile
├── README.md
├── constants.py # global constants
├── environment.yml # conda environment config
├── images
│ ├── flowchart.png # draw.io flowchart
│ ├── task_a_learning_curve.png
│ └── task_b_learning_curve.png
├── main.py # Entrypoint of this repo
├── plot.ipynb # Plots
├── requirements.txt # python package requirements
└── utils
├── dataset.py # Dataset Loader
├── logger.py # Logger for this project
└── solution.py # Solution framework for each task
Datasets being used are downloaded from zenodo:
Both of them are .npz
files.
You can use the following command to download them:
$ make download
...
$ tree Datasets
Datasets
├── PathMNIST
│ └── pathmnist.npz
└── PneumoniaMNIST
└── pneumoniamnist.npz
2 directories, 2 files
-
cuda 11.8
: If you want to usepy-xgboost-gpu
, GPU andcuda
are required -
Python packages
$ cat requirements.txt
# format python files
black
# read and process datasets
numpy
# Models
scikit-learn
# GPU version
# If you want to use CPU instead, use py-xgboost-cpu
py-xgboost-gpu
# save results
pandas
# plot
matplotlib
NOTE: py-xgboost-gpu
will use GPU. If you want to use CPU instead, use py-xgboost-cpu
and set the device using --device cpu
.
- Install all packages needed using
conda
:
$ make create-env
...
$ conda activate amls-final-zhaoyanlu
- Or you can use
pip
instead:
$ pip install -f requirements.txt
- All actions provided
$ python main.py --help
usage: main.py [-h] [-d] [-v] {solve,info} ...
AMLS Final Assignment
positional arguments:
{solve,info} actions provided
optional arguments:
-h, --help show this help message and exit
-d, --debug Print lots of debugging statements
-v, --verbose Be verbose
# options for the solve action
$ python main.py solve --help
usage: main.py solve [-h] [--task TASK] [--stages STAGES] [--device DEVICE]
[--save SAVE]
[--early_stopping_rounds EARLY_STOPPING_ROUNDS]
optional arguments:
-h, --help show this help message and exit
--task TASK task to solve: A, B, or all; default: all
--stages STAGES task stages: val, train, test, or all; default:
train,test
--device DEVICE Device ordinal for xgboost, available options: cpu,
cuda, and gpu; default: cuda
--save SAVE Save results or not, available options: True or False;
default: True
--early_stopping_rounds EARLY_STOPPING_ROUNDS
Used to prevent overfitting; require >= 0, default: 3
- Show info
$ python main.py info
-------------------------------------
| AMLS Assignment 23-24 |
| Name: Zhaoyan Lu |
| Student No: 23049710 |
-------------------------------------
- Solve all tasks
# Run all stages for all tasks
$ python main.py -v solve
# cross validation
# Notice: it will cost a long time to do cross validtion
$ python main.py -v solve --stages val
# only training
$ python main.py -v solve --stages train
# training and testing
$ python main.py -v solve --stages train,test
- Task A
# Solve Task A
$ python main.py -v solve --task A
# cross validation
# Notice: it will cost a long time to do cross validtion
$ python main.py -v solve --task A --stages val
# only training
$ python main.py -v solve --task A --stages train
# training and testing
$ python main.py -v solve --task A --stages train,test
- Task B
# Solve Task B
$ python main.py -v solve --task B
# cross validation
# Notice: it will cost a long time to do cross validtion
$ python main.py -v solve --task B --stages val
# only training
$ python main.py -v solve --task B --stages train
# training and testing
$ python main.py -v solve --task B --stages train,test