AMLS_assignment23_24

Name: Zhaoyan LU
Student ID: 23049710

Overview

This repo is for ELEC0134: Applied Machine Learning Systems 23/24.

It is using XGBoost to accomplish the binary classification task and the multi-class classificaion task.

Repo Structure

$ tree
.
├── A
│   ├── README.md
│   ├── config.json
│   ├── solution.py                 # Solution for task A
│   ├── task_a_early_stopping_rounds_3_evals_result.json
│   ├── task_a_early_stopping_rounds_None_evals_result.json
│   └── task_a_training_model.json  # Model saved
├── B
│   ├── README.md
│   ├── config.json
│   ├── solution.py                 # Solution for task B
│   ├── task_b_early_stopping_rounds_3_evals_result.json
│   ├── task_b_early_stopping_rounds_None_evals_result.json
│   └── task_b_training_model.json  # Model saved
├── Datasets                        # An empty dir for Datasets
├── Makefile
├── README.md
├── constants.py                    # global constants
├── environment.yml                 # conda environment config
├── images
│   ├── flowchart.png               # draw.io flowchart
│   ├── task_a_learning_curve.png
│   └── task_b_learning_curve.png
├── main.py                         # Entrypoint of this repo
├── plot.ipynb                      # Plots
├── requirements.txt                # python package requirements
└── utils
    ├── dataset.py                  # Dataset Loader
    ├── logger.py                   # Logger for this project
    └── solution.py                 # Solution framework for each task

Datasets

Datasets being used are downloaded from zenodo:

Both of them are .npz files.

You can use the following command to download them:

$ make download
...

$ tree Datasets
Datasets
├── PathMNIST
│   └── pathmnist.npz
└── PneumoniaMNIST
    └── pneumoniamnist.npz

2 directories, 2 files

Requirements

cuda 11.8: If you want to use py-xgboost-gpu, GPU and cuda are required
Python packages

$ cat requirements.txt
# format python files
black
# read and process datasets
numpy
# Models
scikit-learn
# GPU version
# If you want to use CPU instead, use py-xgboost-cpu
py-xgboost-gpu
# save results
pandas
# plot
matplotlib

NOTE: py-xgboost-gpu will use GPU. If you want to use CPU instead, use py-xgboost-cpu and set the device using --device cpu.

Usage

Setup

Install all packages needed using conda:

$ make create-env
...

$ conda activate amls-final-zhaoyanlu

Or you can use pip instead:

$ pip install -f requirements.txt

All actions provided

$ python main.py --help
usage: main.py [-h] [-d] [-v] {solve,info} ...

AMLS Final Assignment

positional arguments:
  {solve,info}   actions provided

optional arguments:
  -h, --help     show this help message and exit
  -d, --debug    Print lots of debugging statements
  -v, --verbose  Be verbose

# options for the solve action
$ python main.py solve --help
usage: main.py solve [-h] [--task TASK] [--stages STAGES] [--device DEVICE]
                     [--save SAVE]
                     [--early_stopping_rounds EARLY_STOPPING_ROUNDS]

optional arguments:
  -h, --help            show this help message and exit
  --task TASK           task to solve: A, B, or all; default: all
  --stages STAGES       task stages: val, train, test, or all; default:
                        train,test
  --device DEVICE       Device ordinal for xgboost, available options: cpu,
                        cuda, and gpu; default: cuda
  --save SAVE           Save results or not, available options: True or False;
                        default: True
  --early_stopping_rounds EARLY_STOPPING_ROUNDS
                        Used to prevent overfitting; require >= 0, default: 3

Show info

$ python main.py info
-------------------------------------
|       AMLS Assignment 23-24       |
|         Name: Zhaoyan Lu          |
|        Student No: 23049710       |
-------------------------------------

Solve Tasks

Solve all tasks

# Run all stages for all tasks
$ python main.py -v solve
# cross validation
# Notice: it will cost a long time to do cross validtion
$ python main.py -v solve --stages val
# only training
$ python main.py -v solve --stages train
# training and testing
$ python main.py -v solve --stages train,test

Task A

# Solve Task A
$ python main.py -v solve --task A
# cross validation
# Notice: it will cost a long time to do cross validtion
$ python main.py -v solve --task A --stages val
# only training
$ python main.py -v solve --task A --stages train
# training and testing
$ python main.py -v solve --task A --stages train,test

Task B

# Solve Task B
$ python main.py -v solve --task B
# cross validation
# Notice: it will cost a long time to do cross validtion
$ python main.py -v solve --task B --stages val
# only training
$ python main.py -v solve --task B --stages train
# training and testing
$ python main.py -v solve --task B --stages train,test

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AMLS_assignment23_24

Contents

Overview

Repo Structure

Datasets

Requirements

Usage

Setup

Solve Tasks

About

Releases 1

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
A		A
B		B
Datasets		Datasets
images		images
utils		utils
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
constants.py		constants.py
environment.yml		environment.yml
main.py		main.py
plot.ipynb		plot.ipynb
requirements.txt		requirements.txt

ZhaoyanLU23/AMLS_assignment23_24

Folders and files

Latest commit

History

Repository files navigation

AMLS_assignment23_24

Contents

Overview

Repo Structure

Datasets

Requirements

Usage

Setup

Solve Tasks

About

Topics

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages