Please note that the dataset has been removed from the data/ directory to prevent dataset leakage. Before running the code, add the dataset to this directory and update the data path in data_processing.py.
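As a rough illustration of the data path step, the sketch below resolves a CSV file inside data/ and fails early if it is missing. The file name `dataset.csv` and the helper `resolve_dataset_path` are hypothetical; the actual variable names used in data_processing.py may differ.

```python
from pathlib import Path

# Assumed layout: the CSV file sits directly inside data/.
DATA_DIR = Path("data")
DATASET_NAME = "dataset.csv"  # hypothetical file name, replace with your own


def resolve_dataset_path(data_dir=DATA_DIR, filename=DATASET_NAME):
    """Return the dataset path, raising a clear error if the file is absent."""
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {path}. Add the CSV file to {data_dir}/ "
            "and update the data path in data_processing.py."
        )
    return path
```

Checking the path up front gives a readable error before any preprocessing starts.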
To clone the project:
git clone https://github.com/ACSEkevin/Industrial-Programme-with-AMRC-Sheffield.git
- checkpoint/: stores the weights HDF5 file
- data/: stores the dataset CSV file
- itpma3_utils/:
  - utils.py: wraps frequently used functions and classes
  - models/: machine learning models
- data_processing.py: data analysis, preprocessing, feature engineering
- train.py: model training
- evaluate.py: model evaluation
- requirements.py: version testing and detection of available packages
Notebook versions of the data processing, model training, and evaluation steps are also provided; they present clear overall visualizations.
NOTICE:
Please change the working directory before running the code. In Colab, the following commands may be helpful:
from google.colab import drive
drive.mount('/content/drive')
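Mounting the drive does not change the working directory by itself. A minimal sketch of both steps follows, assuming the project was copied to a folder on Google Drive. The helper names `mount_drive_if_colab` and `enter_project_dir` are hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def enter_project_dir(project_dir):
    """Change the working directory to project_dir and return the resolved path."""
    path = Path(project_dir).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Project directory not found: {path}")
    os.chdir(path)
    return path
```

In Colab, one would call `mount_drive_if_colab()` first and then `enter_project_dir(...)` with the folder that holds the cloned project, typically somewhere under `/content/drive/MyDrive/`.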
The models in this project are built with Keras/TensorFlow (MLP) and Scikit-Learn (AdaBoost); XGBoost and LightGBM are used through their Scikit-Learn-compatible APIs. For any questions, please refer to:
- Keras tutorial: build a model in class object
- TensorFlow tutorial: model save checkpoint, weights saving and loading
- Sklearn ensemble tutorial: Ensemble learning & AdaBoost
- Numpy quick start
- XGBoost sklearn API tutorial
- LightGBM sklearn API tutorial
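Because the boosting models share the Scikit-Learn estimator interface, one training helper can serve all of them. The sketch below is not the project's train.py: it assumes a classification task on synthetic data, and `fit_and_score` is a hypothetical helper. `xgboost.XGBClassifier` and `lightgbm.LGBMClassifier` expose the same `fit` and `score` methods and could be added to the `models` dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split


def fit_and_score(model, X_train, y_train, X_test, y_test):
    """Train any estimator with the Scikit-Learn interface and return its test score."""
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Synthetic stand-in for the real dataset, which is not shipped with the repo.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Any estimator following the same API can be registered here.
models = {"AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0)}

scores = {
    name: fit_and_score(model, X_train, y_train, X_test, y_test)
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: accuracy = {score:.3f}")
```

Keeping the helper agnostic to the estimator class makes it easy to compare models under identical data splits.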
The project has six contributors. All page links will be refined in the future.