The original dataset from Panchenko et al., 2019 is represented by two files: `comparg_train.tsv` and `comparg_test.tsv`. The train/test split was created with `data_preparation.py`.
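For orientation, such a split could in principle be produced with something like the sketch below. The source file name, column names, split ratio, and random seed are illustrative assumptions; the actual `data_preparation.py` may work differently.

```python
# Hypothetical sketch of producing a train/test split;
# file name, test_size, and random_state are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("comparg_full.tsv", sep="\t")  # assumed combined source file
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

train_df.to_csv("comparg_train.tsv", sep="\t", index=False)
test_df.to_csv("comparg_test.tsv", sep="\t", index=False)
```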
To set up the environment, install Poetry and run the following commands:

```bash
pipx install poetry
cd SC/train
poetry install
```

All requirements are listed in `pyproject.toml`.
The encoder-based model can be trained on a GPU by running `train_bert.py`:

```bash
CUDA_VISIBLE_DEVICES=0 python train_bert.py
```

If you do not want to report to WandB, comment out the `report_to` argument of `TrainingArguments` in `train_bert.py`.
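For reference, the relevant part of the configuration looks roughly like the sketch below; the exact values used in `train_bert.py` may differ, and everything here apart from `report_to` is an illustrative assumption.

```python
# Illustrative sketch of the relevant TrainingArguments;
# the values are assumptions, not the exact settings of train_bert.py.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert_output",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    # report_to="wandb",  # comment this line out to disable WandB logging
)
```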
To optimize the hyperparameters of a new encoder-based model from HuggingFace, run `optimize_bert.py`:

```bash
CUDA_VISIBLE_DEVICES=0 python optimize_bert.py
```
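If you are adapting the script to another model, the general pattern is roughly the sketch below, which uses `Trainer.hyperparameter_search` with an Optuna backend. The model name, search space, trial count, and dataset handling are assumptions; `optimize_bert.py` may differ.

```python
# A minimal sketch of hyperparameter search with the HuggingFace Trainer and Optuna;
# model name, search space, and datasets below are assumptions.
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

def model_init():
    # Model name and label count are assumptions for illustration.
    return AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def hp_space(trial):
    # Hypothetical search space; the actual script may tune different parameters.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
    }

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hp_search"),
    train_dataset=train_dataset,  # assumed to be a tokenized HuggingFace Dataset
    eval_dataset=eval_dataset,    # assumed to be a tokenized HuggingFace Dataset
)

# Minimize evaluation loss (the Trainer's default objective) over 20 Optuna trials.
best_run = trainer.hyperparameter_search(
    direction="minimize", hp_space=hp_space, n_trials=20, backend="optuna"
)
print(best_run.hyperparameters)
```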
To cross-validate a new encoder-based model from HuggingFace, run `cross_val_bert.py`:

```bash
CUDA_VISIBLE_DEVICES=0 python cross_val_bert.py
```
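As a rough illustration, cross-validation over the training split can be organized as in the sketch below; the fold count, label column, and metric are assumptions, and `cross_val_bert.py` may implement this differently.

```python
# A minimal sketch of k-fold cross-validation for an encoder model;
# fold count and the "label" column name are assumptions.
import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv("comparg_train.tsv", sep="\t")
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for fold, (train_idx, val_idx) in enumerate(skf.split(df, df["label"])):
    train_fold, val_fold = df.iloc[train_idx], df.iloc[val_idx]
    # Train a fresh model on train_fold and evaluate it on val_fold here,
    # e.g. with the same Trainer setup as in train_bert.py, then:
    # scores.append(fold_metric)

print("Mean CV score:", sum(scores) / len(scores) if scores else "n/a")
```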
Once the model has been trained, you can launch a demo powered by Gradio:

```bash
python demo.py
```
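The demo is essentially a thin Gradio wrapper around the trained classifier. A minimal sketch (the checkpoint path and interface layout are assumptions, and `demo.py` may differ) could look like this:

```python
# A minimal sketch of a Gradio demo for the classifier;
# the checkpoint path is an assumption.
import gradio as gr
from transformers import pipeline

classifier = pipeline("text-classification", model="./bert_output")  # assumed path to the trained model

def predict(question: str) -> dict:
    result = classifier(question)[0]
    return {result["label"]: result["score"]}

demo = gr.Interface(fn=predict, inputs=gr.Textbox(label="Question"), outputs=gr.Label())
demo.launch()
```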
An API for accessing the model via HTTP requests is provided in `main.py`:

```bash
python main.py
```
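For illustration, a minimal prediction endpoint could look like the sketch below; FastAPI, the `/predict` route, and the checkpoint path are assumptions, and the actual `main.py` may be organized differently.

```python
# A minimal sketch of a prediction API; framework, route, and checkpoint
# path are assumptions, not necessarily what main.py uses.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline
import uvicorn

app = FastAPI()
classifier = pipeline("text-classification", model="./bert_output")  # assumed path to the trained model

class Query(BaseModel):
    text: str

@app.post("/predict")
def predict(query: Query):
    result = classifier(query.text)[0]
    return {"label": result["label"], "score": result["score"]}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Under these assumptions, the endpoint could then be queried with, for example, `curl -X POST http://localhost:8000/predict -H "Content-Type: application/json" -d '{"text": "..."}'`.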