This repository holds the code and artifacts of the paper "Evaluating Recent Legal Rhetorical Role Labeling Approaches Supported by Transformer Encoders", accepted at BRACIS 2023.
The `reports` folder holds the reports with the results of the experiments. It has a subfolder for each dataset.
Files related to a specific model have the model name as a prefix. For example, the files `pe_app.py`, `pe_models.py`, `pe_run_InCaseLaw.py`, and `pe_run_RoBERTa.py` concern the PE-S and PE-C models. The prefix `mixup` relates to the Mixup-A models, whereas `mixup2` relates to the Mixup-B models.
There is a run script for each model. For example, to run the DFCSC-CLS-RoBERTa model, execute the `dfcsc_cls_run_RoBERTa.py` file. Running a model yields the respective report file. There are no command-line parameters.
This repository does not contain the original dataset, though it is available at this link. The path to the original dataset is set in the `data_manager.py` file. Before running a model, set this path according to the location of the dataset on your system.
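For instance, the path may be defined roughly as in the hypothetical snippet below (the actual variable name used in `data_manager.py` may differ):

```python
# Hypothetical excerpt of data_manager.py; the actual variable name may differ.
DATASET_DIR = '/path/to/original/dataset'  # set this to the dataset location on your system
```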
The hyperparameters of a model can be set in its run script. In the following, we describe these hyperparameters.
Generic hyperparameters (i.e., available in all models; an illustrative configuration sketch follows the list):

- `ENCODER_ID`: identifier of the exploited pre-trained Transformer model from the Hugging Face repository (https://huggingface.co/models).
- `MODEL_REFERENCE`: name used to reference the model in the reports.
- `MAX_SEQUENCE_LENGTH`: number of tokens in a chunk or sentence. It is usually set to the maximum sequence length supported by the pre-trained model. For DFCSC models, it works as the `c_len` hyperparameter.
- `EMBEDDING_DIM`: dimension of a token embedding. It is determined by the chosen pre-trained model.
- `N_EPOCHS`: number of fine-tuning epochs.
- `LEARNING_RATE`: initial learning rate of the fine-tuning procedure.
- `BATCH_SIZE`: batch size of the fine-tuning procedure.
- `DROPOUT_RATE`: dropout rate of the classification layer.
- `DATASET`: ID/name of the dataset to be used. The options are `7_roles` and `4_roles`.
- `n_iterations`: number of executions of the model. Each execution adopts a different random seed value.
- `weight_decay`: weight decay value of the Adam optimizer.
- `eps`: epsilon value of the Adam optimizer.
- `use_mock`: boolean value indicating whether to use a mock model instead of a real one. This is a way to speed up the running time when the code is being validated.
- `n_documents`: number of documents used to train and evaluate a model. This is a way to speed up the running time when the code is being validated.
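As an illustration, these hyperparameters are typically defined as constants near the top of a run script. The snippet below is only a hypothetical sketch: the values are placeholders, not the settings used in the paper.

```python
# Hypothetical excerpt of a run script; the values are placeholders, not the paper's settings.
ENCODER_ID = 'roberta-base'      # pre-trained model ID from https://huggingface.co/models
MODEL_REFERENCE = 'my-model'     # name used to reference the model in the reports
MAX_SEQUENCE_LENGTH = 512        # tokens per chunk/sentence (c_len for DFCSC models)
EMBEDDING_DIM = 768              # token embedding dimension of the chosen encoder
N_EPOCHS = 4                     # fine-tuning epochs
LEARNING_RATE = 1e-5             # initial fine-tuning learning rate
BATCH_SIZE = 8
DROPOUT_RATE = 0.1               # dropout of the classification layer
DATASET = '7_roles'              # or '4_roles'
n_iterations = 5                 # executions, each with a different random seed
weight_decay = 1e-3              # Adam weight decay
eps = 1e-8                       # Adam epsilon
use_mock = False                 # True uses a mock model, for quick code validation
n_documents = 10                 # small number of documents, for quick code validation
```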
PE models:

- `COMBINATION`: the operation used to combine positional embeddings and sentence embeddings. The options are `S` for sum and `C` for concatenation (see the sketch below).
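The sketch below illustrates the two combination strategies. It is not the authors' implementation, and the array dimension is only an example:

```python
import numpy as np

# Illustrative sketch of the two COMBINATION options (not the repository's code).
sentence_emb = np.random.rand(768)    # sentence embedding (example dimension)
position_emb = np.random.rand(768)    # positional embedding of the same dimension

combined_sum = sentence_emb + position_emb                    # COMBINATION = 'S' (PE-S)
combined_cat = np.concatenate([sentence_emb, position_emb])   # COMBINATION = 'C' (PE-C)
```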
DFCSC-CLS and DFCSC-SEP models:
- `MIN_CONTEXT_LENGTH` (m_edges): the desired minimum number of tokens at the edges of a chunk.
Cohan models:
- `MAX_SENTENCE_LENGTH`: maximum number of tokens in a sentence.
- `MAX_SENTENCES_PER_BLOCK`: maximum number of sentences in a chunk.
- `CHUNK_LAYOUT`: the layout of the chunk. It must be set to `Cohan`.
Mixup-A models:
- `MIXUP_ALPHA`: the alpha hyperparameter of the Mixup method (see the interpolation sketch after this list).
- `CLASSES_TO_AUGMENT`: when generating a Mixup vector, the training procedure chooses two source vectors. The first vector is always a random vector belonging to a class indicated in this hyperparameter, and the second vector is from a different class chosen at random.
- `AUGMENTATION_RATE`: the gamma hyperparameter described in the paper.
- `N_EPOCHS_ENCODER`: the total number of fine-tuning epochs of the encoder.
- `STOP_EPOCH_ENCODER`: the epoch in which the fine-tuning of the encoder finishes. It must be less than or equal to `N_EPOCHS_ENCODER`. Note that the settings `N_EPOCHS_ENCODER = 4`, `STOP_EPOCH_ENCODER = 2` and `N_EPOCHS_ENCODER = 2`, `STOP_EPOCH_ENCODER = 2` produce different results because of the learning rate schedule procedure.
- `LEARNING_RATE_ENCODER`: the initial learning rate used in the fine-tuning of the encoder.
- `N_EPOCHS_CLASSIFIER`: the number of training epochs of the classifier.
- `LEARNING_RATE_CLASSIFIER`: the initial learning rate used in the training of the classifier.
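As a rough illustration of how `MIXUP_ALPHA` drives the interpolation, the sketch below follows the standard Mixup formulation. It is not the authors' implementation, and the function and variable names are hypothetical:

```python
import numpy as np

# Hypothetical sketch of the standard Mixup interpolation step (not the repository's code).
# x1: embedding from a class listed in CLASSES_TO_AUGMENT; x2: embedding from a different,
# randomly chosen class; y1, y2: the corresponding one-hot label vectors.
def mixup_example(x1, y1, x2, y2, alpha):
    lam = np.random.beta(alpha, alpha)     # MIXUP_ALPHA controls this Beta distribution
    x_mix = lam * x1 + (1.0 - lam) * x2    # synthetic embedding
    y_mix = lam * y1 + (1.0 - lam) * y2    # soft label
    return x_mix, y_mix
```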
Mixup-B models:
- `MIXUP_ALPHA`: the alpha hyperparameter of the Mixup method.
- `CLASSES_TO_AUGMENT`: when generating a Mixup vector, the training procedure chooses two source vectors. The first vector is always a random vector belonging to a class indicated in this hyperparameter, and the second vector is from a different class chosen at random.
- `AUGMENTATION_RATE`: the gamma hyperparameter described in the paper.