The notebooks work with the Pubmed_200k_RCT dataset.
The notebooks are designed to run on Google Colab (it is possible to run them locally by changing the file paths and removing the Google Drive mount).
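For reference, a minimal sketch of such a setup cell, assuming the data lives in the default MyDrive folder (the local fallback path is only an illustration):

# Mount Google Drive when running on Colab; fall back to local paths otherwise.
try:
    from google.colab import drive
    drive.mount('/content/drive')
    base_path = '/content/drive/MyDrive/'  # assumed Drive location
except ImportError:
    base_path = './'  # running locally: point this at your own directory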
You may need to install the following libraries for the notebooks to work:
$ pip install yellowbrick
$ pip install gensim
$ pip install seaborn
$ pip install transformers
$ pip install datasets
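On Google Colab these can be run directly in a notebook cell by prefixing them with an exclamation mark, e.g. !pip install transformers.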
Because of their size, the trained models and preprocessed data can be found here.
The notebook Task1.ipynb requires the preprocessed dataset. Either use the preprocessed data from the folder mentioned above or run the notebook Task1_preprocessing.ipynb. The preprocessed files should be in the path indicated by data_path.
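A quick sanity check for this setup (the path below is a placeholder, not the actual folder name):

import os

data_path = 'path/to/data/'  # placeholder; point this at the folder with the preprocessed files

# Verify the preprocessed files are where the notebook expects them.
assert os.path.isdir(data_path), f"data_path does not exist: {data_path}"
print(sorted(os.listdir(data_path)))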
For the notebook Task2.ipynb, either set is_preprocess_enabled=True to let the notebook do the preprocessing, or set it to False and use the preprocessed data from the folder mentioned above. The preprocessed files should be in the path indicated by data_path.
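The switch is expected to look roughly like this (a sketch; the actual preprocessing and loading steps are defined in the notebook itself):

is_preprocess_enabled = False  # True: preprocess from scratch; False: use the files in data_path

if is_preprocess_enabled:
    ...  # the notebook's preprocessing cells build the files and write them to data_path
else:
    ...  # the preprocessed files are read directly from data_path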
In the first cells of each notebook you can find the following line:
train = False
This indicates that a previously saved model will be loaded instead of trained. To train the model, simply change this line to:
train = True
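In other words, the flag is expected to gate the notebook roughly as follows (a sketch, not the notebooks' exact code):

train = False  # False: load the saved model from model_path; True: train and save it

if train:
    ...  # training cells run, and the resulting model is saved to model_path
else:
    ...  # training is skipped, and the trained model is loaded from model_path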
In the first cells of each notebook you can find two paths:
data_path = path/to/data/
model_path = path/to/saved/models/
The data_path variable should point to the folder that contains the original dataset and the preprocessed files. The model_path variable should point to the folder that contains the model files; the model will be loaded from there (or saved there if training is enabled).
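A plausible Colab configuration, assuming the shared folder was copied to MyDrive (the folder names are assumptions; adjust them to your own layout):

data_path = '/content/drive/MyDrive/Pubmed_200k_RCT/data/'     # original dataset + preprocessed files
model_path = '/content/drive/MyDrive/Pubmed_200k_RCT/models/'  # trained model files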