This repository contains solutions to the question answering problem on the SQuAD v1.1 dataset, which consists in selecting the answer to a given question as a span of words in the given context paragraph. The newer version (v2.0) of the dataset also contains unanswerable questions, but the version we worked with (v1.1) does not.
In order to install all the dependencies required by the project, you have two options:

- Using `pip`: make sure that you have Python 3.8 installed on your system and run

  ```shell
  python3 -m venv squad
  source squad/bin/activate
  pip install -r init/requirements.txt
  ```
- Using `conda`: simply run the following commands

  ```shell
  conda env create --name squad -f init/environment.yml
  conda activate squad
  ```
The training part of the project is managed through a Jupyter notebook, in which you can select which model to train and which hyperparameters to use.
Training and evaluation metrics, along with model checkpoints and results, are directly logged into a W&B project, which is openly accessible here. Logging permissions are granted only to members of the team, so if you want to launch your own training run you have to disable wandb by setting the environment variable `WANDB_DISABLED` to an empty value at the top of the notebook (`%env WANDB_DISABLED=`).
The testing part of the project is managed using two Python scripts:

- `compute_answers.py`: given the path to the testing JSON file (formatted as the official SQuAD training set JSON), computes and saves another JSON file with the following format

  ```json
  {
      "question_id": "textual answer",
      ...
  }
  ```
- `evaluate.py`: given the path to the same testing JSON file used in the `compute_answers.py` script and the JSON file produced by that script, prints to the standard output a dictionary of metrics, such as the F1 and Exact Match scores, which can be used to assess the performance of a trained model, as done in the official SQuAD competition.