
Tagger(s)

Iris edited this page Feb 2, 2022 · 11 revisions

In our analysis, we test different tagger configurations:

  1. tagger/tagger_with_bert_config.json - BiLSTM-CRF tagger using BERT embeddings
  2. tagger/tagger_with_english_elmo_config.json - BiLSTM-CRF tagger using English ELMo embeddings
  3. tagger/tagger_with_german_elmo_config.json - BiLSTM-CRF tagger using German ELMo embeddings
  4. New tagger

For the ELMo taggers, we use the following ELMo parameters (i.e. options and weights):

You can compare how the taggers differ in performance by checking the accuracy rates reported in the Performance section of the README file.

=================================================================================

How to run the taggers:

Step 1:

  • Adjust parameters including file paths in the respective .json config files, as needed. By default, the paths point to datasets in data. See respective README files there for details about the datasets.

Step 2:

  • Preprocess the input data. The tagger models expect data to be in CoNLL-2003 format with the relevant columns being the first (TEXT) and the fourth (LABEL).
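As a quick sanity check on the preprocessing, the expected format can be sketched as follows. This is a minimal reader, not part of the tagger code; the sample sentence and the whitespace column separator are assumptions — adjust to your data.

```python
# Sketch: extract the TEXT (first) and LABEL (fourth) columns from
# CoNLL-2003 formatted lines. Blank lines separate sentences and
# "-DOCSTART-" lines are skipped.

def read_conll2003(lines):
    """Yield one (tokens, labels) pair per sentence."""
    tokens, labels = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                yield tokens, labels
                tokens, labels = [], []
            continue
        cols = line.split()
        tokens.append(cols[0])   # first column: TEXT
        labels.append(cols[3])   # fourth column: LABEL
    if tokens:
        yield tokens, labels

# Hypothetical CoNLL-2003 snippet (token, POS, chunk, NER label):
sample = """\
EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
"""
sentences = list(read_conll2003(sample.splitlines()))
```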

Step 3:

Train the model:

  • Run allennlp train [params] -s [serialization dir] to train a model, where

[params] is the path to the .json config file.

[serialization dir] is the directory in which to save the trained model, logs, and other results.

Step 4:

Evaluate the model:

  • Run allennlp evaluate [archive file] [input file] --output-file [output file] to evaluate the model on some evaluation data, where

[archive file] is the path to an archived trained model.

[input file] is the path to the file containing the evaluation data.

[output file] is an optional path to save the metrics as JSON; if not provided, the output will be displayed on the console.
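The metrics file written by the evaluate command can then be inspected programmatically. A minimal sketch, assuming the output is a single JSON object; the metric key names ("accuracy", "loss") depend on the model configuration and are assumptions here, not guaranteed names.

```python
# Sketch: read the metrics JSON that `allennlp evaluate` writes with
# --output-file. Replace the inline string with your actual file contents.
import json

metrics_text = '{"accuracy": 0.97, "loss": 0.12}'  # stand-in for the output file
metrics = json.loads(metrics_text)
print(f"accuracy: {metrics['accuracy']:.2%}")
```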

Step 5:

Use the model to make predictions:

  • Run allennlp predict [archive file] [input file] --use-dataset-reader --output-file [output file] to parse a file with a pretrained model, where

[archive file] is the path to an archived trained model.

[input file] is the path to the file you want to parse; this file should be in the same format as the training data, i.e. CoNLL-2003.

--use-dataset-reader tells the predictor to use the same dataset reader that was used during training.

[output file] is an optional path to save parsing results as JSON; if not provided, the output will be displayed on the console.
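The prediction output is written as JSON lines, one object per input sentence, and can be post-processed as sketched below. The field names ("words", "tags") are assumptions based on typical sequence-tagger output; inspect one line of your own output file to confirm them.

```python
# Sketch: read the JSON-lines file that `allennlp predict` writes and
# pair each token with its predicted tag.
import json

output_lines = [
    '{"words": ["EU", "rejects"], "tags": ["B-ORG", "O"]}',  # stand-in line
]
for line in output_lines:
    prediction = json.loads(line)
    pairs = list(zip(prediction["words"], prediction["tags"]))
    print(pairs)
```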
