
Tagger(s)

Iris edited this page Feb 2, 2022 · 11 revisions

In our analysis, we test different tagger configurations:

  1. tagger/tagger_with_bert_config.json - BiLSTM-CRF tagger using BERT embeddings
  2. tagger/tagger_with_english_elmo_config.json - BiLSTM-CRF tagger using English ELMo embeddings
  3. tagger/tagger_with_german_elmo_config.json - BiLSTM-CRF tagger using German ELMo embeddings
  4. New tagger

For the ELMo taggers, we use the following ELMo parameters (i.e. options and weights):

You can compare how the taggers differ in performance by checking the accuracy rates reported in the Performance section of the README file.

=================================================================================

How to run the taggers:

Step 1:

  • Adjust parameters including file paths in the respective .json config files, as needed. By default, the paths point to datasets in data. See respective README files there for details about the datasets.

Step 2:

  • Preprocess the input data. The tagger models expect data to be in CoNLL-2003 format with the relevant columns being the first (TEXT) and the fourth (LABEL).
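As a quick sanity check on the preprocessing, the expected format can be sketched as follows. This is a minimal reader, not part of the tagger code; the sample sentence and the whitespace column separator are assumptions — adjust to your data.

```python
# Sketch: extract the TEXT (first) and LABEL (fourth) columns from
# CoNLL-2003 formatted lines. Blank lines separate sentences and
# "-DOCSTART-" lines are skipped.

def read_conll2003(lines):
    """Yield one (tokens, labels) pair per sentence."""
    tokens, labels = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                yield tokens, labels
                tokens, labels = [], []
            continue
        cols = line.split()
        tokens.append(cols[0])   # first column: TEXT
        labels.append(cols[3])   # fourth column: LABEL
    if tokens:
        yield tokens, labels

# Hypothetical CoNLL-2003 snippet (token, POS, chunk, NER label):
sample = """\
EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
"""
sentences = list(read_conll2003(sample.splitlines()))
```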

Step 3:

Train the model:

  • Run allennlp train [params] -s [serialization dir] to train a model, where

[params] is the path to the .json config file.

[serialization dir] is the directory in which to save the trained model, logs, and other results.

Step 4:

Evaluate the model:

  • Run allennlp evaluate [archive file] [input file] --output-file [output file] to evaluate the model on some evaluation data, where

[archive file] is the path to an archived trained model.

[input file] is the path to the file containing the evaluation data.

[output file] is an optional path to save the metrics as JSON; if not provided, the output will be displayed on the console.
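The metrics file written by the evaluate command can then be inspected programmatically. A minimal sketch, assuming the output is a single JSON object; the metric key names ("accuracy", "loss") depend on the model configuration and are assumptions here, not guaranteed names.

```python
# Sketch: read the metrics JSON that `allennlp evaluate` writes with
# --output-file. Replace the inline string with your actual file contents.
import json

metrics_text = '{"accuracy": 0.97, "loss": 0.12}'  # stand-in for the output file
metrics = json.loads(metrics_text)
print(f"accuracy: {metrics['accuracy']:.2%}")
```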

Step 5:

Use the model to make predictions:

  • Run allennlp predict [archive file] [input file] --use-dataset-reader --output-file [output file] to parse a file with a pretrained model, where

[archive file] is the path to an archived trained model.

[input file] is the path to the file you want to parse; this file should be in the same format as the training data, i.e. CoNLL-2003.

--use-dataset-reader tells the predictor to use the same dataset reader that was used during training.

[output file] is an optional path to save parsing results as JSON; if not provided, the output will be displayed on the console.
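The prediction output is written as JSON lines, one object per input sentence, and can be post-processed as sketched below. The field names ("words", "tags") are assumptions based on typical sequence-tagger output; inspect one line of your own output file to confirm them.

```python
# Sketch: read the JSON-lines file that `allennlp predict` writes and
# pair each token with its predicted tag.
import json

output_lines = [
    '{"words": ["EU", "rejects"], "tags": ["B-ORG", "O"]}',  # stand-in line
]
for line in output_lines:
    prediction = json.loads(line)
    pairs = list(zip(prediction["words"], prediction["tags"]))
    print(pairs)
```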
