Tagger(s)
In our analysis, we test different tagger configurations:

- `tagger/tagger_with_bert_config.json` - BiLSTM-CRF tagger using BERT embeddings
- `tagger/tagger_with_english_elmo_config.json` - BiLSTM-CRF tagger using English ELMo embeddings
- `tagger/tagger_with_german_elmo_config.json` - BiLSTM-CRF tagger using German ELMo embeddings
- New tagger
For the ELMo taggers, we use the following ELMo parameters, i.e. options and weights (a config sketch follows the list):

- English: weights and options (use the weights and options files under `fta/` after unzipping)
- German: weights and options
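As a rough orientation, the snippet below is a minimal sketch, not the project's actual configuration, of how such options and weights files are typically referenced in an AllenNLP config; the file names under `fta/` are placeholders, and the real settings live in the `tagger/*_elmo_config.json` files:

```jsonnet
{
  // Sketch only (not the project's actual config): how ELMo options and
  // weights are typically wired into an AllenNLP model's embedder section.
  "text_field_embedder": {
    "token_embedders": {
      "elmo": {
        "type": "elmo_token_embedder",
        "options_file": "fta/elmo_options.json",  // placeholder path
        "weight_file": "fta/elmo_weights.hdf5",   // placeholder path
        "do_layer_norm": false,
        "dropout": 0.0
      }
    }
  }
}
```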
The taggers differ in performance; see the accuracy rates reported in the Performance section of the README.
=================================================================================
- Adjust parameters, including file paths, in the respective `.json` config files as needed. By default, the paths point to datasets in `data`. See the respective README files there for details about the datasets.
- Preprocess the input data. The tagger models expect data in CoNLL-2003 format, with the relevant columns being the first (TEXT) and the fourth (LABEL); an illustrative snippet follows this list.
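For orientation, here are a few illustrative CoNLL-2003 lines (the tokens and labels are only an example; the actual label set depends on your dataset). Each token is on its own line with space-separated columns, sentences are separated by blank lines, and the taggers only read the first and fourth columns:

```
EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
. . O O
```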
Train the model:
- Run `allennlp train [params] -s [serialization dir]` to train a model, where
  - `[params]` is the path to the `.json` config file.
  - `[serialization dir]` is the directory in which to save the trained model, logs, and other results.
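For example, to train the BERT-based tagger from the config listed above (the serialization directory `output/bert_tagger` is just an illustrative choice):

```bash
# Train the BiLSTM-CRF tagger with BERT embeddings; the trained model archive,
# logs and metrics are written to the serialization directory.
allennlp train tagger/tagger_with_bert_config.json -s output/bert_tagger
```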
Evaluate the model:
- Run `allennlp evaluate [archive file] [input file] --output-file [output file]` to evaluate the model on some evaluation data, where
  - `[archive file]` is the path to an archived trained model.
  - `[input file]` is the path to the file containing the evaluation data.
  - `[output file]` is an optional path to save the metrics as JSON; if not provided, the output is displayed on the console.
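For example, continuing with the illustrative paths from the training step and assuming a CoNLL-formatted test file at `data/test.conll`:

```bash
# AllenNLP packages the trained model as model.tar.gz inside the serialization
# directory; evaluate it on the (illustrative) test file and save the metrics.
allennlp evaluate output/bert_tagger/model.tar.gz data/test.conll \
    --output-file output/bert_tagger/test_metrics.json
```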
Use the model to make predictions:

- Run `allennlp predict [archive file] [input file] --use-dataset-reader --output-file [output file]` to parse a file with a pretrained model, where
  - `[archive file]` is the path to an archived trained model.
  - `[input file]` is the path to the file you want to parse; this file should be in the same format as the training data, i.e. CoNLL-2003.
  - `--use-dataset-reader` tells the parser to use the same dataset reader that was used during training.
  - `[output file]` is an optional path to save the parsing results as JSON; if not provided, the output is displayed on the console.
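For example, again with illustrative paths:

```bash
# Tag a CoNLL-formatted file with the trained model, reusing the dataset reader
# from training, and write the predictions to a JSON-lines file.
allennlp predict output/bert_tagger/model.tar.gz data/new_documents.conll \
    --use-dataset-reader \
    --output-file output/bert_tagger/predictions.json
```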