Error running BERT tagger on CoLi servers #4

Open
siyutao opened this issue Feb 14, 2022 · 16 comments
Labels
bug (Something isn't working) · wontfix (This will not be worked on)

Comments

@siyutao
Contributor

siyutao commented Feb 14, 2022

Currently getting an error while running the allennlp 0.8 BERT config tagger/tagger_with_bert_config.json after changing label_encoding to "BIO" ("BIOUL" throws a different error).
Error output:

Traceback (most recent call last):
  File "/proj/irtg.shadow/conda/envs/allennlp/bin/allennlp", line 10, in <module>
    sys.exit(run())
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/run.py", line 18, in run
    main(prog="allennlp")
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 102, in main
    args.func(args)
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/commands/train.py", line 116, in train_model_from_args
    args.cache_prefix)
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/commands/train.py", line 160, in train_model_from_file
    cache_directory, cache_prefix)
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/commands/train.py", line 243, in train_model
    metrics = trainer.train()
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/training/trainer.py", line 480, in train
    train_metrics = self._train_epoch(epoch)
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/training/trainer.py", line 322, in _train_epoch
    loss = self.batch_loss(batch_group, for_training=True)
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/training/trainer.py", line 263, in batch_loss
    output_dict = self.model(**batch)
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/models/crf_tagger.py", line 182, in forward
    embedded_text_input = self.text_field_embedder(tokens)
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 125, in forward
    return torch.cat(embedded_representations, dim=-1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 2. Got 433 and 422 in dimension 1 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:71

Another TO-DO: we need to add min_padding_length to the config. It may or may not be related to the current error.

/proj/irtg.shadow/conda/envs/allennlp/lib/python3.7/site-packages/allennlp/data/token_indexers/token_characters_indexer.py:55: UserWarning: You are using the default value (0) of `min_padding_length`, which can cause some subtle bugs (more info see https://github.com/allenai/allennlp/issues/1954). Strongly recommend to set a value, usually the maximum size of the convolutional layer size when using CnnEncoder.
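For reference, a minimal sketch of what the warning asks for, shown here as a Python dict rather than the actual JSON fragment; the value 3 is a placeholder that should match the largest n-gram filter size of the CnnEncoder in tagger/tagger_with_bert_config.json:

# Hypothetical token_characters indexer fragment, for illustration only;
# in the repo this setting would live in tagger/tagger_with_bert_config.json.
token_indexers = {
    "token_characters": {
        "type": "characters",
        "min_padding_length": 3,  # placeholder: use the largest CNN filter size
    },
}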
siyutao added the bug (Something isn't working) and help wanted (Extra attention is needed) labels on Feb 14, 2022
siyutao closed this as completed on Feb 15, 2022
siyutao reopened this on Feb 15, 2022
@irisferrazzo
Contributor

Hi @siyutao, I've read that you have many exams. Should we say that you continue working on the allennlp 2.8 ELMo tagger, while @TheresaSchmidt and I meet as soon as possible, run the allennlp 0.8 BERT tagger, and discuss debugging? Let me know what you think.

@siyutao
Contributor Author

siyutao commented Feb 15, 2022

Hey @irisferrazzo, that sounds good to me if you and Theresa have time this week. Otherwise I can spend some time on this issue this weekend too (though I think we decided moving ELMo to 2.8 is the priority?). I'll be a lot freer from the 22nd. Thanks!

@irisferrazzo
Contributor

irisferrazzo commented Feb 15, 2022

Hi @siyutao, yes, you're right, but I want to respect the fact that you have many exams :) Let's see whether @TheresaSchmidt has time this week or not. If not, and you have some time this week/weekend to run and debug together instead of working on moving ELMo to 2.8, that would obviously be better! Just don't want to put pressure on anybody :)

@TheresaSchmidt
Contributor

This is really not the error I would expect from changing label_encoding. I would suspect an underlying issue that causes the different errors for BIOUL and BIO, respectively. But I'm really just guessing, too.
I'll have a look, but this looks very much like the type of error that I got stuck on before, i.e. when I was working on joint learning.

Also, this week I'm still pretty busy but next week should be better.

@TheresaSchmidt
Contributor

We've narrowed down the issue to the data. Somehow, with part of the data, the training runs through just fine (I tried with the German data and with cropped versions of the English data), but the full English data triggers the above error.

I did a superficial search for white-space irregularities (I have had problems with that before) but couldn't find anything. We could also try to look for gaps in the data. Maybe there's a line where not all columns are filled.
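As a rough sketch of such a check (the tab separator and the expected column count are assumptions about the CoNLL-style files):

# Hypothetical sanity check: flag lines with an unexpected number of columns
# or trailing whitespace; blank lines are treated as sentence/recipe breaks.
def check_columns(path, expected_cols):
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            stripped = line.rstrip("\n")
            if not stripped.strip():
                continue  # blank line = block separator
            cols = stripped.split("\t")
            if len(cols) != expected_cols:
                print(f"line {lineno}: {len(cols)} columns: {stripped!r}")
            if stripped != stripped.rstrip():
                print(f"line {lineno}: trailing whitespace")

# expected_cols=10 is a guess; set it to the real column count of the data
check_columns("train_1222211.txt", expected_cols=10)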

@TheresaSchmidt
Contributor

We could also try to look for gaps in the data. Maybe there's a line where not all columns are filled.

Haven't found anything.

@TheresaSchmidt
Contributor

If I use the attached file as training data, I get a dimension error (like the one above but with different numbers). If I split up the file into two separate files, each of the two files trains successfully.

train_1222211.txt
This file contains one recipe from the English training data.

@irisferrazzo
Contributor

This last file actually has a trailing blank line. It works regardless, right? I can have a look at the data now. I'll let you know if I find something.

@irisferrazzo
Contributor

Yesterday I tried to run the ELMo tagger, but I still don't have access to proj/cookbook (for the ELMo weights etc., which I would prefer not to download). Could you also run it on the same data if you get to it? Then we can double-check.

@TheresaSchmidt
Contributor

Here's a technical explanation of why it's not working: allenai/allennlp#2851

However, this does not explain why it used to run through without a problem and suddenly doesn't do so anymore even though we haven't changed anything...
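If the cause is the one described in that issue, i.e. recipes whose wordpiece count exceeds BERT's 512-piece limit so the BERT representation ends up with a different length than the other token representations being concatenated, a rough way to spot the offending sequences is sketched below. The tokenizer comes from pytorch-pretrained-bert, which allennlp 0.8 depends on; the tab-separated format and token-in-first-column layout are assumptions about the data:

# Rough sketch: count wordpieces per blank-line-separated block and flag
# anything over BERT's 512-piece limit.
from pytorch_pretrained_bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

def wordpiece_lengths(path):
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                if sentence:
                    pieces = sum(len(tokenizer.tokenize(tok)) for tok in sentence)
                    yield len(sentence), pieces
                    sentence = []
            else:
                sentence.append(line.split("\t")[0])  # assumption: token in first column
    if sentence:
        yield len(sentence), sum(len(tokenizer.tokenize(tok)) for tok in sentence)

for n_tokens, n_pieces in wordpiece_lengths("train_1222211.txt"):
    if n_pieces > 512:
        print(f"{n_tokens} tokens -> {n_pieces} wordpieces (over BERT's limit)")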

@TheresaSchmidt
Contributor

Yesterday I tried to run the ELMo tagger, but I still don't have access to proj/cookbook (for the ELMo weights etc., which I would prefer not to download). Could you also run it on the same data if you get to it? Then we can double-check.

Training with ELMo runs as expected. This confirms that the problem is with the tokenization in BERT.

@irisferrazzo
Contributor

It seems like we need to change the way BERT embeds the recipes. The most quoted solution is the addition of a sliding window: allenai/allennlp#2537
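For illustration only (this is not AllenNLP's actual implementation), the sliding-window idea amounts to splitting a long wordpiece sequence into overlapping chunks of at most 512 pieces, embedding each chunk, and recombining the results. The sketch below shows just the chunking step; the window size and stride are placeholder values:

# Hypothetical sketch of sliding-window chunking over wordpiece IDs.
def sliding_windows(wordpiece_ids, max_pieces=512, stride=256):
    # Collect overlapping windows of at most max_pieces, advancing by `stride`.
    windows = []
    start = 0
    while True:
        windows.append(wordpiece_ids[start:start + max_pieces])
        if start + max_pieces >= len(wordpiece_ids):
            break
        start += stride
    return windows

# e.g. a 1000-piece sequence yields windows covering 0-511, 256-767, 512-999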

@TheresaSchmidt
Contributor

It seems like we need to change the way BERT embeds the recipes. The most quoted solution is the addition of a sliding window: allenai/allennlp#2537

a) The feature for sliding windows only came after allennlp 0.8, right? So it would probably be quite an effort to implement it.
b) We've prioritized moving ELMo to 2.8.
Therefore I suggest letting this issue be (for now) and keeping it in mind, because I would expect the same error with BERT in 2.8.

@siyutao
Contributor Author

siyutao commented Feb 23, 2022

only came after allennlp 0.8

Isn't the current implementation on allennlp 0.8.4? According to the release notes, 0.8.4 happens to be the release that added #2537. I already re-implemented the BERT tagger in 2.8 and there wasn't an error with this, but there was a problem replicating the previously reported results, as we've talked about.

But agreed that we should prioritize moving ELMo.

@TheresaSchmidt
Contributor

Ah ok. Then let's just postpone this, I think.

@TheresaSchmidt
Contributor

We are sure that we haven't changed the data or the configuration of the model. This leaves only two possible factors that could have changed such that training the model doesn't work anymore (correct me if I missed something):

  1. The environment at /proj/irtg.shadow/conda/envs/allennlp might have been updated.
  2. The pre-trained bert-base-multilingual-cased could have changed. It seems the model is updated regularly. In general, it is downloaded once and then a locally stored version is used each time you're training a new model. However, it is possible that the local version either gets updated sometimes and/or that it was lost at some point in time and a newer version was downloaded instead (a way to snapshot both factors for later comparison is sketched below).
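For illustration only, a rough sketch of how one might record the state of both factors so a later failing run can be compared against a known-good one. The cache location is an assumption (pytorch_pretrained_bert typically caches under ~/.pytorch_pretrained_bert, but the environment may be configured differently):

import hashlib
import pathlib
import subprocess

def snapshot(cache_dir, out_file="environment_snapshot.txt"):
    # Record installed package versions and a hash of every file in the local
    # BERT cache, so future runs can be diffed against this snapshot.
    with open(out_file, "w", encoding="utf-8") as out:
        freeze = subprocess.run(["pip", "freeze"], capture_output=True, text=True)
        out.write(freeze.stdout)
        for path in sorted(pathlib.Path(cache_dir).rglob("*")):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                out.write(f"{digest}  {path}\n")

# assumed cache location; adjust if PYTORCH_PRETRAINED_BERT_CACHE is set differently
# snapshot(pathlib.Path.home() / ".pytorch_pretrained_bert")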

irisferrazzo added the wontfix (This will not be worked on) label and removed the help wanted (Extra attention is needed) label on Jun 28, 2022