How to retrain existing Syntaxnet model? #37

Open
apurvnagvenkar opened this issue Jul 28, 2018 · 5 comments

@apurvnagvenkar

Is there a way to retrain the SyntaxNet POS tagger model with a new dataset?

@dsindex
Owner

dsindex commented Jul 30, 2018

'parser_trainer.py' has the '--pretrained_params' and '--pretrained_params_names' parameters.
In the documentation, they are used for global training:

https://github.com/tensorflow/models/blob/master/research/syntaxnet/g3doc/syntaxnet-tutorial.md

bazel-bin/syntaxnet/parser_trainer \
  --arg_prefix=brain_parser \
  --batch_size=8 \
  --decay_steps=100 \
  --graph_builder=structured \
  --hidden_layer_sizes=200,200 \
  --learning_rate=0.02 \
  --momentum=0.9 \
  --output_path=models \
  --task_context=models/brain_parser/greedy/$PARAMS/context \
  --seed=0 \
  --training_corpus=projectivized-training-corpus \
  --tuning_corpus=tagged-tuning-corpus \
  --params=200x200-0.02-100-0.9-0 \
  --pretrained_params=models/brain_parser/greedy/$PARAMS/model \
  --pretrained_params_names=\
embedding_matrix_0,embedding_matrix_1,embedding_matrix_2,\
bias_0,weights_0,bias_1,weights_1

But I guess they could also be used for retraining 'brain_tagger'.
So I modified 'train.sh' to retrain 'brain_tagger' as shown below:

TAGGER_PARAMS=${TAGGER_HIDDEN_LAYER_PARAMS}-0.08-3600-0.9-0
function train_tagger {
    ${BINDIR}/parser_trainer \
      --task_context=${CONTEXT} \
      --arg_prefix=brain_tagger \
      --compute_lexicon \
      --graph_builder=greedy \
      --training_corpus=training-corpus \
      --tuning_corpus=tuning-corpus \
      --output_path=${TMP_DIR} \
      --batch_size=${BATCH_SIZE} \
      --decay_steps=3600 \
      --hidden_layer_sizes=${TAGGER_HIDDEN_LAYER_SIZES} \
      --learning_rate=0.08 \
      --momentum=0.9 \
      --beam_size=1 \
      --seed=0 \
      --params=${TAGGER_PARAMS} \
      --num_epochs=12 \
      --report_every=100 \
      --checkpoint_every=1000 \
      --pretrained_params=${TMP_DIR}/brain_tagger/greedy/${TAGGER_PARAMS}/model \
      --pretrained_params_names=embedding_matrix_0,embedding_matrix_1,embedding_matrix_2,bias_0,weights_0,bias_1,weights_1 \
      --logtostderr
}
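
The path given to '--pretrained_params' above appears to follow the pattern output_path/arg_prefix/graph_builder/params/model, and the '--params' string joins the hidden layer sizes, learning rate, decay steps, momentum and seed with '-'. Here is a minimal sketch of that naming, inferred only from the two commands in this thread. `make_params` and `pretrained_model_path` are hypothetical helper names, not part of SyntaxNet.

```shell
# Hypothetical helper: build the --params string from the hyperparameters.
# Commas in the hidden layer sizes become 'x' (e.g. "200,200" -> "200x200"),
# matching the tutorial command quoted in this thread.
make_params() {
  local hidden="$1" lr="$2" decay="$3" momentum="$4" seed="$5"
  echo "${hidden//,/x}-${lr}-${decay}-${momentum}-${seed}"
}

# Hypothetical helper: build the checkpoint path for --pretrained_params.
# Assumes the layout output_path/arg_prefix/graph_builder/params/model,
# as seen in the commands above.
pretrained_model_path() {
  local out_dir="$1" prefix="$2" builder="$3" params="$4"
  echo "${out_dir}/${prefix}/${builder}/${params}/model"
}

# Example with the tagger settings from train.sh (hidden layer size 64).
params="$(make_params 64 0.08 3600 0.9 0)"
pretrained_model_path "/tmp/out" brain_tagger greedy "${params}"
```

If any hyperparameter in the training command changes, the '--params' string and therefore the checkpoint path change with it, so both need to stay in sync.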

Then I ran it again. As you can see, the 'eval metric' is already 91.04% for epoch 1:

$ ./train.sh -v -v
2018-07-30 21:53:52.409090: I syntaxnet/reader_ops.cc:140] Starting epoch 1
2018-07-30 21:53:53.405823: I syntaxnet/reader_ops.cc:140] Starting epoch 2
INFO:tensorflow:Seconds elapsed in evaluation: 1.12, eval metric: 91.04%
INFO:tensorflow:Writing out trained parameters.
....

@apurvnagvenkar
Author

Hi,
It doesn't work when I change my dataset.
` File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 303, in <module>
app.run(main)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/absl_py/absl/app.py", line 274, in run
_run_main(main, argv)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/absl_py/absl/app.py", line 238, in _run_main
sys.exit(main(argv))
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 299, in main
Train(sess, num_actions, feature_sizes, domain_sizes, embedding_dims)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 239, in Train
sess.run(targets, feed_dict=feed_dict)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1436,8] rhs shape= [1297,8]
[[Node: save/Assign_9 = Assign[T=DT_FLOAT, _class=["loc:@embedding_matrix_2"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_matrix_2, save/RestoreV2_9)]]

Caused by op u'save/Assign_9', defined at:
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 303, in <module>
app.run(main)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/absl_py/absl/app.py", line 274, in run
_run_main(main, argv)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/absl_py/absl/app.py", line 238, in _run_main
sys.exit(main(argv))
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 299, in main
Train(sess, num_actions, feature_sizes, domain_sizes, embedding_dims)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 216, in Train
parser.AddSaver(FLAGS.slim_model)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/graph_builder.py", line 577, in AddSaver
variables_to_save, builder=tf_saver.BaseSaverBuilder())
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 1338, in __init__
self.build()
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 1347, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 1384, in _build
build_save=build_save, build_restore=build_restore)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 835, in _build_internal
restore_sequentially, reshape)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 494, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 185, in restore
self.op.get_shape().is_fully_defined())
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/ops/state_ops.py", line 283, in assign
validate_shape=validate_shape)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/ops/gen_state_ops.py", line 60, in assign
use_locking=use_locking, name=name)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1436,8] rhs shape= [1297,8]
[[Node: save/Assign_9 = Assign[T=DT_FLOAT, _class=["loc:@embedding_matrix_2"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_matrix_2, save/RestoreV2_9)]]

`

@dsindex
Owner

dsindex commented Aug 2, 2018

I guess there is a dimension mismatch:

 lhs shape= [1436,8] rhs shape= [1297,8]

What is the hidden layer size of the model you have?
In 'train.sh', '64' is used.
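
To see which dimension actually differs, the shapes in the error message can be compared one by one. This is a small sketch that assumes the message format shown in the traceback above. `shape_diff` is a hypothetical helper, and it relies on TensorFlow's Assign op reporting the graph variable as 'lhs' and the value restored from the checkpoint as 'rhs'.

```shell
# Hypothetical helper: print each dimension where the graph variable (lhs)
# and the restored checkpoint tensor (rhs) disagree.
shape_diff() {
  local msg="$1" lhs rhs i
  local -a l r
  # Pull the comma-separated shapes out of "lhs shape= [...]" / "rhs shape= [...]".
  lhs=$(printf '%s\n' "$msg" | sed -n 's/.*lhs shape= *\[\([^]]*\)\].*/\1/p')
  rhs=$(printf '%s\n' "$msg" | sed -n 's/.*rhs shape= *\[\([^]]*\)\].*/\1/p')
  IFS=',' read -r -a l <<< "$lhs"
  IFS=',' read -r -a r <<< "$rhs"
  for i in "${!l[@]}"; do
    if [ "${l[$i]}" != "${r[$i]}" ]; then
      echo "dim $i: graph=${l[$i]} checkpoint=${r[$i]}"
    fi
  done
}

shape_diff "Assign requires shapes of both tensors to match. lhs shape= [1436,8] rhs shape= [1297,8]"
```

For the error in this thread it prints 'dim 0: graph=1436 checkpoint=1297'. Only the row count of 'embedding_matrix_2' differs, while the second dimension (8) matches.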

@apurvnagvenkar
Author

apurvnagvenkar commented Aug 2, 2018

TAGGER_HIDDEN_LAYER_SIZES=64
TAGGER_HIDDEN_LAYER_PARAMS=64
Also, I am only training the POS model; I commented out the remaining steps for both training and retraining.
Could that be an issue?

convert_corpus ${CORPUS_DIR}
train_tagger
preprocess_with_tagger
#pretrain_parser
#evaluate_pretrained_parser
#train_parser
#evaluate_parser

@dsindex
Owner

dsindex commented Aug 2, 2018

I just tested it and got the same error.

new
Building training network with parameters: feature_sizes: [8 2 3 3] domain_sizes: [5380    5 2087 2813]

original
Building training network with parameters: feature_sizes: [8 2 3 3] domain_sizes: [18755     5  4214  5365]

It seems that the 'embedding_matrix_0,embedding_matrix_1,embedding_matrix_2' model parameters are tied to the original corpus (its dimensions?).

So I removed those parameters:

--pretrained_params=${TMP_DIR}/brain_tagger/greedy/${TAGGER_PARAMS}/model \
--pretrained_params_names=bias_0,weights_0,bias_1,weights_1 \
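
The check behind this change can be sketched as a small shell helper: if the 'domain_sizes' reported by the trainer differ between the original and the new corpus, restore only the parameters kept above. `domain_sizes` and `pick_pretrained_names` are hypothetical helper names, and comparing the whole list at once is a coarse assumption rather than anything SyntaxNet does itself.

```shell
# Hypothetical helper: read a trainer log line on stdin and print the
# domain_sizes list with whitespace normalized,
# e.g. "[5380    5 2087 2813]" -> "5380 5 2087 2813".
domain_sizes() {
  sed -n 's/.*domain_sizes: *\[\([^]]*\)\].*/\1/p' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: choose the value for --pretrained_params_names.
# Same domain sizes -> restore everything; otherwise skip the embeddings.
pick_pretrained_names() {
  local old_sizes="$1" new_sizes="$2"
  local dense="bias_0,weights_0,bias_1,weights_1"
  local embeddings="embedding_matrix_0,embedding_matrix_1,embedding_matrix_2"
  if [ "$old_sizes" = "$new_sizes" ]; then
    echo "${embeddings},${dense}"
  else
    echo "${dense}"
  fi
}

# The two log lines quoted in this thread.
old=$(echo "feature_sizes: [8 2 3 3] domain_sizes: [18755     5  4214  5365]" | domain_sizes)
new=$(echo "feature_sizes: [8 2 3 3] domain_sizes: [5380    5 2087 2813]" | domain_sizes)
pick_pretrained_names "$old" "$new"
```

With the two log lines from this thread the helper prints 'bias_0,weights_0,bias_1,weights_1', the same list passed to '--pretrained_params_names' above.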

and then ran it again:

...
2018-08-02 22:49:23.828902: I syntaxnet/reader_ops.cc:140] Starting epoch 1
2018-08-02 22:49:24.811940: I syntaxnet/reader_ops.cc:140] Starting epoch 2
INFO:tensorflow:Seconds elapsed in evaluation: 1.11, eval metric: 86.80%
INFO:tensorflow:Writing out trained parameters.
INFO:tensorflow:Epochs: 2, num steps: 1100, seconds elapsed: 14.66, avg cost: 0.32,
INFO:tensorflow:Epochs: 2, num steps: 1200, seconds elapsed: 15.81, avg cost: 0.30,
INFO:tensorflow:Epochs: 2, num steps: 1300, seconds elapsed: 16.96, avg cost: 0.29,
INFO:tensorflow:Epochs: 2, num steps: 1400, seconds elapsed: 18.11, avg cost: 0.31,
INFO:tensorflow:Epochs: 2, num steps: 1500, seconds elapsed: 19.25, avg cost: 0.27,
INFO:tensorflow:Epochs: 2, num steps: 1600, seconds elapsed: 20.41, avg cost: 0.27,
INFO:tensorflow:Epochs: 2, num steps: 1700, seconds elapsed: 21.21, avg cost: 0.16,
2018-08-02 22:49:33.039774: I syntaxnet/reader_ops.cc:140] Starting epoch 3
INFO:tensorflow:Epochs: 3, num steps: 1800, seconds elapsed: 22.23, avg cost: 0.19,
INFO:tensorflow:Epochs: 3, num steps: 1900, seconds elapsed: 23.35, avg cost: 0.22,
INFO:tensorflow:Epochs: 3, num steps: 2000, seconds elapsed: 24.51, avg cost: 0.23,
INFO:tensorflow:Evaluating training network.
2018-08-02 22:49:37.094506: I syntaxnet/reader_ops.cc:140] Starting epoch 3
INFO:tensorflow:Seconds elapsed in evaluation: 1.00, eval metric: 89.33%

Here is the original one:

2018-08-02 22:51:34.841066: I syntaxnet/reader_ops.cc:140] Starting epoch 1
2018-08-02 22:51:35.818366: I syntaxnet/reader_ops.cc:140] Starting epoch 2
INFO:tensorflow:Seconds elapsed in evaluation: 1.10, eval metric: 81.04%
INFO:tensorflow:Writing out trained parameters.
INFO:tensorflow:Epochs: 2, num steps: 1100, seconds elapsed: 15.54, avg cost: 0.56,
INFO:tensorflow:Epochs: 2, num steps: 1200, seconds elapsed: 16.76, avg cost: 0.49,
INFO:tensorflow:Epochs: 2, num steps: 1300, seconds elapsed: 18.01, avg cost: 0.44,
INFO:tensorflow:Epochs: 2, num steps: 1400, seconds elapsed: 19.25, avg cost: 0.45,
INFO:tensorflow:Epochs: 2, num steps: 1500, seconds elapsed: 20.49, avg cost: 0.37,
INFO:tensorflow:Epochs: 2, num steps: 1600, seconds elapsed: 21.77, avg cost: 0.35,
INFO:tensorflow:Epochs: 2, num steps: 1700, seconds elapsed: 22.66, avg cost: 0.19,
2018-08-02 22:51:44.737869: I syntaxnet/reader_ops.cc:140] Starting epoch 3
INFO:tensorflow:Epochs: 3, num steps: 1800, seconds elapsed: 23.77, avg cost: 0.26,
INFO:tensorflow:Epochs: 3, num steps: 1900, seconds elapsed: 24.99, avg cost: 0.32,
INFO:tensorflow:Epochs: 3, num steps: 2000, seconds elapsed: 26.21, avg cost: 0.34,
INFO:tensorflow:Evaluating training network.
2018-08-02 22:51:49.019258: I syntaxnet/reader_ops.cc:140] Starting epoch 3
INFO:tensorflow:Seconds elapsed in evaluation: 1.00, eval metric: 87.70%

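'86.80%' is a somewhat lower starting point than the 91.04% from restoring all parameters, but it is still above the 81.04% of the original run, and training continues after restoring the 'bias_0,weights_0,bias_1,weights_1' parameters.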
