Hi,
Thanks for sharing your implementation of the RL chatbot.
I might ask stupid questions since I am not an expert in RL nor in NLP, so sorry in advance!
1- In python/RL/train.py
At line 307, saver.restore(sess, os.path.join(model_path, model_name)) seems to initialize the weights of the model with some pretrained params, correct? Are these the ones given by the seq2seq trained as usual in a supervised way? I don't find the 'model-55' you are using for this anywhere... Am I missing something?
2- In python/RL/rl_model.py
Why do we have both build_model and build_generator? They seem to have the same setup but not the same output. Is this RL-specific?
3- In the paper
Also, they specify that for the reward they use a seq2seq model and not the RL model. Is this taken into account in your code?
Thanks a lot for your answers!
If the checkpoint exists, the saver restores the trained parameters from it.
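For reference, this follows the usual TF1 restore pattern; a minimal sketch, assuming model_path and model_name point at the checkpoint produced by the supervised seq2seq phase (e.g. the 'model-55' prefix), with a stand-in variable instead of the real graph:

```python
import os
import tensorflow as tf

# Stand-in for the model's trainable parameters (the real graph is built elsewhere).
weights = tf.get_variable('weights', shape=[128, 128])

model_path = 'model/seq2seq'   # assumed checkpoint directory
model_name = 'model-55'        # assumed checkpoint prefix

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())       # start from random weights
    ckpt = tf.train.get_checkpoint_state(model_path)
    if ckpt and ckpt.model_checkpoint_path:
        # Overwrite the random weights with the pretrained seq2seq parameters.
        saver.restore(sess, os.path.join(model_path, model_name))
    # Without a checkpoint, RL training starts from the random initialization.
```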
build_model constructs the graph for training, while build_generator constructs the graph for inference.
Most parts of the two graphs are the same; keeping them as two separate graphs makes development easier.
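To illustrate the split, here is a minimal sketch of the two-graph pattern (not the repository's actual rl_model.py code): both functions build on the same variable scope, so they share weights, but one returns a loss and the other returns generated word ids.

```python
import tensorflow as tf

def _decoder_logits(encoder_states, vocab_size, reuse):
    # Shared parameters: both graphs are built inside the same variable scope.
    with tf.variable_scope('seq2seq', reuse=reuse):
        return tf.layers.dense(encoder_states, vocab_size)

def build_model(encoder_states, targets, vocab_size):
    # Training graph: cross-entropy loss against the ground-truth target words.
    logits = _decoder_logits(encoder_states, vocab_size, reuse=False)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=targets, logits=logits))

def build_generator(encoder_states, vocab_size):
    # Inference graph: same weights (reuse=True), outputs generated word ids.
    logits = _decoder_logits(encoder_states, vocab_size, reuse=True)
    return tf.argmax(logits, axis=-1)
```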
In the paper, they first train the model with seq2seq until convergence, then use policy gradient to continue training it. The seq2seq and RL graphs are similar, but the reward function is only used for the latter.
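A hedged sketch of what such a policy-gradient (REINFORCE-style) step can look like, with made-up placeholder shapes and names; the reward terms themselves are the ones defined in the paper:

```python
import tensorflow as tf

# Stand-in for the decoder's per-step logits (batch of 4 responses, 7 steps, 1000-word vocab).
logits = tf.get_variable('logits', shape=[4, 7, 1000])
sampled_words = tf.placeholder(tf.int32, [4, 7])   # word ids actually sampled by the policy
reward = tf.placeholder(tf.float32, [4])           # one reward per sampled response

# log p(w_t | history) of the sampled words.
log_probs = -tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sampled_words, logits=logits)

# REINFORCE: scale the log-likelihood of each sampled response by its reward and maximize it.
pg_loss = -tf.reduce_mean(reward * tf.reduce_sum(log_probs, axis=1))
train_op = tf.train.AdamOptimizer(1e-4).minimize(pg_loss)
```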
Thanks a lot for your answers, but I still don't get number 3.
1 - Which weights are used for the reward? The ones of the Seq2Seq model after convergence, or the ones of the policy that are being updated?
2 - When I test the RL method with model-56-3000, I don't get the same results as you show in the README. Is that normal?
3 - Here you have a file with sentences. You don't have an incoming flow of data; does it act as a replay memory?