This example builds an attentional seq2seq model trained with policy gradient and a BLEU reward. The example mainly demonstrates the Texar sequence reinforcement learning APIs. No MLE pre-training is included, so the model collapses very quickly. In practice, one would usually pre-train the model with teacher-forcing MLE (e.g., see the seq2seq_attn example) and then fine-tune with policy gradient.
The data and model configs are exactly the same as in the MLE seq2seq example. The only difference is that MLE cross-entropy minimization is replaced with policy gradient training.
The example shows:
- Use of `texar.tf.agents.SeqPGAgent` for policy gradient sequence generation (a minimal usage sketch follows this list).
- Use of the Python-based `texar.tf.evals.sentence/corpus_bleu` for efficient reward computation, and the Moses `texar.tf.evals.sentence/corpus_bleu_moses` for standard test set evaluation.
- Use of `texar.tf.data.FeedableDataIterator` for data feeding and resuming from breakpoints.
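The following is a minimal sketch of how these pieces fit together in a single policy gradient training step. It assumes the decoder's sampled outputs are exposed as `outputs.sample_id` / `outputs.logits` with lengths `sequence_length`, that the model config defines agent hyperparameters under `config_model.agent`, and that `iterator`, `data_batch`, and `train_data` come from a `FeedableDataIterator` setup; these names are illustrative, and the exact arguments are in `seq2seq_attn_pg.py`:

```python
import texar.tf as tx

# Policy gradient agent wrapping the decoder's sampled outputs.
# (Agent hyperparameters, e.g. the discount factor, are assumed to be
#  defined in the model config as `config_model.agent`.)
agent = tx.agents.SeqPGAgent(
    samples=outputs.sample_id,
    logits=outputs.logits,
    sequence_length=sequence_length,
    hparams=config_model.agent)

def _train_step(sess):
    agent.sess = sess  # the agent runs its fetches in this session

    # Draw samples from the current policy, fetching the ground-truth
    # targets as well so rewards can be computed against them. The
    # feedable iterator handle selects the 'train' split and makes it
    # possible to resume from a breakpoint.
    fetches = agent.get_samples(
        extra_fetches={'truth': data_batch['target_text_ids']},
        feed_dict={iterator.handle: iterator.get_handle(sess, 'train')})

    sample_text = tx.utils.map_ids_to_strs(
        fetches['samples'], train_data.target_vocab, join=False)
    truth_text = tx.utils.map_ids_to_strs(
        fetches['truth'], train_data.target_vocab, join=False)

    # Per-sample smoothed sentence BLEU serves as the reward signal.
    reward = [tx.evals.sentence_bleu([ref], hyp, smooth=True)
              for ref, hyp in zip(truth_text, sample_text)]

    # Feed the rewards back to the agent, which performs a policy
    # gradient update and returns the training loss.
    loss = agent.observe(reward=reward)
    return loss
```

At test time, corpus-level BLEU is computed with the Moses script wrapper, e.g. `tx.evals.corpus_bleu_moses(refs, hypos)`, where `refs` holds one list of references per hypothesis.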
Two example datasets are provided:
- toy_copy: A small toy autoencoding dataset from TF Seq2seq toolkit.
- iwslt14: The benchmark IWSLT2014 (de-en) machine translation dataset.
Download the data with the following commands:
python prepare_data.py --data toy_copy
python prepare_data.py --data iwslt14
Train the model with the following command:
python seq2seq_attn_pg.py --config_model config_model --config_data config_toy_copy
Here:
- `--config_model` specifies the model config. Note not to include the `.py` suffix.
- `--config_data` specifies the data config.
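The configs are plain Python modules imported by name at runtime, which is why the `.py` suffix is omitted. A minimal sketch of this common Texar-example pattern (flag defaults here are illustrative):

```python
import importlib
import tensorflow as tf

flags = tf.flags
flags.DEFINE_string('config_model', 'config_model', 'The model config module.')
flags.DEFINE_string('config_data', 'config_toy_copy', 'The data config module.')
FLAGS = flags.FLAGS

# Import the config modules by name (no '.py' suffix).
config_model = importlib.import_module(FLAGS.config_model)
config_data = importlib.import_module(FLAGS.config_data)
```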
All configs are (mostly) the same as those in the seq2seq_attn example.
The code is for demonstrating the Texar APIs. With pure policy gradient and without MLE pre-training, the model collapses very quickly.