This project compares the performance of LSTM and GRU cells for text summarization with Pointer-Generator Networks, as discussed in https://github.com/abisee/pointer-generator
It is based on the code at https://github.com/steph1793/Pointer_Generator_Summarizer
Python 3.7+
TensorFlow 2.8+ (any 2.x release should work, but has not been tested)
rouge 1.0.1 (pip install rouge)
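The snippet below is an optional sanity check (not part of the repository) that the required packages import correctly; it assumes the `rouge` pip package, which exposes a `Rouge` scorer class.

```python
# Optional sanity check (not part of this repo): verify the required packages import.
import tensorflow as tf
from rouge import Rouge

print("TensorFlow:", tf.__version__)  # expect a 2.x version
scores = Rouge().get_scores("the cat sat on the mat", "the cat sat on the mat")
print("ROUGE-L F1 on identical strings:", scores[0]["rouge-l"]["f"])  # should be ~1.0
```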
We use the CNN-DailyMail dataset. The application reads data from files in TFRecord format.
The dataset can be created and processed by following the instructions at https://github.com/abisee/cnn-dailymail
Alternatively, a pre-processed dataset can be downloaded from https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail
The easiest option is to download the dataset zip file and extract it into the dataset directory (by default, "./dataset/").
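For a quick look at what the chunked files contain, the sketch below parses one record with `tf.data`. The feature names "article" and "abstract" are assumptions based on the CNN-DailyMail preprocessing scripts; adjust them if your files use different keys.

```python
# Minimal sketch for inspecting the chunked TFRecord files.
# The feature names "article" and "abstract" are assumptions and may differ.
import glob
import tensorflow as tf

files = glob.glob("./dataset/chunked_train/*")
dataset = tf.data.TFRecordDataset(files)

feature_spec = {
    "article":  tf.io.FixedLenFeature([], tf.string),
    "abstract": tf.io.FixedLenFeature([], tf.string),
}

for raw_record in dataset.take(1):
    example = tf.io.parse_single_example(raw_record, feature_spec)
    print(example["article"].numpy()[:200])   # first 200 bytes of the article
    print(example["abstract"].numpy()[:200])  # first 200 bytes of the summary
```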
If you want to skip training and only evaluate the models, you can download the checkpoint zip file and extract it into the checkpoint directory (by default, "./checkpoint/").
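To confirm the extracted checkpoints are visible to TensorFlow, a minimal check like the one below can help. The directory layout inside the archive is an assumption; if the checkpoints sit in per-model subdirectories, point the call at that subdirectory instead.

```python
# Minimal check (not part of this repo) that an extracted checkpoint can be found.
import tensorflow as tf

latest = tf.train.latest_checkpoint("./checkpoint")  # or a per-model subdirectory
print("Latest checkpoint:", latest)  # None means no checkpoint was found at that path
```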
Training and evaluation run on all four models by default. To train or evaluate a single model, pass its name via --model_name.
python main.py --mode="train" --vocab_path="./dataset/vocab" --data_dir="./dataset/chunked_train"
python main.py --mode="eval" --vocab_path="./dataset/vocab" --data_dir="./dataset/chunked_val"
Most of the parameters have defaults and can be omitted. The parameters you can tweak are listed below, along with their defaults.
Parameter | Default | Description |
---|---|---|
max_enc_len | 512 | Encoder input max sequence length |
max_dec_len | 128 | Decoder input max sequence length |
max_dec_steps | 128 | Maximum number of words in the predicted abstract |
min_dec_steps | 32 | Minimum number of words in the predicted abstract |
batch_size | 4 | Batch size |
beam_size | 4 | Beam size for beam search decoding (must equal the batch size in decode mode) |
vocab_size | 50000 | Vocabulary size |
embed_size | 128 | Word embedding dimension |
enc_units | 256 | Number of units in the encoder LSTM/GRU cell |
dec_units | 256 | Number of units in the decoder LSTM/GRU cell |
attn_units | 512 | Dimension of the feedforward projection of [context vector, decoder state, decoder input] used to compute the attention weights |
learning_rate | 0.15 | Learning rate |
adagrad_init_acc | 0.1 | Initial accumulator value for the Adagrad optimizer (see the TensorFlow Adagrad optimizer API documentation) |
max_grad_norm | 0.8 | Gradient norm above which gradients must be clipped |
checkpoints_save_steps | 1000 | Save checkpoints every N steps |
max_checkpoints | 10 | Maximum number of checkpoints to keep. Older ones are deleted |
max_steps | 50000 | Maximum number of training iterations |
max_num_to_eval | 100 | Maximum number of examples to evaluate |
checkpoint_dir | "./checkpoint" | Checkpoint directory |
log_dir | "./log" | Directory in which to write logs |
results_dir | None | Directory in which the intermediate results (article, actual summary, and predicted summary) are written during evaluation |
data_dir | None | Data Folder |
vocab_path | None | Path to vocab file |
mode | None | Should be "train" or "eval" |
model_name | None | Name of a specific model. If empty, all models are used |
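As a worked example of how the optimizer-related defaults above fit together, here is a minimal sketch (an assumption-based illustration, not the repository's training loop) combining learning_rate, adagrad_init_acc, and max_grad_norm in TensorFlow 2.

```python
# Sketch of wiring up the table's optimizer defaults: Adagrad with lr=0.15,
# initial accumulator 0.1, and gradients clipped to a global norm of 0.8.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adagrad(
    learning_rate=0.15,
    initial_accumulator_value=0.1,
)

def apply_clipped_gradients(tape, loss, variables, max_grad_norm=0.8):
    """Clip the global gradient norm before applying the Adagrad update."""
    gradients = tape.gradient(loss, variables)
    clipped, _ = tf.clip_by_global_norm(gradients, max_grad_norm)
    optimizer.apply_gradients(zip(clipped, variables))

# Tiny demo on a throwaway variable.
w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = tf.square(w)
apply_clipped_gradients(tape, loss, [w])
print(w.numpy())  # slightly below 3.0 after one clipped Adagrad step
```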