Code for the ACL 2021 paper *Self-Attention Networks Can Process Bounded Hierarchical Languages*.
- Install the required packages:

  ```
  pip install -r requirements.txt
  ```
- Evaluate different positional encoding schemes (Figure 4 (a)):

  ```
  for d in acl2021/experiments_embedding/*; do
      python src/run_lm.py ${d}
  done
  ```
- Compare Transformer and LSTM with different memory dims (Figure 4 (b, c)):

  ```
  for d in acl2021/experiments_memory/*; do
      python src/run_lm.py ${d}
  done
  ```
This repository exclusively uses `yaml` configuration files for specifying each experiment. Here's an explanation of what each part of the `yaml` configs means.
The first portion of each config specifies the datasets' locations (`corpus`) and the properties of the specific Dyck-(k,m) language (`language`). For generating data with `rnns-stacks/generate_mbounded_dyck.py`, only this portion is needed.
`corpus`:
- `train_corpus_loc`: The filepath for the training corpus
- `dev_corpus_loc`: The filepath for the development corpus
- `test_corpus_loc`: The filepath for the test corpus

`language`:
- `train_bracket_types`: The number of unique bracket types, also k in Dyck-(k,m), for the training set
- `train_max_length`: The maximum length of any training example
- `train_min_length`: The minimum length of any training example
- `train_max_stack_depth`: The maximum number of unclosed open brackets at any step of a training example
- `train_sample_count`: Number of samples, counted in tokens (!!) not lines, for the training set
- `dev_bracket_types`: The number of unique bracket types, also k in Dyck-(k,m), for the development set
- `dev_max_length`: The maximum length of any development example
- `dev_min_length`: The minimum length of any development example
- `dev_max_stack_depth`: The maximum number of unclosed open brackets at any step of a development example
- `dev_sample_count`: Number of samples, counted in tokens (!!) not lines, for the development set
- `test_bracket_types`: The number of unique bracket types, also k in Dyck-(k,m), for the test set
- `test_max_length`: The maximum length of any test example
- `test_min_length`: The minimum length of any test example
- `test_max_stack_depth`: The maximum number of unclosed open brackets at any step of a test example
- `test_sample_count`: Number of samples, counted in tokens (!!) not lines, for the test set
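For concreteness, here is a minimal sketch of these two portions. The key names follow the descriptions above, but the paths and numeric values are placeholders rather than the settings used in the paper's experiments; see the configs under `acl2021/` for the actual values.

```
corpus:
  train_corpus_loc: data/example/train.txt   # placeholder paths
  dev_corpus_loc: data/example/dev.txt
  test_corpus_loc: data/example/test.txt
language:
  train_bracket_types: 2         # k in Dyck-(k,m)
  train_max_length: 100
  train_min_length: 2
  train_max_stack_depth: 4       # m in Dyck-(k,m)
  train_sample_count: 100000     # counted in tokens, not lines
  dev_bracket_types: 2
  dev_max_length: 100
  dev_min_length: 2
  dev_max_stack_depth: 4
  dev_sample_count: 10000
  test_bracket_types: 2
  test_max_length: 100
  test_min_length: 2
  test_max_stack_depth: 4
  test_sample_count: 10000
```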
Note that running an experiment training an LM with a specific `corpus` and `language` configuration doesn't generate the corresponding dataset; instead, you should first run `rnns-stacks/generate_mbounded_dyck.py` to generate the dataset, and then use `rnns-stacks/run_lm.py` to train and evaluate the LM.
The next portion of the `yaml` configuration files specifies properties of the LSTM LMs.
`lm`:
- `embedding_dim`: The dimensionality of the word embeddings.
- `hidden_dim`: The dimensionality of the LSTM hidden states.
- `lm_type`: Chooses the RNN type; pick from RNN, GRU, LSTM.
- `num_layers`: Chooses the number of stacked RNN layers.
- `save_path`: Filepath (relative to the reporting directory) where model parameters are saved.
`reporting`:
- `reporting_loc`: Path to the directory (constructed if it doesn't already exist) that holds the output metrics and model parameters.
- `reporting_methods`: Determines how to evaluate trained LMs. `constraints` provides an evaluation metric determining whether models know which bracket should be closed, whether the sequence can end, and whether an open bracket can be seen at each timestep.
`training`:
- `batch_size`: Minibatch size for training. Graduate student descent has found that smaller batches seem to be better in general (100: too big; 1: maybe the best, but very slow; 10: good).
- `dropout`: Dropout to apply between the LSTM and the linear (softmax matrix) layer that constructs logits over the vocabulary.
- `learning_rate`: Learning rate with which Adam is initialized. Note that a 0.5-factor-on-plateau decay is implemented; each time the learning rate is decayed, Adam is restarted.
- `max_epochs`: Number of epochs after which to halt training if it has not already early-stopped.
- `seed`: Doesn't actually specify a random seed; it is only used to distinguish multiple runs when summarizing results. (Maybe it should have specified random seeds, but results wouldn't replicate across different GPUs anyway.)
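Putting these together, here is a sketch of the `lm`, `reporting`, and `training` portions with illustrative placeholder values; the exact value formats (for example, whether `reporting_methods` is given as a list) and the hyperparameters used in the paper should be taken from the configs under `acl2021/`.

```
lm:
  embedding_dim: 20
  hidden_dim: 20
  lm_type: LSTM                  # one of RNN, GRU, LSTM
  num_layers: 1
  save_path: model.params        # relative to the reporting directory
reporting:
  reporting_loc: results/example-run/
  reporting_methods:
    - constraints
training:
  batch_size: 10
  dropout: 0.0
  learning_rate: 0.001
  max_epochs: 100
  seed: 1                        # labels the run; not an actual random seed
```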
The main files in the repository:

- `generate_mbounded_dyck.py`: Code for generating samples from distributions over Dyck-(k,m).
- `run_lm.py`: Code for running experiments with `yaml` configs.
- `rnn.py`: Classes for specifying RNN models.
- `transformer.py`: Classes for specifying Transformer models.
- `lm.py`: Classes for specifying probability distributions given an encoding of the sequence.
- `dataset.py`: Classes for loading and serving examples from disk.
- `training_regimen.py`: Script for training an LM on samples.
- `reporter.py`: Classes for specifying how results should be reported on a given experiment.
- `utils.py`: Provides some constants (like the Dyck-(k,m) vocabulary) as well as paths to corpora and results.
```
@inproceedings{yao2021dyck,
  title={Self-Attention Networks Can Process Bounded Hierarchical Languages},
  author={Yao, Shunyu and Peng, Binghui and Papadimitriou, Christos and Narasimhan, Karthik},
  booktitle={Association for Computational Linguistics (ACL)},
  year={2021}
}
```
The code heavily borrows from dyckkm-learning. Thanks John!

For any questions, please contact Shunyu Yao <shunyuyao.cs@gmail.com>.