Transition-based joint syntactic dependency parser and semantic role labeler using stack LSTM RNN architecture
- A C++ compiler supporting the C++11 language standard
- Boost libraries
- Eigen (newer versions strongly recommended)
- CMake
- This runs with the older version of cnn/dynet :
cnn-v1
The first time you clone the repository, you need to sync the cnn/
submodule.
git submodule init
git submodule update
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=/path/to/eigen
make -j2
The data must be in the CoNLL 2009 format. For best performance, it is suggested to projectivize the treebank data. As a preprocessing step, first convert the treebank data (in CoNLL 2009 format) into transitions, *.transitions (this is a format usable by the joint parser), using the following commands.
java -jar ../jointOracle.jar -inp train.conll -lemmas true > train.transitions
java -jar ../jointOracle.jar -inp dev.conll > dev.transitions
Note that it is required to set the option ``lemmas'' to true for the training data, so that an auxilliary file, train.conll.pb.lemmas is generated, which saves the lemmas of all the predicate words. The joint parser can now run on the generated files.
parser/lstm-parse -T train.transitions -d dev.transitions -w sskip.100.vectors --propbank_lemmas train.conll.pb.lemmas -g dev.conll -e ../eval09.pl -s dev.predictions.conll --out_model joint.model -t
Link to the word vectors that we used in the ACL 2015 paper for English: sskip.100.vectors. The evaluation script, eval09.pl, is provided by CoNLL 2009. The model is written to the current directory.
Note-1: you can also run it without word embeddings by removing the -w option for both training and parsing.
Note-2: the training process should be stopped when the development Macro F1 does not substantially improve anymore.
Note-3: the default hyperparameters are the same as in the paper for the CoNLL 2009 English model. These can be changed by altering the command-line options provided in the lstm-parse.cc code.
The test.conll file must also be in to the CoNLL 2009 data format .
java -jar ../jointOracle.jar -inp test.conll > test.transitions
parser/lstm-parse -T train.transitions -d test.transitions -w sskip.100.vectors --propbank_lemmas train.conll.pb.lemmas -m joint.model -s test.predictions.conll -g test.conll -e ../eval09.pl
The parser will output a file test.predictions.conll with predicted syntax and SRL dependencies.
If you make use of this software, please cite the following:
@inproceedings{swayamdipta:2016conll,
author={Swabha Swayamdipta and Miguel Ballesteros and Chris Dyer and Noah A. Smith},
title={Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs},
booktitle={Proc. of CoNLL},
year={2016}
}
For questions and usage issues, please contact swabha@cs.cmu.edu