This repository contains code for an extended version of the mate-tools semantic role labeler. Most extensions are described in Roth and Woodsend, 2014. Unpublished extensions include feature selection routines and some additional functionality that is not yet documented.
June 2015: The current version achieves state-of-the-art performance on the CoNLL-2009 data set. With F1-scores of 87.33 in-domain and 76.38 out-of-domain, it is the best performing system for SRL in English to date. With an in-domain F1-score of 81.38, it is also the best SRL system available for German. A demo is available online here.
September 2015: This repository now also includes code for our frame-semantic SRL model introduced in Roth and Lapata, 2015. This model achieves a state-of-the-art F1-score of 76.88 on identifying and labeling arguments in FrameNet 1.5 full texts (using gold frames). Installation instructions are provided below. If you want to try out the frame-semantic SRL model online, please use the demo here.
The following libraries and model files need to be downloaded in order to run mateplus on English text:
- Bernd Bohnet's dependency parser and model files (anna-3.3.jar and CoNLL2009-ST-English*.model) [1]
- The WSJ tokenizer from Stanford CoreNLP (stanford-corenlp-3.x.jar)
- A recent Java port of LIBLINEAR (liblinear-x.jar)
- The most recent mateplus SRL model (June 2015), available from Google Drive here
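Before running anything, it can help to confirm that the downloads are in place. The following is a minimal sketch, not part of mateplus: it assumes the jar files go into lib/ and the parser model files into models/ (the layout this README describes for the run scripts), and it turns the version placeholders in the file names above into glob patterns. The function name `missing_files` is hypothetical. The mateplus SRL model is not checked, because its file name is not given here.

```python
from pathlib import Path

# Glob patterns derived from the file names listed above. Putting jars in
# lib/ and model files in models/ is an assumption based on this README.
REQUIRED_ENGLISH = {
    "lib": ["anna-3.3.jar", "stanford-corenlp-3.*.jar", "liblinear-*.jar"],
    "models": ["CoNLL2009-ST-English*.model"],
}


def missing_files(root, required=REQUIRED_ENGLISH):
    """Return 'subdir/pattern' strings for which no matching file exists."""
    root = Path(root)
    missing = []
    for subdir, patterns in required.items():
        directory = root / subdir
        for pattern in patterns:
            # A missing directory counts as "no match" for all its patterns.
            if not directory.is_dir() or not any(directory.glob(pattern)):
                missing.append(f"{subdir}/{pattern}")
    return missing


if __name__ == "__main__":
    for entry in missing_files("."):
        print("missing:", entry)
```

Run from the repository root, the script prints one line per pattern that has no matching file and prints nothing if everything was found.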
In order to run the FrameNet and context extensions of mateplus (i.e., framat and framat+context), please also download the following dependencies:
- FrameNet version 1.5, available from ICSI Berkeley here
- SEMAFOR (for frame identification, including MSTparser for preprocessing), available from CMU here
- Stanford CoreNLP (for coreference resolution), available from Stanford here
- GloVe 1.0a + pre-trained vectors (September 2015), available from Google Drive here
- The most recent framat SRL model (September 2015), available from Google Drive here
To run mateplus on German text, additional preprocessing libraries need to be downloaded:
- Bernd Bohnet's joint parsing model (transition-1.30.jar, pet-ger-S2a-X and lemma-ger-3.6.model)
- OpenNLP tokenizer (libraries from apache-opennlp-1.5.3* and de-token.bin)
- The most recent mateplus SRL model for German (June 2015), available from Google Drive here
If you want to run mateplus on German text using ParZu as an external dependency parser (recommended for non-newswire text), please use this model from Google Drive.
If copies of all required libraries and models are available in the subdirectories lib/ and models/, respectively, mateplus can simply be executed as a standalone application using the scripts scripts/parse.sh and scripts/parse_framenet.sh. These scripts run the necessary preprocessing tools on a given input text file (assuming one sentence per line) and apply our state-of-the-art model to identify and label semantic predicate-argument structures. For German, please use the script scripts/parse-ger.sh (recommended for newswire text) or scripts/parse-ger-ext.sh (recommended for non-newswire text).
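Since the scripts expect one sentence per line, input that was split into sentences elsewhere may need flattening first. The following is a minimal sketch of that step, assuming sentence splitting has already been done by some other tool. The helper name `to_one_sentence_per_line` is hypothetical and not part of mateplus.

```python
import re


def to_one_sentence_per_line(sentences):
    """Join sentences into text with exactly one sentence per line.

    Line breaks and repeated whitespace inside a sentence are collapsed
    to single spaces; empty sentences are dropped.
    """
    lines = []
    for sentence in sentences:
        flat = re.sub(r"\s+", " ", sentence).strip()
        if flat:
            lines.append(flat)
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    text = to_one_sentence_per_line([
        "The parser reads\nthis sentence.",
        "   ",
        "It then labels  the arguments.",
    ])
    print(text, end="")
```

The printed text can be saved to a file and used as the input text file for the scripts. How the scripts take their arguments is not specified in this README, so please check the scripts themselves.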
It is also possible to apply the mateplus SRL model to already preprocessed text in the CoNLL 2009 format, using the Java class se.lth.cs.srl.Parse. Since mateplus is trained on preprocessed input from specific pipelines, however, we strongly recommend using the complete pipeline to achieve the best performance.
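For readers unfamiliar with the CoNLL 2009 format, here is a minimal sketch of a reader for it. It assumes the standard shared-task layout: tab-separated columns, one token per line, a blank line between sentences, and one argument column per predicate after the PRED column. The function name `read_conll09` and the two-token sample are illustrative only and are not part of mateplus.

```python
# The 14 fixed columns of the CoNLL-2009 shared task format.
CONLL09_COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
    "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED",
]

# Hand-written illustrative sentence, not taken from any data set.
SAMPLE = (
    "1\tMary\tmary\tmary\tNNP\tNNP\t_\t_\t2\t2\tSBJ\tSBJ\t_\t_\tA0\n"
    "2\tsleeps\tsleep\tsleep\tVBZ\tVBZ\t_\t_\t0\t0\tROOT\tROOT\tY\tsleep.01\t_\n"
    "\n"
)


def read_conll09(lines):
    """Yield sentences as lists of token dicts from CoNLL-2009 lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        token = dict(zip(CONLL09_COLUMNS, fields))
        # Columns after PRED hold one argument label per predicate.
        token["APREDS"] = fields[len(CONLL09_COLUMNS):]
        sentence.append(token)
    if sentence:
        yield sentence


if __name__ == "__main__":
    for sent in read_conll09(SAMPLE.splitlines()):
        for tok in sent:
            print(tok["FORM"], tok["PRED"], tok["APREDS"])
```

The P-prefixed columns (PLEMMA, PPOS, PFEAT, PHEAD, PDEPREL) hold the automatically predicted annotations, which is where the output of a specific preprocessing pipeline enters.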
If you are using mateplus in your work--and we highly recommend you do!--please cite the following publication:
Michael Roth and Kristian Woodsend (2014). Composition of word representations improves semantic role labelling. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, October, pp. 407-413.
If you are using the FrameNet based models Framat or Framat+context, please cite the following journal paper:
Michael Roth and Mirella Lapata (2015). Context-aware frame-semantic role labeling. Transactions of the Association for Computational Linguistics, 3, 449-460.
Depending on which parts of the pipeline you are using, please also cite the following.
German joint parsing model: Bernd Bohnet, Joakim Nivre, Igor Boguslavsky, Richárd Farkas, Filip Ginter, Jan Hajic (2013). Joint morphological and syntactic analysis for richly inflected languages. Transactions of the Association for Computational Linguistics (TACL) 1:415--428.
ParZu--The Zurich Dependency Parser: Rico Sennrich, Martin Volk, Gerold Schneider (2013). Exploiting synergies between open resources for German dependency parsing, POS-tagging, and morphological analysis. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), Hissar, Bulgaria.
English parsing model: Bernd Bohnet (2010). Very high accuracy and fast dependency parsing is not a contradiction. The 23rd International Conference on Computational Linguistics (COLING), Beijing, China.
Original mate-tools SRL model: Anders Björkelund, Love Hafdell, and Pierre Nugues (2009). Multilingual semantic role labeling. In Proceedings of The Thirteenth Conference on Computational Natural Language Learning (CoNLL), Boulder, Colorado, pp. 43--48.
[1] To reproduce our evaluation results on the CoNLL-2009 data set, preprocessing components must be retrained on the training split only, using 10-fold jackknifing.
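In 10-fold jackknifing, the training split is divided into ten parts, and each part is annotated by preprocessing components trained on the other nine, so no sentence is annotated by a model that saw it during training. The following is a minimal sketch of the index bookkeeping only. The round-robin fold assignment and the function name `jackknife_folds` are assumptions for illustration, not necessarily the split used in our experiments.

```python
def jackknife_folds(n_sentences, k=10):
    """Yield (train_indices, heldout_indices) for k-fold jackknifing.

    Sentence i is assigned to fold i % k, so every sentence is held out
    exactly once across the k folds.
    """
    for fold in range(k):
        heldout = [i for i in range(n_sentences) if i % k == fold]
        train = [i for i in range(n_sentences) if i % k != fold]
        yield train, heldout


if __name__ == "__main__":
    for train, heldout in jackknife_folds(25):
        # A real setup would retrain the preprocessing component on `train`
        # and use it to annotate `heldout`; here we only report the sizes.
        print(len(train), len(heldout))
```

Concatenating the annotated held-out parts yields a training set whose automatic annotations have roughly the same quality as those the SRL model will see at test time.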