A Universal Phrase Tagset
Welcome to the aaron-project-hppr wiki!
The paper "Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation" proposes a universal phrase tagset of 9 commonly used phrasal categories: NP (noun phrase), VP (verbal phrase), AJP (adjective phrase), AVP (adverbial phrase), PP (prepositional phrase), S (sub-sentence), CONJP (conjunction phrase), COP (coordinated phrase), and X (a less clear category covering, e.g., list markers, foreign words, interjections, abbreviations, idiosyncratic units, and unknown or uncertain items). The paper also defines a mapping from the phrase tagsets of existing French and English treebanks to the universal phrase tagset. Finally, it uses the universal tagset and the mapping to build HPPR, a novel unsupervised machine translation evaluation metric, which serves to test the effectiveness of the designed tagset.
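As a rough illustration of what a tagset mapping looks like in practice, the sketch below maps a handful of Penn Treebank-style phrase labels onto the nine universal tags. The dictionary entries, the name `ILLUSTRATIVE_MAPPING`, and the helper functions are assumptions made for this example only; they are not the mapping table from the paper, which should be consulted for the actual French and English mappings. Falling back to X for unlisted labels is likewise an assumption, based on X being described as covering unknown or uncertain items.

```python
# The nine universal phrase tags listed on this page.
UNIVERSAL_TAGS = {"NP", "VP", "AJP", "AVP", "PP", "S", "CONJP", "COP", "X"}

# Hypothetical, partial mapping from Penn Treebank-style phrase labels
# to universal tags. Illustrative only; not the paper's mapping table.
ILLUSTRATIVE_MAPPING = {
    "NP": "NP",
    "VP": "VP",
    "ADJP": "AJP",
    "ADVP": "AVP",
    "PP": "PP",
    "S": "S",
    "SBAR": "S",
    "CONJP": "CONJP",
    "INTJ": "X",  # interjection
    "LST": "X",   # list marker
}


def to_universal(tag: str) -> str:
    """Return the universal tag for a treebank label, or X if unlisted."""
    return ILLUSTRATIVE_MAPPING.get(tag.upper(), "X")


def map_sequence(tags):
    """Map a sequence of treebank phrase labels to universal tags."""
    return [to_universal(t) for t in tags]


if __name__ == "__main__":
    print(map_sequence(["NP", "ADVP", "VP", "INTJ"]))
```

Once two languages are mapped onto the same small tagset, their phrase-tag sequences can be compared directly, which is the property a metric such as HPPR relies on.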
For details of the universal phrase tagset design, see "Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation" by Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu. 2013. In Proceedings of the GSCL 2013, September 23-27, Germany. LNCS Vol. 8105, pp. 119–131. Volume Editors: Iryna Gurevych, Chris Biemann and Torsten Zesch.
Source code: https://github.com/aaronlifenghan/aaron-project-hppr
Download paper: http://link.springer.com/chapter/10.1007/978-3-642-40722-2_13#!
If you use this work in your research, please cite the paper.
Contact: hanlifengaaron AT gmail.com