Proposition Acquisition from SRL: From our LREC 2016 paper (also DH 2016). Also covered in my thesis (Ruiz Fabo, 2017).
Proposition extraction based on different sources of relational information, mainly PropBank and NomBank semantic roles.
A proposition is defined as a triple of shape ⟨actor, predicate, message⟩, where the predicate is a reporting verb or noun. The actor emits a message via the predicate.
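
As an illustration of the triple described above, here is a minimal sketch in Python. It is not the code in `model.py`: the class layout, the `is_complete` helper and the example sentence are assumptions made for this illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposition:
    """Hypothetical container for an <actor, predicate, message> triple."""
    actor: Optional[str]      # who emits the message (may be missing)
    predicate: str            # reporting verb or noun
    message: Optional[str]    # what is being reported (may be missing)

    def is_complete(self) -> bool:
        # Assumption: a proposition counts as complete when all three slots are filled.
        return bool(self.actor) and bool(self.predicate) and bool(self.message)


# Invented example sentence: "The delegation proposed a new emissions target."
prop = Proposition(
    actor="The delegation",
    predicate="proposed",
    message="a new emissions target",
)
partial = Proposition(actor=None, predicate="proposal", message="a new emissions target")

print(prop.is_complete())
print(partial.is_complete())
```

Making the actor and message optional is one way to represent propositions where a slot could not be filled; whether the actual model does this is not stated here.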
The SRL-based workflow described in the references above relies on the following materials (alternative sources of relational information that we also tested are listed at the end).
- data: domain data like actors and predicates
- db: scripts that help format results as required by the Django app used to navigate the extractions
  - getting keyphrase and entity offsets (and sentence) to match IXA Pipes tokenization
  - getting sentence offsets according to IXA Pipes sentence-splitting
  - etc.
- kp: keyphrase extraction (used to extract keyphrases from the propositions' messages)
- scripts: scripts to start module or general data parsing and analyses
- temp: temporary scripts
- vua_kn_temp_scripts: testing KafNafParserPy from VUA
- srl: to exploit ixa-pipes dependencies and SRL layers. The modules to run this workflow are `parse_srl.py` and `parse_srl_from_pickle.py`:
  - `parse_srl.py`: reads NAF files to arrive at propositions, optionally stores pickle with results
  - `parse_srl_from_pickle.py`: reads propositions off a pickle, and outputs them in configurable formats (options are set in the module directly):
    - exp: evaluable format
    - exp_free: accepts incomplete propositions
    - exp_free_at: accepts incomplete propositions and adds actor types
- testsets: different testsets created to test the SRL-based extraction
  - annotations: golden sets (i.e. sentences annotated with propositions)
  - dev: devsets to work on different problems
  - l6: 100 raw (unannotated) sentences for LREC 2016 test-set
  - test_eval_scripts: cases to test evaluation script
- config.py: config for several of the modules
- evalprops.py: proposition extraction evaluation with F1, and error analysis
- manage_domain_data.py: parses the data in the data directory so that the rest of the modules can use it
- model.py: basic objects for proposition extraction (`Proposition`, `Actor`, `Predicate`, etc.)
- utils: general utility functions
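
To make the F1-based evaluation mentioned for `evalprops.py` more concrete, the following is a rough sketch of precision, recall and F1 over proposition triples. It assumes exact matching of whole triples, which may well differ from the matching criteria the actual script applies; the function name `evaluate` and the example triples are hypothetical.

```python
def evaluate(gold, predicted):
    """Exact-match precision, recall and F1 over collections of triples.

    Assumption: a predicted proposition is correct only if the same
    (actor, predicate, message) triple appears in the gold set.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented triples, for illustration only.
gold = {
    ("the delegation", "proposed", "a new emissions target"),
    ("the chair", "rejected", "the amendment"),
}
predicted = {
    ("the delegation", "proposed", "a new emissions target"),
    ("the chair", "said", "the amendment"),
}

precision, recall, f1 = evaluate(gold, predicted)
print(precision, recall, f1)
```

Treating propositions as hashable tuples keeps the comparison to plain set operations; partial credit for propositions that are only partly correct would need a different matching function.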
Other sources of relational information that we also tested:
- madios: to work with the grammar induction algorithm ADIOS by Z. Solan, using this implementation.
- openie4: to work with Open IE 4, an Open Information Extraction toolkit