PASRL

Proposition Acquisition from SRL, as described in our LREC 2016 paper (also DH 2016) and covered in my thesis (Ruiz Fabo, 2017).

Proposition Extraction based on different relation sources, mainly PropBank and NomBank semantic roles.

A proposition is defined as a triple of shape ⟨actor, predicate, message⟩, where the predicate is a reporting verb or noun. The actor emits a message via the predicate.
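
As an illustration of the triple described above, here is a minimal sketch in Python. The class and field names are assumptions made for this example and are not necessarily those used in model.py; the example sentence content is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposition:
    """Hypothetical container for an <actor, predicate, message> triple."""
    actor: Optional[str]    # who emits the message (None if not found)
    predicate: str          # reporting verb or noun
    message: Optional[str]  # what is reported (None if not found)

    def is_complete(self) -> bool:
        # A proposition is complete when both actor and message are present.
        return self.actor is not None and self.message is not None


# Invented example: "The delegation proposed a new emissions target."
prop = Proposition(
    actor="The delegation",
    predicate="proposed",
    message="a new emissions target",
)
print(prop.is_complete())
```

Allowing `None` for the actor or the message is one way to represent incomplete propositions.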

The SRL-based workflow described in the references above relies on the following materials (see the end of the list for alternative sources of relational information that we also tested).

  • data: domain data like actors and predicates
  • db: scripts that help format the results as required by the Django app used to navigate the extractions
    • getting keyphrase and entity offsets (and sentence) to match IXA Pipes tokenization
    • getting sentence offsets according to IXA Pipes sentence-splitting
    • etc.
  • kp: keyphrase extraction (used to extract keyphrases from the propositions' messages)
  • scripts: scripts to start module or general data parsing and analyses
    • temp: temporary scripts
    • vua_kn_temp_scripts: testing KafNafParserPy from VUA
  • srl: exploits the ixa-pipes dependencies and SRL layers. The modules that run this workflow are parse_srl.py and parse_srl_from_pickle.py
    • parse_srl.py: reads NAF files to arrive at propositions, optionally stores a pickle with the results
    • parse_srl_from_pickle.py: reads propositions from a pickle and outputs them in configurable formats (options are set directly in the module):
      • exp: evaluable format
      • exp_free: accepts incomplete propositions
      • exp_free_at: accepts incomplete propositions and adds actor types
  • testsets: different testsets created to test the SRL-based extraction
    • annotations: gold-standard sets (i.e. sentences annotated with propositions)
    • dev: devsets to work on different problems
    • l6: 100 raw (unannotated) sentences for the LREC 2016 test set
    • test_eval_scripts: cases to test evaluation script
  • config.py: config for several of the modules
  • evalprops.py: proposition extraction evaluation with F1, and error analysis
  • manage_domain_data.py: parses the data in the data directory so that the rest of the modules can use it
  • model.py: basic objects for proposition extraction (Proposition, Actor, Predicate etc.)
  • utils: general utility functions
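
To make the F1-based evaluation mentioned for evalprops.py more concrete, here is a minimal sketch of precision, recall and F1 over extracted propositions. It assumes exact matching of whole ⟨actor, predicate, message⟩ triples against a gold set; the actual script may match differently (e.g. partially or per element). The function name and the sample triples are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted triples against gold triples by exact match (assumption)."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented triples of the form (actor, predicate, message)
gold = [
    ("Country A", "proposed", "a new target"),
    ("Country B", "opposed", "the proposal"),
]
predicted = [
    ("Country A", "proposed", "a new target"),
    ("Country B", "supported", "the proposal"),
]

p, r, f1 = precision_recall_f1(gold, predicted)
print(p, r, f1)
```

Exact matching is the strictest choice; a looser criterion would need its own definition of when two triples count as the same.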

Other sources of relational information that we also tested: