This is my thesis in Natural Language Processing, supervised by Dr. Wei Wang at UNSW.
This research provides a method for extracting information from academic text in the databases domain, using a verb as a query. The amount of latent information held in unstructured documents and natural language text is known to be very large, which motivates the development of methods that bring this information into a structured, computationally useful format. Most academic output is provided in a variety of formats, mostly PDF (Portable Document Format), and contains a very large amount of information and comparisons across methods and techniques. We use language models to extract linguistic information, such as part-of-speech tags and dependency trees, and apply sets of rules to output relations in the Relation(Arg1, Arg2, Argn) format. For the types of relation we propose to extract, the correctness of our results is comparable to that of other existing tools.
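To make the idea of a verb query over a dependency tree concrete, here is a minimal sketch in Python. It is not the rule set used in the thesis: the `Token` structure, the `extract_relation` function, the dependency labels (`nsubj`, `dobj`) and the hand-annotated example sentence are all assumptions chosen for illustration. In practice the tags and tree would come from a language model instead of being written by hand.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    """Hypothetical token record; a real parser would supply these fields."""
    text: str
    pos: str   # part-of-speech tag, e.g. "VERB"
    dep: str   # dependency label relative to the head, e.g. "nsubj"
    head: int  # index of the head token (the root points to itself)


def subtree_text(tokens: List[Token], root: int) -> str:
    """Return the words in the subtree rooted at `root`, in sentence order."""
    keep = {root}
    changed = True
    while changed:  # repeat until no new descendants are found
        changed = False
        for i, tok in enumerate(tokens):
            if i not in keep and tok.head in keep:
                keep.add(i)
                changed = True
    return " ".join(tokens[i].text for i in sorted(keep))


def extract_relation(tokens: List[Token], verb: str) -> Optional[Tuple[str, str, str]]:
    """Illustrative rule: verb(subject, object) for the first matching verb.

    Matching is done on the lower-cased surface form, which is a
    simplification; a real system would likely match on lemmas.
    """
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB" or tok.text.lower() != verb.lower():
            continue
        subj = obj = None
        for j, child in enumerate(tokens):
            if j == i or child.head != i:
                continue
            if child.dep == "nsubj":
                subj = subtree_text(tokens, j)
            elif child.dep in ("dobj", "obj"):
                obj = subtree_text(tokens, j)
        if subj is not None and obj is not None:
            return (tok.text.lower(), subj, obj)
    return None


# Hand-annotated parse of "B-trees improve query performance" (assumed labels).
SENTENCE = [
    Token("B-trees", "NOUN", "nsubj", 1),
    Token("improve", "VERB", "ROOT", 1),
    Token("query", "NOUN", "compound", 3),
    Token("performance", "NOUN", "dobj", 1),
]

if __name__ == "__main__":
    print(extract_relation(SENTENCE, "improve"))
```

Calling `extract_relation(SENTENCE, "improve")` yields a tuple holding the verb and its two arguments, which maps directly onto the Relation(Arg1, Arg2) shape described above. A query verb that does not occur in the sentence returns `None`.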
This cover is based on the code in Guillaume Jourjon's PhD thesis, and was further modified and generalised by Olivier Mehani <shtrom-ctan@ssji.net>. There might be earlier authors whose names have unfortunately not made their way to my ears. Please let me know.
The latest version is available from [0].