Automatic Sense Disambiguation of Potentially Idiomatic Expressions

This is the source code for a system that automatically disambiguates potentially idiomatic expressions (PIEs) in text. It implements four methods: a most-frequent-sense baseline, a canonical-form-based baseline (Fazly et al., 2009), a lexical cohesion graph-based method (Sporleder & Li, 2009), and a variation on that method using literal representations of idioms' figurative senses. It evaluates these methods on a combination of four corpora: the VNC-Tokens corpus, the IDIX corpus, the PIE Corpus, and the SemEval-2013 Task 5b dataset. For a detailed description of the systems, see our LAW-MWE-CxG paper.

Requirements

To run this code, you'll need the following Python setup:

Python 2.7.6
beautifulsoup4 4.5.1
numpy 1.14.0
scipy 0.19.1
spacy 2.0.6 + en_core_web_sm 2.0.0

Other versions may work just as well, but this is not guaranteed.

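You'll also need copies of the BNC, the GloVe embeddings, and the four corpora named above (see Getting Started for where to link them).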

Getting Started

Clone the repository
Create subdirectories called working and ext
Add these symlinks (or edit config.py):
- create a symlink ext/BNC to the Texts directory of your copy of the BNC
- create a symlink ext/glove to the directory containing the GloVe embeddings
- create symlinks ext/VNC, ext/IDIX, ext/PIE_Corpus, and ext/SemEval to the main directory of the respective corpora
Try running the system with python psd.py -c 0 -m cg -gs 0s. This should run a basic lexical cohesion graph method and evaluate it on the development set of the combined corpora.
Get an overview of all options by running python psd.py --help.
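
The directory and symlink steps above can also be scripted. The following is only a minimal sketch and is not part of this repository: the helper name link_resources and every path in EXAMPLE_RESOURCES are hypothetical placeholders, and it assumes a platform where os.symlink is available. The link names (BNC, glove, VNC, IDIX, PIE_Corpus, SemEval) are the ones listed in the steps above.

```python
import os


def link_resources(repo_dir, resources):
    """Create working/ and ext/ under repo_dir and symlink resources into ext/.

    resources maps a link name (e.g. "BNC") to an existing directory.
    Returns the paths of the links that were newly created.
    (Hypothetical helper for illustration; not part of the repository.)
    """
    for sub in ("working", "ext"):
        path = os.path.join(repo_dir, sub)
        if not os.path.isdir(path):
            os.makedirs(path)

    created = []
    for name, target in sorted(resources.items()):
        if not os.path.isdir(target):
            raise IOError("resource directory not found: %s" % target)
        link = os.path.join(repo_dir, "ext", name)
        if os.path.lexists(link):
            continue  # leave an existing link or file untouched
        os.symlink(os.path.abspath(target), link)
        created.append(link)
    return created


# Placeholder locations only; replace each with the path to your own copy.
EXAMPLE_RESOURCES = {
    "BNC": "/path/to/BNC/Texts",
    "glove": "/path/to/glove",
    "VNC": "/path/to/VNC",
    "IDIX": "/path/to/IDIX",
    "PIE_Corpus": "/path/to/PIE_Corpus",
    "SemEval": "/path/to/SemEval",
}
```

After editing the placeholder paths, calling link_resources(".", EXAMPLE_RESOURCES) from the repository root would set up the layout described above. Existing links are skipped, so the call can be repeated safely.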

Contact

For any questions about (running) the system, feel free to contact me.
