Author: Do Kook Choe
This code was used for the experiments described in the paper "Naive Bayes Word Sense Induction."
The SemEval 2010 Word Sense Induction task dataset can be downloaded from http://www.cs.york.ac.uk/semeval2010_WSI/datasets.html
USAGE:
- cd src/
- ./compile.h
- ./run.h (with appropriate arguments)
DESCRIPTIONS OF FILES
in src:
- *.java are source files.
- compile.h compiles source files.
- run.h executes Experiment.class.
in data:
- smart_common_words.txt contains a list of stopwords from the SMART information retrieval system.
- punctuation.txt contains a list of punctuation marks.
- nouns.txt and verbs.txt contain the lists of target nouns and target verbs, respectively. These files are needed to run Experiment.
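As a rough illustration of how word lists like those in data/ are typically used, here is a minimal sketch that loads a list file into a set and drops stopwords and punctuation from a list of tokens. It assumes one entry per line and case-insensitive matching; neither the file format nor the matching rule is stated in this README. The class and method names (WordListFilter, loadWordList, filter) are hypothetical and are not taken from the Experiment source.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the original source files.
public class WordListFilter {

    // Reads a word list, ASSUMING one entry per line (format not documented).
    // Entries are trimmed and lowercased; blank lines are skipped.
    static Set<String> loadWordList(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String entry = line.trim().toLowerCase(Locale.ROOT);
            if (!entry.isEmpty()) {
                words.add(entry);
            }
        }
        return words;
    }

    // Keeps tokens that are neither stopwords nor punctuation.
    // Matching is case-insensitive (an assumption), but kept tokens retain their original case.
    static List<String> filter(List<String> tokens, Set<String> stopwords, Set<String> punctuation) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String key = token.toLowerCase(Locale.ROOT);
            if (!stopwords.contains(key) && !punctuation.contains(key)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Small in-memory lists so the sketch runs without the data/ files.
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "of", "a"));
        Set<String> punctuation = new HashSet<>(Arrays.asList(",", "."));
        List<String> tokens = Arrays.asList("The", "bank", "of", "the", "river", ".");
        System.out.println(filter(tokens, stopwords, punctuation)); // prints [bank, river]
    }
}
```

In an actual run the two sets would presumably come from loadWordList applied to smart_common_words.txt and punctuation.txt; main uses small in-memory sets only so that the sketch is self-contained.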
jars:
- dom4j.jar is used to parse the XML input. It can be downloaded from http://dom4j.sourceforge.net/
- stanford-corenlp-2012-07-09.jar is used to tokenize sentences and lemmatize words. It can be downloaded from http://nlp.stanford.edu/