Identifying the correct meaning of words in context, or discovering new word senses, is useful for tasks such as question answering, information extraction, information retrieval, and text summarization. We propose an approach to induce and disambiguate the senses of selected target words in collections of short texts, such as tweets, using fuzzy lexico-semantic patterns, which we define as sequences of Morpho-semantic Components (MSC).
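The following minimal Python sketch shows what matching a pattern, defined as a sequence of components, against annotated text can look like. It assumes a hypothetical annotation in which each token has a word form, a PoS tag, and optionally a sense label. It approximates "fuzzy" matching by letting a component leave any feature unconstrained, so that feature acts as a wildcard. The classes Token and MSC and the function find_matches are illustrative names only. They are not the representation used by the MSC+ code or paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Token:
    """One annotated token (hypothetical annotation scheme)."""
    word: str
    pos: str
    sense: Optional[str] = None  # sense label, only if the token was annotated with one


@dataclass(frozen=True)
class MSC:
    """Hypothetical component: constraints on a single token.

    A field left as None is a wildcard, which is how this sketch
    approximates the 'fuzzy' part of a pattern.
    """
    word: Optional[str] = None
    pos: Optional[str] = None
    sense: Optional[str] = None

    def matches(self, tok: Token) -> bool:
        # Every constrained feature must equal the token's value for that feature.
        return all(
            want is None or want == have
            for want, have in (
                (self.word, tok.word),
                (self.pos, tok.pos),
                (self.sense, tok.sense),
            )
        )


def find_matches(pattern: Sequence[MSC], tokens: Sequence[Token]) -> List[int]:
    """Return the start offsets where the whole pattern matches contiguous tokens."""
    n = len(pattern)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if all(c.matches(t) for c, t in zip(pattern, tokens[i:i + n]))
    ]
```

For example, the pattern [MSC(pos="DET"), MSC(word="bank", sense="bank.n.financial")] matches "the bank" only when "bank" carries that (made-up) financial sense label. A component such as MSC(pos="NOUN") matches any noun.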
miningMSCpatterns.php implements an algorithm to find the most frequent MSC+ patterns in a set of documents.
msc-microposts2016test.txt is the document previously annotated with PoS tags and some word senses.
The patterns folder contains the results of the MSC+ pattern mining algorithm.
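The mining algorithm in miningMSCpatterns.php is not reproduced here. The minimal Python sketch below only illustrates the general idea of support-based frequent pattern mining over annotated tokens. It assumes each token is a dictionary of features (for example word and PoS tag), and each pattern component fixes exactly one feature of one token. A pattern counts as frequent when it occurs in at least min_support documents. The names mine_frequent_patterns, candidate_patterns, min_support and max_len, and the token format, are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical token format: feature -> value, e.g. {"word": "bank", "pos": "NOUN"}.
Token = Dict[str, str]
# A pattern is a tuple of components; each component is one (feature, value) pair.
Pattern = Tuple[Tuple[str, str], ...]


def candidate_patterns(doc: Sequence[Token], max_len: int) -> Set[Pattern]:
    """All generalized patterns of length 1..max_len occurring in one document."""
    found: Set[Pattern] = set()
    for n in range(1, max_len + 1):
        for i in range(len(doc) - n + 1):
            window = doc[i:i + n]
            # Each token in the window may be represented by any one of its features.
            options = [sorted(tok.items()) for tok in window]
            for combo in product(*options):
                found.add(tuple(combo))
    return found


def mine_frequent_patterns(
    docs: Sequence[Sequence[Token]], min_support: int, max_len: int = 3
) -> List[Tuple[Pattern, int]]:
    """Return (pattern, support) pairs with support >= min_support, most frequent first."""
    support: Counter = Counter()
    for doc in docs:
        # Using a set counts each pattern at most once per document (document support).
        support.update(candidate_patterns(doc, max_len))
    frequent = [(p, c) for p, c in support.items() if c >= min_support]
    # Sort by descending support, then by pattern, for a deterministic order.
    frequent.sort(key=lambda pc: (-pc[1], pc[0]))
    return frequent
```

A window of n tokens with k features each yields k^n candidates, so this brute-force enumeration is only practical for small max_len. A real miner would typically prune candidates whose sub-patterns are already infrequent.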
If you use any code or resources from MSC patterns in your research, please acknowledge the use of this tool in your publications by citing:
Goularte, F.B., Sorato, D., Nassar, S.M., Fileto, R., Saggion, H. "MSC+: Morpho-semantic Components for Word Sense Induction and Disambiguation." 2019.