Semantic-Similarity-Classification

A map-reduce application that uses the Google Syntactic N-Grams dataset, Amazon EMR and Hadoop map-reduce to calculate the co-occurrence vectors for each word pair in a given gold-standard dataset, based on the measures of association with context and the vector similarity measures described in the paper: https://www.cs.bgu.ac.il/~dsp211/wiki.files/04588492.pdf
A classifier is then built from these vectors by running a classification algorithm in WEKA (http://www.cs.waikato.ac.nz/ml/weka/index.html) to classify word pairs by their semantic similarity.
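
To make the idea of "association with context" and "vector similarity" concrete, here is a minimal Java sketch of one common measure of each kind: pointwise mutual information (PMI) and cosine similarity. It is an illustration only. The class name, method names and the choice of these two measures are assumptions, not taken from this repository; the paper linked above defines the measures the project is based on.

```java
import java.util.Map;

// Illustrative sketch only: the names and the choice of measures are assumptions.
class CoOccurrenceSketch {

    // PMI(w, f) = log2( P(w, f) / (P(w) * P(f)) ), with probabilities estimated from raw counts.
    static double pmi(long pairCount, long wordCount, long featureCount, long total) {
        if (pairCount == 0) {
            return 0.0; // convention used here: unseen pairs get no association
        }
        double pJoint = (double) pairCount / total;
        double pWord = (double) wordCount / total;
        double pFeature = (double) featureCount / total;
        return Math.log(pJoint / (pWord * pFeature)) / Math.log(2);
    }

    // Cosine similarity between two sparse vectors keyed by feature name.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector is treated as dissimilar to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

In a sketch like this, each word in a pair would get one sparse vector per association measure, and each similarity score between the two vectors would become one numeric attribute of the instance handed to the classifier.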

The input is the English All - Biarcs dataset of Google Syntactic N-Grams: http://storage.googleapis.com/books/syntactic-ngrams/index.html, which provides syntactic parsing of Google-books N-Grams. The format of the corpus is described in this file: https://docs.google.com/document/d/14PWeoTkrnKk9H8_7CfVbdvuoFZ7jYivNTkBX2Hj7qLw/edit?usp=sharing
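
As a rough illustration of reading one corpus record, the Java sketch below parses a single line. It assumes the layout is tab-separated fields (head word, syntactic n-gram, total count, then per-year counts) with each n-gram token written as word/pos-tag/dep-label/head-index; check this against the format file linked above. The class, the method names and the "head word-dependency label" feature encoding are hypothetical and not taken from this repository.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the line layout and all names here are assumptions.
class SyntacticNgramSketch {

    static final class Token {
        final String word;
        final String posTag;
        final String depLabel;
        final int headIndex; // 1-based position of the head token; 0 marks the root

        Token(String word, String posTag, String depLabel, int headIndex) {
            this.word = word;
            this.posTag = posTag;
            this.depLabel = depLabel;
            this.headIndex = headIndex;
        }
    }

    static final class Ngram {
        final String headWord;
        final List<Token> tokens;
        final long totalCount;

        Ngram(String headWord, List<Token> tokens, long totalCount) {
            this.headWord = headWord;
            this.tokens = tokens;
            this.totalCount = totalCount;
        }
    }

    // Parses "word/pos-tag/dep-label/head-index", splitting from the right
    // because the word itself may contain '/'.
    static Token parseToken(String raw) {
        int i3 = raw.lastIndexOf('/');
        int i2 = raw.lastIndexOf('/', i3 - 1);
        int i1 = raw.lastIndexOf('/', i2 - 1);
        if (i1 < 0) {
            throw new IllegalArgumentException("Malformed token: " + raw);
        }
        return new Token(raw.substring(0, i1), raw.substring(i1 + 1, i2),
                raw.substring(i2 + 1, i3), Integer.parseInt(raw.substring(i3 + 1)));
    }

    // Parses one record; the per-year counts after the third field are ignored.
    static Ngram parseLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 tab-separated fields");
        }
        List<Token> tokens = new ArrayList<>();
        for (String raw : fields[1].split(" ")) {
            tokens.add(parseToken(raw));
        }
        return new Ngram(fields[0], tokens, Long.parseLong(fields[2]));
    }

    // Hypothetical feature encoding for the token at 0-based position i:
    // "<head word>-<dependency label>", or null for the root token.
    static String featureFor(Ngram ngram, int i) {
        Token t = ngram.tokens.get(i);
        if (t.headIndex == 0) {
            return null;
        }
        return ngram.tokens.get(t.headIndex - 1).word + "-" + t.depLabel;
    }
}
```

In a map-reduce setting, a mapper could emit (word, feature) pairs weighted by the record's total count, and a reducer could sum them into the counts that the association measures need. That division of work is a design suggestion rather than a description of this project's jobs.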

out/artifacts
src/main/java
target/classes
DSP-Ass3.iml
README.md
pom.xml
relatedness_full.arff
relatedness_small.arff
