Word-Co-occurence-Embedding-Model

In this project, we build a word embedding model from a co-occurrence matrix. We first build the co-occurrence matrix M, a symmetric word-by-word matrix in which Mij is the number of times word wj appears inside the context window of word wi.
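
The counting step can be sketched as follows. This is a minimal illustration, not necessarily what run.py does: the function name `build_cooccurrence_matrix`, the `window_size` default, and the assumption that the corpus arrives as lists of tokens are all hypothetical.

```python
import numpy as np

def build_cooccurrence_matrix(corpus, window_size=2):
    """Count co-occurrences within a symmetric window.

    corpus: list of documents, each a list of string tokens (assumed format).
    Returns the matrix M and a word -> row index mapping.
    """
    vocab = sorted({word for doc in corpus for word in doc})
    word2ind = {word: i for i, word in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))

    for doc in corpus:
        for i, center in enumerate(doc):
            # Window covers window_size tokens on each side, clipped at the edges.
            start = max(0, i - window_size)
            end = min(len(doc), i + window_size + 1)
            for j in range(start, end):
                if j != i:
                    M[word2ind[center], word2ind[doc[j]]] += 1
    return M, word2ind

M, word2ind = build_cooccurrence_matrix(
    [["all", "that", "glitters"]], window_size=1
)
```

Because the window extends equally to both sides of each word, every pair is counted once in each direction, which is what makes M symmetric.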

Then, we reduce the dimensionality of the matrix using Singular Value Decomposition (SVD) and keep the top k principal components. The figure below visualizes dimensionality reduction using SVD. In the figure, the co-occurrence matrix is A, with n rows corresponding to n words. We obtain a full matrix decomposition, with the singular values ordered along the diagonal of the matrix S, and the new, shorter length-k word vectors in Uk.
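
A minimal sketch of the reduction step, assuming NumPy's `np.linalg.svd` is used (the project may rely on a different SVD routine). The function name `reduce_to_k_dim` is hypothetical.

```python
import numpy as np

def reduce_to_k_dim(M, k=2):
    """Return the first k left singular vectors and singular values of M.

    Each row of the returned U_k is a length-k vector for the matching row
    (word) of M.
    """
    # np.linalg.svd returns the singular values in descending order.
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], S[:k]

# Small symmetric example matrix standing in for a co-occurrence matrix.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
U_k, S_k = reduce_to_k_dim(A, k=2)
```

Some implementations scale the vectors by the singular values (`U_k * S_k`) instead of using Uk directly. Either choice gives k-dimensional word vectors, but the scaled form weights each dimension by its importance.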

Below is a plot of a few hand-picked words whose vectors have been reduced to 2 dimensions.

To run the model and obtain the embeddings:

python run.py

This computes the co-occurrence matrix, runs SVD, and creates co_occurence_embeddings.png.
