
Word-Co-occurence-Embedding-Model

In this project, we build a word embedding model from a co-occurrence matrix. We first build the co-occurrence matrix M, a symmetric word-by-word matrix in which M_ij is the number of times word w_j appears inside word w_i's window.
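As a rough illustration, below is a minimal sketch of how such a matrix can be built, assuming the corpus is a list of tokenized sentences and a window size of 4; the function name and signature are illustrative, not necessarily those used in run.py.

```python
import numpy as np

def compute_co_occurrence_matrix(corpus, window_size=4):
    """Build a symmetric word-by-word co-occurrence matrix M, where
    M[i, j] counts how often word j appears within `window_size`
    tokens of word i anywhere in the corpus."""
    words = sorted({w for sentence in corpus for w in sentence})
    word2ind = {w: i for i, w in enumerate(words)}
    M = np.zeros((len(words), len(words)))
    for sentence in corpus:
        for center, w in enumerate(sentence):
            lo = max(0, center - window_size)
            hi = min(len(sentence), center + window_size + 1)
            for ctx in range(lo, hi):
                if ctx != center:
                    M[word2ind[w], word2ind[sentence[ctx]]] += 1
    return M, word2ind
```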

Then, we run dimensionality reduction on the matrix using Singular Value Decomposition (SVD) and keep the top k components. The figure below visualizes this reduction: our co-occurrence matrix is A, with n rows corresponding to n words. We obtain a full matrix decomposition, with the singular values ordered along the diagonal of S, and our new length-k word vectors in U_k.
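A minimal sketch of this reduction step, using NumPy's full SVD and keeping only the first k columns of U as the word vectors; the actual run.py may use a truncated or randomized SVD routine instead.

```python
import numpy as np

def reduce_to_k_dim(M, k=2):
    """Keep the top-k singular components of M: each row of the
    returned matrix is a length-k word vector (U_k above)."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k]
```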

Below is a plot of a few hand-picked words reduced to 2 dimensions.

  • To run and obtain the embeddings:

    python run.py

This will compute the co-occurrence matrix, run SVD, and create co_occurence_embeddings.png.
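For illustration, the plotting step could look roughly like the sketch below, using matplotlib to scatter and label the 2-D vectors of a few chosen words; the function name, the word2ind mapping, and the word list are assumptions for this example, not necessarily what run.py does.

```python
import matplotlib.pyplot as plt

def plot_embeddings(M_reduced, word2ind, words,
                    filename="co_occurence_embeddings.png"):
    """Scatter the 2-D embedding of each word in `words`, label the
    points, and save the figure to `filename`."""
    for w in words:
        x, y = M_reduced[word2ind[w], 0], M_reduced[word2ind[w], 1]
        plt.scatter(x, y, marker="x", color="red")
        plt.annotate(w, (x, y))
    plt.savefig(filename)
```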
