forked from kawine/usif

Implementation of unsupervised smoothed inverse frequency (Best Paper, Repl4NLP @ ACL 2018)

rvoleti89/usif

 
 

uSIF

This is an implementation of unsupervised smoothed inverse frequency (uSIF), a simple but effective way to create sentence embeddings without any labelled data (Best Paper, Repl4NLP @ ACL 2018). See the paper for more details.

Setup

  1. Unzip the pre-trained ParaNMT word vectors (thanks to John Wieting for providing them).
  2. Install the Python packages listed in requirements.txt.
  3. Initialize a uSIF embedding model with usif.py. Call get_paranmt_usif to get the model that uses the ParaNMT vectors, then call test_STS to check that you get the expected results. Once you know it works, feel free to try it with other word vectors.

Embedding Individual Sentences

If you don't have a sizable list of related sentences to embed, there is little point in doing piecewise common component removal; in that case, set m = 0 when initializing uSIF. Even on STS tasks, setting m = 0 decreases performance by only 1-4%.
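
To illustrate what remains when m = 0, here is a minimal, self-contained sketch of a frequency-weighted average of word vectors with no common component removal. It is not the code in usif.py: the helper names (unigram_probs, embed_sentence, cosine), the toy 3-dimensional vectors, the fixed constant a = 1e-3, and the SIF-style weight a / (a + p(w)) are all assumptions made for illustration. See the paper for how uSIF actually derives its weights.

```python
import math
from collections import Counter


def unigram_probs(corpus):
    """Estimate unigram probabilities p(w) from a list of sentences."""
    counts = Counter(tok for sent in corpus for tok in sent.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def embed_sentence(sentence, vectors, probs, a=1e-3):
    """Weighted average of length-normalized word vectors (the m = 0 case).

    The weight a / (a + p(w)) is a generic smoothed-inverse-frequency
    weight and `a` is a hypothetical fixed constant, not the value
    derived in the paper.
    """
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    used = 0
    for tok in sentence.lower().split():
        if tok not in vectors:
            continue  # skip out-of-vocabulary tokens
        vec = vectors[tok]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        weight = a / (a + probs.get(tok, 0.0))
        for i, x in enumerate(vec):
            acc[i] += weight * x / norm
        used += 1
    if used == 0:
        return acc  # all-zero vector when no token is known
    return [x / used for x in acc]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy data, invented for this sketch only.
toy_vectors = {
    "the": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "dog": [0.0, 0.9, 0.1],
}
toy_corpus = ["the cat sat", "the dog sat", "the cat and the dog"]
toy_probs = unigram_probs(toy_corpus)

# A single sentence can be embedded on its own; no other sentences are needed.
emb = embed_sentence("the cat", toy_vectors, toy_probs)
```

Because the frequent word "the" gets a smaller weight than "cat", the resulting embedding points more toward the vector for "cat". Common component removal, by contrast, estimates shared directions from a whole batch of sentence embeddings, which is why it adds little when only a few unrelated sentences are available.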

Languages

  • Python 57.8%
  • Perl 42.2%