
Commit

Update README.md
ankushbhatia2 authored Oct 29, 2018
1 parent f29c1bc commit 2e16c06
Showing 1 changed file (README.md) with 17 additions and 0 deletions.
@@ -3,6 +3,23 @@ It is a python library for Searching similar documents in a large corpus of docu
It uses either a two-layer Earth Mover's Distance (my research) or the Jensen-Shannon distance over the documents' latent topic distributions and word embeddings.
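The Jensen-Shannon option can be illustrated with a minimal, dependency-free sketch. It assumes each document is already represented as a probability distribution over the `n_topics` latent topics. The function names are hypothetical and not part of DocSearch, and the two-layer EMD variant is not reproduced because its details are unpublished.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the Jensen-Shannon divergence; lies in [0, 1] with base-2 logs."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    # m_i >= p_i / 2 > 0 whenever p_i > 0, so the KL terms never divide by zero.
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp any tiny negative rounding error
```

Identical topic distributions give a distance of 0, and distributions with no topics in common give the maximum of 1. Unlike the raw KL divergence, the result is symmetric and always finite.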

I'll describe the methodology in more detail once my paper is published.


## Classes
`DocSearch()`:

(i) `__init__()` takes five optional arguments:

- `n_topics`: number of topics (default 100)
- `wv_size`: word-embedding dimension (default 100)
- `stop_words`: list of stop words (default list)
- `min_word_freq`: minimum word frequency (default 15000)
- `sim_metric`: similarity metric; allowed values are `'jenson-shannon'` and `'emd'`

(ii) `fit()` takes a single argument: the list of documents.

(iii) `get_most_similar_documents()` takes two arguments, _viz._ `query_document` and `k`, the number of similar documents to return.

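The methods above describe a fit-then-query workflow. The sketch below shows one way such a query could work, assuming every document (and the query) has already been mapped to a topic distribution, for example by a topic model. `TopicIndex`, `get_most_similar`, and `js_distance` are hypothetical names, not DocSearch's API, and the sketch uses only a brute-force scan with Jensen-Shannon distance.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance with base-2 logs, so the result lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Zero-probability terms of `a` contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return math.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))


class TopicIndex:
    """Hypothetical stand-in for DocSearch: stores one topic distribution per document."""

    def __init__(self) -> None:
        self._doc_topics: List[List[float]] = []

    def fit(self, doc_topics: Sequence[Sequence[float]]) -> "TopicIndex":
        # Assumption: topic inference has already produced these distributions.
        self._doc_topics = [list(t) for t in doc_topics]
        return self

    def get_most_similar(
        self, query_topics: Sequence[float], k: int
    ) -> List[Tuple[int, float]]:
        # Score every stored document, then keep the k closest (index, distance) pairs.
        scored = (
            (i, js_distance(query_topics, t)) for i, t in enumerate(self._doc_topics)
        )
        return heapq.nsmallest(k, scored, key=lambda pair: pair[1])
```

For example, after `TopicIndex().fit([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])`, querying with `[0.8, 0.2]` and `k=2` ranks document 0 ahead of document 1. A linear scan keeps the sketch simple; a very large corpus might call for an approximate nearest-neighbour index instead.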
## Usage
```python
from docsearch import DocSearch
import pandas as pd
