From 2e16c0627e5058c3f3295216dd4a22db17cdecc9 Mon Sep 17 00:00:00 2001
From: Ankush Bhatia
Date: Mon, 29 Oct 2018 15:04:17 +0530
Subject: [PATCH] Update README.md

---
 README.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/README.md b/README.md
index 0b4adc7..b5d3f8d 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,23 @@ It is a python library for Searching similar documents in a large corpus of docu
 It uses a 2-layer Earth Mover's Distance (my research) or a Jenson Shannon Distance over latent topic distribution of documents and word embeddings.

 I'll update the further methodology once my paper is published.
+
+
+## Classes
+DocSearch() :
+
+(i)__init__() takes 5 optional arguments.
+"""
+ :param n_topics: number of topics (default 100)
+ :param wv_size: word embedding dimension (default 100)
+ :param stop_words: stop words list (default list)
+ :param min_word_freq: minimum word frequency (default 15000)
+ :param sim_metric: allowed values :['jenson-shannon', 'emd']
+"""
+
+(ii) __fit__() takes one single argument which is the list of documents.
+
+(iii) __get_most_similar_documents__() takes 2 arguments _viz._ query_document and number of similar documents to be shown(k).
 ## Usage
 ```from docsearch import DocSearch
 import pandas as pd