To create the index, run:
./index.sh <path to wikiDump>
Then, to search, run:
./search.sh
You will see something like this:
---------------------------------------------------------------------------------
➜ wikiSearch ./search.sh
'text:;cat:<...>;ref:<...>;title:<...>;link:<...>'
>
---------------------------------------------------------------------------------
The prompt is where you type your query.
Query format:
'\<fieldname1\>: \<word1\> \<word2\>;\<fieldname2\>: \<word3\> \<word4\>;'
NOTE:
- QUOTATION MARKS AROUND THE QUERY ARE A MUST.
- A SEMICOLON AFTER EACH FIELD IS A MUST.
- NOT ALL FIELDS ARE REQUIRED.
Fields:
- text -> text of the corpus
- title -> title of each document in the corpus
- ref -> reference
- cat -> category
- link -> links in each page
Sample query:
> 'text: android lollipop;title: google;'
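
As an illustration of the query format, here is a minimal Python sketch of how such a query string could be split into fields and words. It is not the project's actual parser: the function name `parse_query` and the constant `KNOWN_FIELDS` are hypothetical, and rejecting unknown field names is an assumption.

```python
# Hypothetical helper, not taken from the project's search module.
KNOWN_FIELDS = {"text", "title", "ref", "cat", "link"}


def parse_query(query):
    """Split "field: w1 w2;field2: w3;" into {field: [words]}."""
    fields = {}
    # Drop the surrounding quotes, then split on the mandatory semicolons.
    for part in query.strip().strip("'").split(";"):
        part = part.strip()
        if not part:
            continue  # the trailing semicolon leaves an empty piece
        name, sep, words = part.partition(":")
        name = name.strip()
        if not sep or name not in KNOWN_FIELDS:
            raise ValueError("unrecognised field: %r" % part)
        fields.setdefault(name, []).extend(words.split())
    return fields


if __name__ == "__main__":
    print(parse_query("'text: android lollipop;title: google;'"))
```

Fields that are left out of the query simply do not appear in the returned dictionary, which matches the rule that not all fields are required.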
- The indexer is implemented in Java and the search module is written in Python 2.7.
- For 4-word queries, results are returned in under 0.03 seconds about 90% of the time.
- Index construction follows the BSBI (blocked sort-based indexing) algorithm.
- Six index files are constructed in total: three for the corpus and three for document name retrieval.
- These are two small tertiary indexes held in memory, two secondary indexes, one inverted index, and one file of the document names to be retrieved.
- Any posting can be looked up in just two disk accesses, which makes the search really fast.
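
The two-disk-access lookup can be sketched roughly as follows. This is a toy Python 3 illustration under stated assumptions, not the project's code: the file names, the line-based file layout, and the functions `build_indexes` and `lookup` are all hypothetical. It assumes the in-memory tertiary index points at blocks of the secondary index on disk, and each secondary entry points at one posting line in the inverted index.

```python
import bisect
import os
import tempfile


def build_indexes(postings, directory, block_size=2):
    """Write a toy inverted index and a secondary index; return the paths
    plus a small in-memory tertiary index (all names are hypothetical)."""
    primary_path = os.path.join(directory, "primary.idx")
    secondary_path = os.path.join(directory, "secondary.idx")
    entries = []  # (term, byte offset, byte length) of each posting line
    with open(primary_path, "wb") as primary:
        for term in sorted(postings):
            docs = ",".join(str(d) for d in postings[term])
            line = ("%s %s\n" % (term, docs)).encode("utf-8")
            entries.append((term, primary.tell(), len(line)))
            primary.write(line)
    tertiary = []  # (first term, byte offset, byte length) of each block
    with open(secondary_path, "wb") as secondary:
        for start in range(0, len(entries), block_size):
            block = entries[start:start + block_size]
            data = "".join("%s %d %d\n" % e for e in block).encode("utf-8")
            tertiary.append((block[0][0], secondary.tell(), len(data)))
            secondary.write(data)
    return primary_path, secondary_path, tertiary


def lookup(term, primary_path, secondary_path, tertiary):
    """Return the document ids for term using two disk reads at most."""
    # In memory: find the last block whose first term is <= term.
    firsts = [first for first, _, _ in tertiary]
    i = bisect.bisect_right(firsts, term) - 1
    if i < 0:
        return []
    _, sec_offset, sec_length = tertiary[i]
    # Disk access 1: read one block of the secondary index.
    with open(secondary_path, "rb") as secondary:
        secondary.seek(sec_offset)
        block = secondary.read(sec_length).decode("utf-8")
    for line in block.splitlines():
        name, offset, length = line.split(" ")
        if name == term:
            # Disk access 2: read exactly one posting line.
            with open(primary_path, "rb") as primary:
                primary.seek(int(offset))
                posting = primary.read(int(length)).decode("utf-8")
            return [int(d) for d in posting.split(" ", 1)[1].strip().split(",")]
    return []


if __name__ == "__main__":
    toy = {"android": [1, 4], "google": [1, 2, 3], "lollipop": [4], "wiki": [2]}
    with tempfile.TemporaryDirectory() as d:
        paths = build_indexes(toy, d)
        print(lookup("lollipop", *paths))
```

Because the tertiary index is searched in memory, the only disk work per term is one read of a secondary block and one read of the posting itself, regardless of how large the inverted index grows.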