WikiSearchEngine

1. HOW TOs

To create the index, run

./index.sh <path to wikiDump>

Then to search, run

./search.sh

You will see something like this:

---------------------------------------------------------------------------------
➜  wikiSearch ./search.sh
'text:;cat:<...>;ref:<...>;title:<...>;link:<...>'

> 
---------------------------------------------------------------------------------

The prompt is where you type your query.

NOTE: QUOTATION MARKS AROUND THE QUERY ARE REQUIRED.

2. QUERY SYNTAX:

'<fieldname1>: <word1> <word2>;<fieldname2>: <word3> <word4>;'

NOTE:

QUOTATION MARKS AROUND THE QUERY ARE REQUIRED.
A SEMICOLON AFTER EACH FIELD IS REQUIRED.
NOT ALL FIELDS ARE REQUIRED.

Fields:

text -> text of the corpus
title -> title of each document in the corpus
ref -> reference
cat -> category
link -> links in each page

Sample query:

> 'text: android lollipop;title: google;'
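To make the syntax concrete, here is a minimal sketch of how such a query string could be split into fields and words. It is not taken from the search module: the function name `parse_query` is hypothetical, and the sketch is written for Python 3 rather than Python 2.7.

```python
def parse_query(raw):
    """Split a quoted query string into a {field: [words]} dictionary.

    Hypothetical helper for illustration; the real search module may
    parse queries differently.
    """
    fields = {}
    # Drop surrounding whitespace and the mandatory quotation marks.
    body = raw.strip().strip("'\"")
    for clause in body.split(";"):
        if not clause.strip():
            continue  # the trailing semicolon leaves an empty clause
        name, _, words = clause.partition(":")
        fields[name.strip()] = words.split()
    return fields


query = parse_query("'text: android lollipop;title: google;'")
print(query)
```

Fields that are left out of the query are simply absent from the resulting dictionary.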

3. STATISTICS

The search module is written in Python 2.7.
For a 4-word query, results are returned in under 0.03 seconds 90% of the time.

4. IMPLEMENTATION

The indexer is implemented in Java and the search is written in Python.
The indexer is an implementation of the BSBI (blocked sort-based indexing) algorithm.
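The idea behind BSBI can be sketched in a few lines: invert the corpus one block at a time into sorted runs of (term, document id) pairs, then merge the runs into a single inverted index. The sketch below is in Python 3 for brevity, not the Java used by the indexer, and it keeps the runs in memory, whereas a real BSBI indexer writes each run to disk before merging. The names `build_block` and `merge_blocks` and the sample documents are assumptions made for illustration.

```python
import heapq
from itertools import groupby


def build_block(docs):
    """Invert one block of (doc_id, text) pairs into a sorted run of (term, doc_id)."""
    pairs = {(term, doc_id) for doc_id, text in docs for term in text.lower().split()}
    return sorted(pairs)


def merge_blocks(blocks):
    """K-way merge of sorted runs into {term: sorted list of doc ids}."""
    index = {}
    merged = heapq.merge(*blocks)  # lazily yields pairs in global sorted order
    for term, group in groupby(merged, key=lambda pair: pair[0]):
        index[term] = sorted({doc_id for _, doc_id in group})
    return index


# Two small blocks stand in for chunks of the dump that fit in memory.
block1 = build_block([(1, "android lollipop"), (2, "google android")])
block2 = build_block([(3, "google search"), (4, "android search")])
index = merge_blocks([block1, block2])
print(index["android"])
```

Because every run is already sorted, the merge step only needs to hold one pair per run in memory at a time, which is what lets the approach scale to a corpus larger than RAM.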
In total, six index files are constructed: three for the corpus and three for document name retrieval.
These comprise two small tertiary indexes held in memory, two secondary indexes, one inverted index, and one file of document names to be retrieved.
Any posting can be looked up in just two disk accesses, which makes the search really fast.
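A rough sketch of how a three-level lookup can reach a posting in two disk accesses is given below. The file layout is an assumption, not the project's actual format: a sorted postings file, a secondary file mapping each term to its byte offset in the postings file, and an in-memory tertiary list that samples every `STEP`-th secondary entry. All names (`build`, `lookup`, `STEP`, the file names) are hypothetical, and the code is Python 3.

```python
import bisect
import os
import tempfile

STEP = 2  # hypothetical sampling rate: every 2nd secondary entry is kept in memory


def build(postings, directory):
    """Write the postings and secondary files; return their paths and the tertiary list."""
    primary = os.path.join(directory, "primary.txt")
    secondary = os.path.join(directory, "secondary.txt")
    tertiary = []  # (term, byte offset into the secondary file)
    with open(primary, "wb") as p, open(secondary, "wb") as s:
        for i, term in enumerate(sorted(postings)):
            if i % STEP == 0:
                tertiary.append((term, s.tell()))
            # Secondary entry: term and the byte offset of its postings line.
            s.write(("%s %d\n" % (term, p.tell())).encode("utf-8"))
            docs = ",".join(str(d) for d in postings[term])
            p.write(("%s %s\n" % (term, docs)).encode("utf-8"))
    return primary, secondary, tertiary


def lookup(term, primary, secondary, tertiary):
    """Return the posting list for term, or [] if the term is not indexed."""
    keys = [t for t, _ in tertiary]
    slot = bisect.bisect_right(keys, term) - 1  # in-memory search, no disk access
    if slot < 0:
        return []
    offset = None
    with open(secondary, "rb") as s:  # disk access 1: one seek into the secondary file
        s.seek(tertiary[slot][1])
        for _ in range(STEP):
            parts = s.readline().decode("utf-8").split()
            if parts and parts[0] == term:
                offset = int(parts[1])
                break
    if offset is None:
        return []
    with open(primary, "rb") as p:  # disk access 2: one seek into the postings file
        p.seek(offset)
        _, docs = p.readline().decode("utf-8").split()
    return [int(d) for d in docs.split(",")]


with tempfile.TemporaryDirectory() as tmp:
    files = build({"android": [1, 2, 4], "google": [2, 3],
                   "lollipop": [1], "search": [3, 4]}, tmp)
    found = lookup("google", *files)
    missing = lookup("pixel", *files)
print(found, missing)
```

The tertiary list is small enough to stay in memory, so the only disk work per term is one seek into the secondary file and one seek into the postings file.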

ramkishore07s/WikiSearchEngine
