There are 12 files in this assignment folder:

- BooleanOperator.py: defines the AND, OR and NOT operators for the list data structure
- Conversion.py: defines infix-to-postfix conversion of boolean expressions
- ExtendedBinaryRetrieval.py: defines the extended binary retrieval model (phrase queries with a biword index)
- InverseIndex.py: defines basic inverted indexing
- Lemmatizer.py: defines tokenization and lemmatization of text
- main.py: the main program; run this to test the assignment
- Query.py: defines query processing (both normal and biword query processing)
- README.md: this file
- Stack.py: defines the operations of the stack data structure
- PositionalIndex.py: defines positional indexing
- SoundexIndex.py: defines Soundex indexing
- Soundex.py: defines the Soundex algorithm

Note: ExtendedBinaryRetrieval.py extends InverseIndex.py.
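
The boolean query pipeline suggested by Conversion.py, Stack.py and BooleanOperator.py can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names (`to_postfix`, `intersect`, `union`, `complement`, `evaluate`), the lowercase operator tokens and the sample index are all assumptions made for the example.

```python
# Hypothetical sketch: names and token conventions are illustrative only.
PRECEDENCE = {"not": 3, "and": 2, "or": 1}


def to_postfix(tokens):
    """Convert an infix boolean expression (a list of tokens) to postfix."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        elif tok in PRECEDENCE:
            # "and"/"or" are left-associative; unary "not" is right-associative.
            while (stack and stack[-1] in PRECEDENCE
                   and (PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                        or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok]
                            and tok != "not"))):
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # a query term
    while stack:
        output.append(stack.pop())
    return output


def intersect(p1, p2):
    """AND: linear merge of two sorted posting lists."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def union(p1, p2):
    """OR: all document ids that appear in either posting list."""
    return sorted(set(p1) | set(p2))


def complement(p, all_docs):
    """NOT: every document id that is absent from the posting list."""
    excluded = set(p)
    return [d for d in all_docs if d not in excluded]


def evaluate(postfix, index, all_docs):
    """Evaluate a postfix boolean query against an inverted index."""
    stack = []
    for tok in postfix:
        if tok == "not":
            stack.append(complement(stack.pop(), all_docs))
        elif tok in ("and", "or"):
            right, left = stack.pop(), stack.pop()
            combine = intersect if tok == "and" else union
            stack.append(combine(left, right))
        else:
            stack.append(index.get(tok, []))
    return stack.pop()


# Made-up index: term -> sorted list of document ids.
index = {"cat": [1, 2, 4], "dog": [2, 3], "fish": [5]}
all_docs = [1, 2, 3, 4, 5]
postfix = to_postfix("cat and not dog or fish".split())
result = evaluate(postfix, index, all_docs)
```

Here `postfix` is `["cat", "dog", "not", "and", "fish", "or"]` and `result` is `[1, 4, 5]`. A plain Python list serves as the stack in this sketch; the project's Stack.py presumably plays that role.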
The dataset (corpus) for this assignment is in the Dataset folder. The files posting_list.txt and biword_index.txt contain the posting lists for single words and biwords, respectively.
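
To illustrate how single-word and biword posting lists differ, here is a small sketch that builds both from a toy corpus and answers a phrase query with the biword index. The function names, the space-separated biword key format and the sample documents are assumptions for the example; the actual layout of posting_list.txt and biword_index.txt may differ.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build single-word and biword posting lists.

    docs maps a document id to a list of tokens that are assumed to be
    already tokenized and lemmatized.
    """
    single = defaultdict(set)
    biword = defaultdict(set)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            single[tok].add(doc_id)
        # Every pair of consecutive tokens becomes one biword entry.
        for first, second in zip(tokens, tokens[1:]):
            biword[f"{first} {second}"].add(doc_id)
    return (
        {term: sorted(ids) for term, ids in single.items()},
        {pair: sorted(ids) for pair, ids in biword.items()},
    )


def phrase_query(phrase_tokens, biword):
    """Return documents that contain every consecutive pair of the phrase.

    For phrases longer than two words this can return false positives,
    because the pairs are not required to be adjacent to each other.
    """
    pairs = [f"{a} {b}" for a, b in zip(phrase_tokens, phrase_tokens[1:])]
    if not pairs:
        return []
    result = set(biword.get(pairs[0], []))
    for pair in pairs[1:]:
        result &= set(biword.get(pair, []))
    return sorted(result)


# Made-up corpus for illustration.
docs = {
    1: ["new", "york", "city"],
    2: ["new", "city", "york"],
    3: ["york", "city", "hall"],
}
single_index, biword_index = build_indexes(docs)
hits = phrase_query(["new", "york", "city"], biword_index)
```

All three documents contain "york" and "city" as single words, but only documents 1 and 3 contain the biword "york city", and only document 1 matches the full phrase "new york city".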
The Indexes folder contains the indexes generated by the program. Each index is a dictionary stored in a text file.
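
As a rough sketch of what storing a dictionary-based index in a text file can look like, the example below writes an index as JSON and reads it back. JSON is an assumption chosen for the illustration; the program's actual on-disk format is not specified here, and the helper names and file name are hypothetical.

```python
import json
import os
import tempfile


def save_index(index, path):
    """Write an index (term -> posting list) to a text file as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2, sort_keys=True)


def load_index(path):
    """Read an index back from a text file written by save_index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Made-up index; a temporary directory stands in for the Indexes folder.
index = {"cat": [1, 2, 4], "dog": [2, 3]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "inverted_index.txt")  # hypothetical file name
    save_index(index, path)
    restored = load_index(path)
```

After the round trip, `restored` equals the original `index`. Note that JSON only supports string keys, so an index keyed by integers would need a conversion step when loading.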
- Implementing indexes through B+-trees
- Better class structure
- Processing proximity queries
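
As one possible direction for the proximity-query item, a positional index can answer "term1 within k words of term2" by comparing positions inside each shared document. The sketch below is only an illustration under assumed names (`build_positional_index`, `proximity_query`) and an assumed index shape (term -> document id -> list of positions); it is not based on PositionalIndex.py.

```python
def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of token lists."""
    index = {}
    for doc_id, tokens in docs.items():
        for pos, tok in enumerate(tokens):
            index.setdefault(tok, {}).setdefault(doc_id, []).append(pos)
    return index


def proximity_query(term1, term2, k, index):
    """Return documents where term1 and term2 occur within k positions."""
    postings1 = index.get(term1, {})
    postings2 = index.get(term2, {})
    matches = []
    # Only documents that contain both terms can match.
    for doc_id in sorted(postings1.keys() & postings2.keys()):
        # Simple pairwise comparison; a linear merge of the two sorted
        # position lists would be more efficient for long documents.
        if any(abs(p1 - p2) <= k
               for p1 in postings1[doc_id]
               for p2 in postings2[doc_id]):
            matches.append(doc_id)
    return matches


# Made-up corpus for illustration.
docs = {
    1: ["information", "retrieval", "with", "boolean", "queries"],
    2: ["boolean", "logic", "and", "much", "later", "information"],
}
positional = build_positional_index(docs)
near = proximity_query("information", "boolean", 3, positional)
```

With `k = 3` only document 1 matches, since the two terms are 3 positions apart there and 5 positions apart in document 2.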