Project Details:
- Read 10 text files (.txt).
- Apply tokenization.
- Remove stop words (except "in" and "to").
- Build a positional index and display each term.
- Allow users to enter a phrase query against the positional index; the system returns the documents that match the query.
- Compute the term frequency (TF) of each term in each document.
- Compute the inverse document frequency (IDF) of each term.
- Display the TF-IDF matrix.
- Compute the cosine similarity between the query and each matched document.
- Rank the matched documents by cosine similarity.
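The first steps (reading files, tokenization, stop-word removal, the positional index, and phrase queries) could be sketched as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, the stop-word list is a short made-up sample (a real project would likely use a fuller list), tokens are lowercase alphanumeric runs, and positions are assumed to be counted after stop words are removed, so a phrase query is preprocessed the same way before matching. An in-memory dictionary stands in for the 10 .txt files.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical, abbreviated stop-word list (a real project would likely use a fuller one).
BASE_STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for",
                   "in", "is", "it", "of", "on", "or", "the", "to", "with"}
# Per the project spec, "in" and "to" are kept even though they are stop words.
STOP_WORDS = BASE_STOP_WORDS - {"in", "to"}


def load_documents(folder):
    """Read every .txt file in `folder` into {doc_id: text}, using the file stem as doc_id."""
    return {p.stem: p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))}


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}; positions count tokens after stop-word removal."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(preprocess(text)):
            index[term][doc_id].append(pos)
    return {term: dict(postings) for term, postings in index.items()}


def phrase_query(index, query):
    """Return sorted doc_ids in which the query terms appear at consecutive positions."""
    terms = preprocess(query)
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every query term can contain the phrase.
    candidates = set.intersection(*(set(index[t]) for t in terms))
    matches = []
    for doc_id in sorted(candidates):
        # The phrase matches if, for some start position of the first term,
        # the i-th query term occurs at start + i.
        if any(all(start + i in index[t][doc_id] for i, t in enumerate(terms))
               for start in index[terms[0]][doc_id]):
            matches.append(doc_id)
    return matches


# Small in-memory example instead of reading real files.
docs = {
    "d1": "Antony and Brutus went to Rome",
    "d2": "Brutus is in Rome",
    "d3": "Caesar went to Rome",
}
index = build_positional_index(docs)
for term in sorted(index):
    print(term, index[term])
print(phrase_query(index, "went to Rome"))
```

Here the last line prints ['d1', 'd3']: "and" and "is" are dropped, while "to" and "in" survive, so "went to rome" occupies consecutive positions in both d1 and d3. With real files, docs = load_documents("path/to/folder") would replace the in-memory dictionary.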
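The TF, IDF, and TF-IDF matrix steps could look roughly like this. The project does not say which weighting scheme to use, so this sketch assumes raw-count TF, IDF(t) = log10(N / df(t)), and TF-IDF = TF x IDF; variants such as 1 + log10(tf) are also common. The function names are hypothetical, and the input is assumed to be documents that are already tokenized with stop words removed.

```python
import math
from collections import Counter


def term_frequencies(doc_tokens):
    """Raw count of each term in each document: {doc_id: Counter({term: count})}."""
    return {doc_id: Counter(tokens) for doc_id, tokens in doc_tokens.items()}


def inverse_document_frequency(doc_tokens):
    """IDF(t) = log10(N / df(t)), where df(t) is the number of documents containing t."""
    n = len(doc_tokens)
    df = Counter(term for tokens in doc_tokens.values() for term in set(tokens))
    return {term: math.log10(n / count) for term, count in df.items()}


def tfidf_matrix(doc_tokens):
    """Return (sorted vocabulary, {doc_id: [tf * idf for each vocabulary term]})."""
    tf = term_frequencies(doc_tokens)
    idf = inverse_document_frequency(doc_tokens)
    vocab = sorted(idf)
    # Counter returns 0 for terms absent from a document, so those cells are 0.0.
    matrix = {doc_id: [tf[doc_id][t] * idf[t] for t in vocab] for doc_id in doc_tokens}
    return vocab, matrix


def print_matrix(vocab, matrix):
    """Print one row per term and one column per document."""
    doc_ids = list(matrix)
    print("term".ljust(10) + "".join(d.rjust(8) for d in doc_ids))
    for row, term in enumerate(vocab):
        print(term.ljust(10) + "".join(f"{matrix[d][row]:8.3f}" for d in doc_ids))


# Already tokenized documents with stop words removed.
doc_tokens = {
    "d1": ["antony", "brutus", "went", "to", "rome"],
    "d2": ["brutus", "in", "rome"],
    "d3": ["caesar", "went", "to", "rome"],
}
vocab, matrix = tfidf_matrix(doc_tokens)
print_matrix(vocab, matrix)
```

Because "rome" occurs in all three documents, its IDF is log10(3/3) = 0 and its row in the matrix is all zeros: under this weighting a term that appears everywhere carries no discriminating weight.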
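The last two steps, cosine similarity and ranking, might be sketched as below. This assumes the query is weighted the same way as the documents (raw-count TF times the collection's log10 IDF), that query terms never seen in the collection are ignored, and that only the documents returned by the phrase query are scored. The helper names and sparse {term: weight} vectors are illustrative choices, not something the project specifies.

```python
import math
from collections import Counter


def tfidf_vector(tokens, idf):
    """Sparse {term: raw_count * idf}; terms missing from `idf` (unseen in the collection) are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, matched_docs, doc_tokens, idf):
    """Score each matched document against the query; highest cosine first, ties by doc_id."""
    q = tfidf_vector(query_tokens, idf)
    scores = {d: cosine(q, tfidf_vector(doc_tokens[d], idf)) for d in matched_docs}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Tokenized documents (stop words removed) and IDF = log10(N / df) over them.
doc_tokens = {
    "d1": ["antony", "brutus", "went", "to", "rome"],
    "d2": ["brutus", "in", "rome"],
    "d3": ["caesar", "went", "to", "rome"],
}
df = Counter(t for tokens in doc_tokens.values() for t in set(tokens))
idf = {t: math.log10(len(doc_tokens) / n) for t, n in df.items()}

# Suppose a phrase query for "went to rome" matched d1 and d3.
ranking = rank(["went", "to", "rome"], ["d1", "d3"], doc_tokens, idf)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
```

In this example d3 ranks above d1: both share the same weighted query terms, but d1 also contains "antony" and "brutus", which lengthen its vector and so lower its cosine score.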