A search engine for FAST-Resources, a repository of study materials: the user enters keywords as text and retrieves all files that contain them.
i. Using the Regular Expressions library to clean trailing whitespace from files and converting the text to lower case
ii. Extracting text from each file, skipping over any text-less files, and counting the Term Frequency of the transformed words
iii. Transforming the text into Cleansed Words after filtering out Fillers (Stop Words / Punctuation / 3 Letter Words)
v. Running the driver program to perform all of the above for every file present in the FAST-Resources repository
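
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the `STOP_WORDS` set is a tiny hypothetical stand-in for the real stop-word list, the function names are made up for this example, and it assumes that "3 Letter Words" means words of three letters or fewer are dropped. Text extraction from binary formats is left out; the sketch takes already-extracted text.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not the project's list).
STOP_WORDS = {"with", "that", "this", "from", "have", "been"}


def clean_text(raw: str) -> str:
    """Strip trailing spaces/tabs from every line and convert to lower case."""
    no_trailing = re.sub(r"[ \t]+$", "", raw, flags=re.MULTILINE)
    return no_trailing.lower()


def cleansed_words(text: str) -> list[str]:
    """Drop punctuation, stop words, and words of three letters or fewer."""
    tokens = re.findall(r"[a-z0-9]+", text)  # punctuation never matches
    return [t for t in tokens if len(t) > 3 and t not in STOP_WORDS]


def term_frequency(raw: str) -> Counter:
    """Count how often each cleansed word occurs in one file's text."""
    return Counter(cleansed_words(clean_text(raw)))


def index_files(files: dict[str, str]) -> dict[str, Counter]:
    """Driver: map each file path to its term frequencies, skipping text-less files."""
    return {path: term_frequency(text) for path, text in files.items() if text.strip()}
```

For example, `term_frequency("The Sorting algorithm sorts the array with keys.")` counts `sorting`, `algorithm`, `sorts`, `array`, and `keys` once each, while `the` and `with` are filtered out.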
i. Searching for the query (input) by splitting it into words, each of which is used as an index to fetch the file paths in its row
ii. If only a single file is found, its keys (file path and Relevance score) are fetched from the src dict (DB)
iii. If multiple files are found, their file paths are intersected into a posting list before the keys are fetched
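
One way to read the search steps above is as a lookup in an inverted index followed by a posting-list intersection. The sketch below assumes the index is a plain dict mapping each word to a row of `{file_path: relevance}`; that layout, the `search` function name, and the choice to sum relevance scores across query words are all assumptions made for illustration, not the project's confirmed behaviour.

```python
def search(query: str, index: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Return (file_path, relevance) pairs for files containing every query word."""
    words = query.lower().split()
    if not words:
        return []

    # One row of postings per query word; an unknown word yields an empty row.
    rows = [index.get(word, {}) for word in words]

    # Intersect the file paths of all rows. With a single word this is just
    # that word's row, so the single-result case needs no special handling.
    common = set(rows[0]).intersection(*rows[1:])

    # Assumption: combined relevance is the sum of the per-word scores.
    scores = {path: sum(row[path] for row in rows) for path in common}

    # Highest relevance first; ties broken by path for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical index contents, for illustration only.
INDEX = {
    "sorting": {"Algo/sorting.pdf": 5, "DS/heaps.pdf": 2},
    "heaps": {"DS/heaps.pdf": 4},
}

print(search("sorting", INDEX))        # both files, ordered by relevance
print(search("sorting heaps", INDEX))  # only the file containing both words
```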
i. Initiating the SQLite database with text columns Word, Topic, and File, and an integer column Relevance score
ii. Creating an index at runtime on the Word column and initiating Search Pattern 1, which allows searching with a Topic
iii. Creating an index at runtime on the Word column and initiating Search Pattern 2, which allows searching without a Topic
iv. Inserting each Word with its Topic, File Path, and Relevance score (the TF from main.py, used for ordering search results)
v. Using S3 as the front end, DynamoDB over SQLite, and Actions to trigger Lambda API-Gateway when a new file is added
Ask questions about this repository at https://chat.collectivai.com/Usaid-Bin-Rehan/FAST_Resources_Reverse_Indexing (UNVERIFIED)
- Fork the repository
- Git clone the FAST-Resources repository link
- Replace `self.repo_path = '/Users/jazib/Desktop/workrepo/FAST-Resources/'` in `main.py` with the absolute path to your FAST-Resources clone
- Replace `self.conn = sqlite3.connect('/Users/jazib/Desktop/workrepo/FAST_Resources_Reverse_Indexer/fast_zakhira.db')` in `database.py` with any local dir
- Push your changes and create a merge request