Given the embedding of the search query we can efficent get the top matching k results form DB with 20M document.The objective of this project is to design and implement an indexing system for a semantic search database.
Check Final Notebook
https://github.com/ZiadSheriif/IntelliQuery/blob/main/Evaluate_ADB_Project.ipynb
Clone Repo
git clone https://github.com/ZiadSheriif/IntelliQuery.git
Install dependencies
pip install -r requirements.txt
Run Indexer
$ python ./src/evaluation.py
This is out final Approach with Some Enhancements
- Changed MiniBatchKMeans to regular KMeans
- We calculate initial centroids with just the first chunk of data
- Introduced parallel processing for different regions
It Combines both LSH & PQ
Ziad Sherif |
Zeyad Tarek |
Abdalhameed Emad |
Basma Elhoseny |
This software is licensed under MIT License, See License for more information ©Ziad Sherif.