diff --git a/README.md b/README.md
index 2bfc657..02b7711 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 ---
 title: Multimodal Image Search Engine
-emoji: 🚀
-colorFrom: indigo
-colorTo: indigo
+emoji: 🔍
+colorFrom: yellow
+colorTo: yellow
 sdk: gradio
 sdk_version: 4.13.0
 app_file: app.py
@@ -10,4 +10,43 @@ pinned: false
 license: mit
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+

+

Multi-Modal Image Search Engine

+

+ A Semantic Search Engine that understands the Content & Context of your Queries. +
+ Use Multi-Modal inputs like Text-Image or a Reverse Image Search to search a Vector Database of over 15k Images. Try it Out! +

+ +

+

+ +

• About The Project

+
+At its core, the Search Engine is built upon the concept of **Vector Similarity Search**.
+All the Images are encoded into vector embeddings based on their semantic meaning using a Transformer Model, and these embeddings are stored in a vector space.
+When searched with a query, the engine encodes the query into the same space and returns its nearest neighbors, which are the relevant search results.
+
+

+
+We use the Contrastive Language-Image Pre-Training (CLIP) Model by OpenAI, a Pre-trained Multi-Modal Vision Transformer that can semantically encode Words, Sentences & Images into a 512-Dimensional Vector. This Vector encapsulates the meaning & context of the entity in a *Mathematically Measurable* format.
+
+

+

2-D Visualization of 500 Images in a 512-D Vector Space

+
+The Images are stored as vector embeddings in a Collection in Qdrant, a Vector Database. The Search Term is encoded and run as a query against Qdrant, which returns the Nearest Neighbors ranked by their Cosine-Similarity to the Search Query.
+
+
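The retrieval step described above can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the function names, file names, and 3-D toy vectors are hypothetical stand-ins for 512-D CLIP embeddings, and the brute-force scan only shows what "nearest neighbors by cosine similarity" means. In the actual app this lookup is delegated to Qdrant.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, collection, k=3):
    """Return the k (name, score) pairs most similar to the query.

    `collection` maps an image name to its embedding. This scans every
    vector, which is fine for a toy example only.
    """
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in collection.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-D vectors standing in for 512-D image embeddings (assumption).
collection = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.7, 0.3, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # stand-in for an encoded search term

results = nearest_neighbors(query, collection, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity depends only on the angle between vectors, the ranking is unaffected by vector length, so embeddings do not need to be normalized before comparison.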

+ + +

• Technologies Used

+
+- Python
+- Jupyter Notebooks
+- Qdrant - Vector Database
+- Sentence-Transformers - Library
+- CLIP by OpenAI - ViT Model
+- Gradio - UI
+- HuggingFace Spaces - Deployment
+
+
diff --git a/assets/Visualization.png b/assets/Visualization.png
index 087ec00..0592891 100644
Binary files a/assets/Visualization.png and b/assets/Visualization.png differ
diff --git a/assets/demo.gif b/assets/demo.gif
new file mode 100644
index 0000000..63635aa
Binary files /dev/null and b/assets/demo.gif differ
diff --git a/assets/encoding_flow.png b/assets/encoding_flow.png
new file mode 100644
index 0000000..069457e
Binary files /dev/null and b/assets/encoding_flow.png differ
diff --git a/assets/retrieval_flow.png b/assets/retrieval_flow.png
new file mode 100644
index 0000000..ff10e34
Binary files /dev/null and b/assets/retrieval_flow.png differ