From c6654067dc5736b625f230e2effefe2fde1d98a9 Mon Sep 17 00:00:00 2001
From: Snehil Shah
Date: Tue, 9 Jan 2024 05:36:08 +0530
Subject: [PATCH] Update README

Signed-off-by: Snehil Shah
---
 README.md | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 2bfc657..38e6e78 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 ---
 title: Multimodal Image Search Engine
-emoji: 🚀
-colorFrom: indigo
-colorTo: indigo
+emoji: 🔍
+colorFrom: yellow
+colorTo: yellow
 sdk: gradio
 sdk_version: 4.13.0
 app_file: app.py
@@ -10,4 +10,44 @@ pinned: false
 license: mit
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+

+

Multi-Modal Image Search Engine

+

+ A Semantic Search Engine that understands the Content & Context of your Queries. +
+ Use Multi-Modal inputs like Text-Image or a Reverse Image Search to search a Vector Database of over 15k Images. Try it Out! +

+ +

+

+ +

• About The Project

+
+At its core, the Search Engine is built upon the concept of **Vector Similarity Search**.
+All the Images are encoded into vector embeddings based on their semantic meaning using a Transformer Model, and these embeddings are stored in a vector space.
+When searched with a query, it returns the nearest neighbors of the query embedding, which are the relevant search results.
+
+

+
+We use the Contrastive Language-Image Pre-Training (CLIP) Model by OpenAI, a Pre-trained Multi-Modal Vision Transformer that can semantically encode Words, Sentences & Images into a 512-Dimensional Vector. This Vector encapsulates the meaning & context of the entity in a *Mathematically Measurable* format.
+
+

+

2-D Visualization of 500 Images in a 512-D Vector Space

+
+The Images are stored as vector embeddings in a Qdrant Collection; Qdrant is a Vector Database. The Search Term is encoded into the same vector space and run as a query against Qdrant, which returns the Nearest Neighbors ranked by their Cosine-Similarity to the Search Query.
+
+
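The ranking step described above can be illustrated with a minimal pure-Python sketch. It is not the project's code and does not use Qdrant or CLIP: the function names (`cosine_similarity`, `nearest_neighbors`), the file names, and the 3-D toy vectors are all hypothetical stand-ins for real 512-D embeddings, and a vector database would use an index rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=3):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: 3-D vectors standing in for 512-D image embeddings.
toy_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.7, 0.3, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.0]  # stands in for an encoded search term

results = nearest_neighbors(toy_query, toy_embeddings, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity depends only on the angle between vectors, the ranking is unaffected by the magnitude of the embeddings.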

+
+**The Dataset**: All images are sourced from the [Open Images Dataset](https://github.com/cvdfoundation/open-images-dataset) by the Common Visual Data Foundation.
+
+

• Technologies Used

+
+- Python
+- Jupyter Notebooks
+- Qdrant - Vector Database
+- Sentence-Transformers - Library
+- CLIP by OpenAI - ViT Model
+- Gradio - UI
+- HuggingFace Spaces - Deployment
+
+