Semantic-Search-Summarizer

Task :

To build a model that, given a user query, produces a summary of the most relevant information from the large data corpus fed to it.

Dataset :

sus.json contains a dictionary of lists of paragraphs, each list holding information about one of various topics.
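As a rough illustration, the corpus could be flattened into a single list of paragraphs before embedding. This is a minimal sketch that assumes the file is a JSON object whose values are lists of paragraph strings; the helper names `flatten` and `load_paragraphs` are hypothetical and do not come from the repository.

```python
import json


def flatten(data):
    """Collect every paragraph from a {topic: [paragraph, ...]} dictionary.

    Assumption: each value is a list of strings. Adjust if sus.json differs.
    """
    return [paragraph for paragraphs in data.values() for paragraph in paragraphs]


def load_paragraphs(path):
    """Read the JSON file at `path` and return its paragraphs as a flat list."""
    with open(path, encoding="utf-8") as f:
        return flatten(json.load(f))
```

Calling `load_paragraphs("sus.json")` would then give the list of texts to embed, provided the file matches the assumed layout.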

Approach:

Asymmetric semantic search (where a short query is matched against much longer corpus entries) is used to measure the similarity between the query and the data.

Semantic Search - The idea behind semantic search is to embed all entries in your corpus, which can be sentences, paragraphs, or documents, into a vector space. At search time, the query is embedded into the same vector space and the closest embeddings from your corpus are found.
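The embed-then-compare idea can be shown with a toy example. The sketch below is not the project's method: normalised word counts stand in for a real embedding model so that the example runs with the standard library alone, and the names `embed`, `dot`, and `search` are hypothetical.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Toy embedding: L2-normalised word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(query, corpus, top_k=2):
    """Embed corpus and query into the same space and return the closest entries."""
    vocab = sorted({word for doc in corpus for word in doc.lower().split()})
    corpus_vecs = [embed(doc, vocab) for doc in corpus]
    query_vec = embed(query, vocab)
    scored = sorted(
        ((dot(query_vec, vec), i) for i, vec in enumerate(corpus_vecs)),
        reverse=True,
    )
    return [(corpus[i], score) for score, i in scored[:top_k]]


corpus = [
    "faiss builds an index over dense vectors",
    "the summarizer condenses long paragraphs",
    "dot product measures vector similarity",
]
results = search("vector similarity", corpus, top_k=1)
print(results[0][0])
```

Here the third entry ranks first because it shares both query words. A learned embedding model would also match related wording that shares no exact words, which this toy version cannot do.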

Model Used - msmarco-distilbert-base-dot-prod-v3, which is tuned for dot-product similarity between query and passage embeddings.
Encoding and storage - FAISS (Facebook AI Similarity Search), a library for fast similarity search over dense vectors, stores the corpus embeddings and retrieves the ones closest to the query.
Summarizer - The Hugging Face summarization pipeline is used with its default model (sshleifer/distilbart-cnn-12-6). A dedicated summarizer could be implemented instead to improve efficiency and reduce run time.
Output - Finally, the summary is saved to a ".txt" file.
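The steps above could be wired together roughly as follows. This is a sketch under assumptions rather than the notebook's actual code: the function names, `top_k`, the length limits, the `summary.txt` path, and the choice of a flat inner-product index (`faiss.IndexFlatIP`) are all illustrative, and the `sentence-transformers`, `faiss`, and `transformers` packages are assumed to be installed.

```python
def build_index(paragraphs, model_name="sentence-transformers/msmarco-distilbert-base-dot-prod-v3"):
    """Embed the corpus and store the vectors in a FAISS inner-product index."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    embeddings = model.encode(paragraphs, convert_to_numpy=True).astype("float32")
    index = faiss.IndexFlatIP(embeddings.shape[1])  # exact dot-product search
    index.add(embeddings)
    return model, index


def retrieve(query, model, index, paragraphs, top_k=3):
    """Return the top_k paragraphs whose embeddings score highest against the query."""
    query_vec = model.encode([query], convert_to_numpy=True).astype("float32")
    _scores, ids = index.search(query_vec, top_k)
    return [paragraphs[i] for i in ids[0] if i != -1]  # -1 marks an empty slot


def summarize(passages, max_length=130, min_length=30):
    """Summarize the retrieved passages with the Hugging Face pipeline."""
    from transformers import pipeline

    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
    text = " ".join(passages)
    # truncation=True clips input that exceeds the model's maximum length.
    result = summarizer(text, max_length=max_length, min_length=min_length, truncation=True)
    return result[0]["summary_text"]


def save_summary(summary, path="summary.txt"):
    """Write the summary to a text file and return the path (illustrative name)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(summary)
    return path
```

A typical run would call `build_index` once on the corpus paragraphs, then `retrieve`, `summarize`, and `save_summary` for each query. The model weights are downloaded on first use, so the first call can be slow.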

A sample of the produced summary is provided in Sample_summary_generated.txt.

Article for reference :
