This repository contains the code to build a RAG-based assistant on top of Qdrant and the Hugging Face Spaces API.
Image by Pollinations AI
The first step in building anything is knowing what you are going to build. In this repository, we use climatebert's TCFD recommendations dataset to create a climate financial disclosure counselor that will be able to guide us through the wonders and dangers of climate investing.
To implement RAG (Retrieval-Augmented Generation), we need to first find a database provider that will host our knowledge base.
A simple, elegant, and fast solution, offering up to 1 GB of disk space in its free tier, is Qdrant.
So you have to:
- Register for Qdrant Cloud services
- Create your first cluster
- Retrieve the API key and the endpoint URL for your cluster
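Once you have the key and the URL, you can connect to your cluster and create a collection sized for the embeddings we will upload in the next step. Here is a minimal sketch using the qdrant-client package; the collection name and the placeholder credentials are illustrative assumptions, so adapt them to your setup:

```python
from qdrant_client import QdrantClient, models

# Placeholder credentials: use the endpoint URL and the API key
# from your Qdrant Cloud dashboard.
client = QdrantClient(
    url="https://YOUR-CLUSTER-ID.cloud.qdrant.io",
    api_key="YOUR_API_KEY",
)

# Create a collection sized for the 768-dimensional vectors
# produced by the encoder used in the next step.
client.create_collection(
    collection_name="tcfd_recommendations",  # hypothetical name
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE,
    ),
)
```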
There are various ways to upload your data; one can be found in this Gist I created for the purpose, where I load the data from the above-mentioned HF dataset and, exploiting Jina AI's jina-embeddings-v2-base-en encoder, encode it into 768-dimensional vectors that are sent to my Qdrant cluster along with the actual natural-language text.
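As a rough outline of what that Gist does (the Gist is the reference implementation; details such as the train split and the "text" field name are assumptions here, so check them against the actual dataset):

```python
from datasets import load_dataset
from qdrant_client import models
from sentence_transformers import SentenceTransformer

# Load the TCFD recommendations dataset from the Hugging Face Hub.
dataset = load_dataset("climatebert/tcfd_recommendations", split="train")

# The Jina encoder outputs 768-dimensional vectors; it ships custom
# code, hence trust_remote_code=True.
encoder = SentenceTransformer(
    "jinaai/jina-embeddings-v2-base-en", trust_remote_code=True
)

# Encode each record and keep the raw text in the payload, so that
# search results carry the natural-language passage back with them.
points = [
    models.PointStruct(
        id=i,
        vector=encoder.encode(row["text"]).tolist(),
        payload={"text": row["text"]},
    )
    for i, row in enumerate(dataset)
]

# `client` is the QdrantClient from the previous snippet.
client.upsert(collection_name="tcfd_recommendations", points=points)
```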
The resulting database is then available for vector search: you just need the same encoder loaded in your script and a few search functions (I created a class, `NeuralSearcher`, in utils.py; a simplified sketch follows).
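Here is what such a class can look like (the actual implementation lives in utils.py; treat this as an illustration, not a copy):

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    """Vector search over a Qdrant collection."""

    def __init__(self, collection_name: str, client: QdrantClient, encoder: SentenceTransformer):
        self.collection_name = collection_name
        self.client = client
        # Must be the same model used when the vectors were uploaded.
        self.encoder = encoder

    def search(self, text: str, limit: int = 3) -> list[str]:
        # Encode the query into the same 768-dimensional space.
        vector = self.encoder.encode(text).tolist()
        hits = self.client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            limit=limit,
        )
        # Return only the stored natural-language passages.
        return [hit.payload["text"] for hit in hits]
```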
Prior to building the application, install all the needed dependencies with:
python3 -m pip install -r requirements.txt
We build the application with Gradio, a popular Python library for quickly creating web interfaces for machine learning apps.
All the code can be found in app.py.
First of all, we save all our crucial but sensitive variables in a `.env` file (an example can be found here), as sketched below.
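Loading them at runtime can then look like this (the variable names are hypothetical; match them to whatever your .env actually contains):

```python
import os

from dotenv import load_dotenv  # python-dotenv package

# Reads the .env file and injects its variables into the environment.
load_dotenv()

# Hypothetical variable names: adapt them to your .env file.
QDRANT_URL = os.environ["QDRANT_URL"]
QDRANT_API_KEY = os.environ["QDRANT_API_KEY"]
```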
With Gradio, we create a simple ChatBot interface, where the conversation will be displayed.
Prior to that, we define a `reply` function that takes the message from the user and feeds it first to our retriever (`NeuralSearcher`, in my case) in order to pull some valuable contextual information out of our knowledge base in Qdrant. After that, the retrieved context gets inserted into the prompt that will be submitted to our LLM, along with some instructions (optional) and the user's query. A sketch of this flow is shown below.
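Put together, the flow can be sketched like this; `searcher` is a NeuralSearcher instance and `query_llm` a hypothetical helper that calls the LLM (sketched after the next paragraph). The real implementation is in app.py:

```python
import gradio as gr


def reply(message: str, history: list) -> str:
    # 1. Retrieve contextual passages from the Qdrant knowledge base.
    context = "\n".join(searcher.search(message))
    # 2. Build the prompt: (optional) instructions + context + user query.
    prompt = (
        "You are a climate financial disclosure counselor. "
        "Answer using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {message}"
    )
    # 3. Submit the prompt to the LLM and return its answer.
    return query_llm(prompt)


# gr.ChatInterface wires the reply function to a chat UI.
demo = gr.ChatInterface(reply)
demo.launch()
```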
The LLM we are exploiting is Phi-3-mini-128K, queried via the HF Spaces API offered by this Space.
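One way to query a Space programmatically is the gradio_client package; the Space id and the api_name below are placeholders, since the exact endpoint depends on how the Space is defined:

```python
from gradio_client import Client

# Replace with the id of the Space linked above.
llm_client = Client("OWNER/SPACE_NAME")


def query_llm(prompt: str) -> str:
    # The api_name depends on how the Space exposes its endpoints.
    return llm_client.predict(prompt, api_name="/chat")
```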
Everything is then smoothly rendered with a custom front-end theme, which you can find here.
Chat with my space here: