This codebase demonstrates a basic application of Retrieval-Augmented Generation (RAG) for question-answering tasks, built with LangChain. It uses open-source Large Language Models (LLMs) from Hugging Face to chat with PDFs (chat_with_pdf), and the OpenAI API for question answering over websites (chat_with_website) and YouTube videos (chat_with_video).
Retrieval-Augmented Generation Workflow
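The core of the workflow is: split source documents into chunks, embed them, retrieve the chunks most similar to the question, and feed them to an LLM as context. A minimal, dependency-free sketch of that loop is below; the toy bag-of-words "embedding" and the helper names are illustrative stand-ins, not this repo's actual code (the real pipeline uses models such as all-MiniLM-L6-v2 and a FAISS index).

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; the real pipeline uses a trained
    # embedding model (e.g. all-MiniLM-L6-v2) instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=1):
    # Rank stored chunks by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    # Augment the prompt with retrieved context before calling the LLM.
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "FAISS stores dense vectors for fast similarity search.",
    "Streamlit turns Python scripts into web apps.",
]
top = retrieve("How are vectors searched quickly?", chunks)
print(build_prompt("How are vectors searched quickly?", top))
```

The prompt built in the last step is what gets sent to the LLM, which answers grounded in the retrieved chunks rather than from memory alone.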
git clone https://github.com/dinhquy-nguyen-1704/chat-with-pdf-website.git
cd chat-with-pdf-website
conda create --name chat-with-pdf-website python=3.10
conda activate chat-with-pdf-website
pip install -r requirements.txt
First, change to the chat_with_pdf directory and create a new folder named models:
cd chat_with_pdf
mkdir models
Then, download the models (an LLM and an embedding model) you want to use. In this source code, the default LLM is vinallama-2.7b-chat_q5_0.gguf and the default embedding model is all-MiniLM-L6-v2-f16.gguf. Organize the folder structure as follows:
- 📁 chat-with-pdf-website
  - 📁 chat_with_pdf
    - 📂 data
      - 📄 your_file.pdf
    - 📁 models
      - all-MiniLM-L6-v2-f16.gguf
      - vinallama-2.7b-chat_q5_0.gguf
    - 📁 vectorstores
    - 🐍 config.py
    - 🐍 create_vector_db.py
    - 🐍 qa_bot.py
    - 🐍 utils.py
  - 📁 chat_with_website
    - 🐍 utils.py
    - 🐍 app.py
  - 📄 README.md
  - 📄 requirements.txt
Delete the two files index.faiss and index.pkl in vectorstores if you want to use your own your_file.pdf.
After that, run create_vector_db.py:
python create_vector_db.py
When the above command completes, two files named index.faiss and index.pkl will appear in the vectorstores folder.
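Conceptually, the indexing script splits the PDF text into overlapping chunks, embeds them, and persists the result. The sketch below shows only the chunking-and-persistence half with stdlib code; the function names, chunk sizes, and use of a plain pickle are assumptions for illustration (the real index.faiss holds the embedding vectors, which are not reproduced here).

```python
import os
import pickle
import tempfile

def split_into_chunks(text, chunk_size=40, overlap=10):
    # Fixed-size character chunks with overlap; the actual script likely
    # uses a LangChain text splitter, so treat this as a stand-in.
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

sample = "RAG splits a document into chunks, embeds each chunk, and stores the vectors."
chunks = split_into_chunks(sample)

# index.pkl stores the chunk texts; index.faiss (not reproduced here)
# stores the corresponding embedding vectors.
out_dir = tempfile.mkdtemp()
with open(os.path.join(out_dir, "index.pkl"), "wb") as f:
    pickle.dump(chunks, f)
```

Overlapping chunks help a retrieved passage carry enough surrounding context to answer a question on its own.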
Now you can use the chatbot from the command line to ask questions about the information in your_file.pdf:
python qa_bot.py --question "your_question"
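The --question flag above is standard argparse usage. A minimal sketch of how such a CLI entry point can be parsed is below; the answering logic itself (loading the FAISS index and the GGUF model) is omitted, and parse_args is a hypothetical helper, not necessarily how qa_bot.py is written.

```python
import argparse

def parse_args(argv=None):
    # Mirrors the --question flag shown above.
    parser = argparse.ArgumentParser(
        description="Ask a question about the indexed PDF")
    parser.add_argument("--question", required=True,
                        help="question to ask the chatbot")
    return parser.parse_args(argv)

args = parse_args(["--question", "What is this PDF about?"])
print(args.question)
```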
First, change to the chat_with_website directory:
cd chat_with_website
Next, set your OpenAI API key in the app.py file:
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
api_key = os.getenv("OPENAI_API_KEY")
Now, you can run the app.py file and a Streamlit chatbot interface will appear.
python -m streamlit run app.py
You can paste a link to any website and ask for information related to that website.
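Before a website's content can be indexed, its visible text has to be extracted from the HTML. The app presumably uses a LangChain document loader for this; the stdlib sketch below shows the underlying idea with html.parser, skipping script and style tags. The class name and sample HTML are illustrative.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collects visible text, skipping <script> and <style> contents.
    def __init__(self):
        super().__init__()
        self.parts, self.skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.parts.append(data.strip())

html = "<html><body><h1>Docs</h1><script>var x=1;</script><p>RAG demo.</p></body></html>"
parser = TextExtractor()
parser.feed(html)
print(" ".join(parser.parts))
```

The resulting plain text is then chunked and embedded exactly like the PDF text in the previous section.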
Streamlit GUI
First, change to the chat_with_video directory:
cd chat_with_video
Next, change the OpenAI API key on the first line of gradio.py:
API_KEY = "sk-..."
Then, change the YouTube URL passed to YoutubeLoader:
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=tcqEUSNCn8I", add_video_info=True)
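YoutubeLoader fetches the video's transcript, which is then treated like any other text source. The sketch below shows that flattening step with hypothetical caption entries in the shape a transcript API typically returns; the field names and helper are assumptions, not the loader's internals.

```python
# Hypothetical caption entries (start time in seconds, caption text).
captions = [
    {"start": 0.0, "text": "Welcome to the video."},
    {"start": 4.2, "text": "Today we cover Retrieval-Augmented Generation."},
]

def transcript_to_document(entries):
    # Flatten timed caption snippets into one plain-text document,
    # which is then chunked and embedded like any other source.
    return " ".join(entry["text"] for entry in entries)

doc = transcript_to_document(captions)
print(doc)
```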
Finally, run gradio.py and a link to the Gradio interface will appear:
python gradio.py
Gradio
If you have any questions or feedback, please open an issue in this repository
or send an email to nguyendinhquythptcla@gmail.com.