A chat AI that answers with reference documents, using prompt engineering over a vector database. It also suggests related web pages through integration with my previous product, Texonom.

It pursues local, private, and personal AI that needs no external API calls, made practical by optimizing inference performance with GPTQ model quantization. The project was inspired by LangChain projects such as notion-qa and localGPT.
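To illustrate the core idea, here is a minimal sketch of retrieval-augmented prompting over a persisted Chroma store. It assumes the instructor-xl embeddings and `db` directory used below; the question and prompt wording are illustrative, not the project's exact code.

```python
# Minimal sketch: answer a question using documents retrieved from Chroma.
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
db = Chroma(persist_directory="db", embedding_function=embeddings)

question = "What is Texonom?"  # illustrative question
# Retrieve the most relevant chunks from the vector database.
docs = db.similarity_search(question, k=4)

# Prompt engineering: stuff the retrieved context into the prompt so a
# local LLM can answer with references.
context = "\n\n".join(doc.page_content for doc in docs)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```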
Demo videos: `cli.mp4` (terminal chat) and `chat.mp4` (web app).
This project uses rye as its package manager. It is currently only available with CUDA.
rye sync
or install with pip:
CUDA_VERSION=cu118
TORCH_VERSION=2.0.1
pip install torch==$TORCH_VERSION --index-url https://download.pytorch.org/whl/$CUDA_VERSION --force-reinstall
pip install .
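After installing, it is worth checking that the CUDA build of torch actually landed; this quick sanity check is a suggestion, not part of the project:

```python
import torch

# The project currently requires CUDA, so both checks should pass.
print(torch.__version__)          # expect a +cu118 build string
print(torch.cuda.is_available())  # expect True
```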
Run the web app:

streamlit run chat.py

or chat from the terminal:

python main.py chat
Currently the code is mainly focused on data exported from Notion as CSV.
# Put document files into the ./knowledge folder
python main.py process
# Or use provided Texonom DB
git clone https://huggingface.co/datasets/texonom/md-chroma-instructor-xl db
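Conceptually, the processing step loads documents, splits them into chunks, embeds them with instructor-xl, and persists them to Chroma. A rough sketch under those assumptions (the loader choice, chunk sizes, and paths are illustrative):

```python
# Sketch of document ingestion into Chroma; the loader choice, chunk
# size, and paths are illustrative assumptions, not the exact code.
from langchain.document_loaders import CSVLoader, DirectoryLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Load Notion CSV exports from the knowledge folder.
loader = DirectoryLoader("knowledge", glob="**/*.csv", loader_cls=CSVLoader)
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(loader.load())

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
# Persist embedded chunks so chat sessions can query them later.
Chroma.from_documents(chunks, embeddings, persist_directory="db")
```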
The default model is Orca 3B for now.
python main.py quantize --source_model facebook/opt-125m --output opt-125m-4bit-gptq --push
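Under the hood, GPTQ quantization with AutoGPTQ looks roughly like the following; the calibration text and the 4-bit/group-size settings are assumed values for illustration (`--push` would additionally upload the result to the Hugging Face Hub):

```python
# Rough sketch of 4-bit GPTQ quantization with AutoGPTQ; the calibration
# example and quantize config values are illustrative assumptions.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

source = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(source)

config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(source, config)

# GPTQ needs calibration samples to measure and minimize quantization error.
examples = [tokenizer("AutoGPTQ quantizes weights to 4 bits.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```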
Planned:
- MPS support via dynamic model selection
- Stateful web app support, similar to chat-langchain
Built with:
- LangChain for prompt engineering
- ChromaDB for storing embeddings
- Transformers as the LLM engine
- AutoGPTQ for quantization & inference (see the sketch below)
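As a companion to the AutoGPTQ item above, loading a quantized model for inference looks roughly like this; the local path and generation settings are assumptions:

```python
# Rough sketch of inference with a GPTQ-quantized model; the paths and
# generation parameters are illustrative assumptions.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Tokenizer comes from the source model; only the weights were quantized.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoGPTQForCausalLM.from_quantized("opt-125m-4bit-gptq", device="cuda:0")

inputs = tokenizer("Hello, local AI!", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```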