CacheGen: Fast Context Loading for Language Model Applications via KV Cache Streaming

For the latest updates and integrations, please check out the LMCache project!

LMCache: The modules for KV cache encoding and decoding with CacheGen's customized codec.
test_data: Example test cases for CacheGen.
src: Helper functions used by CacheGen (e.g., converting a tensor to a tuple and a tuple back to a tensor).
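
As a rough illustration of the tensor/tuple helpers mentioned for src, the sketch below packs a KV cache stored as a per-layer tuple of (key, value) tensors into a single tensor and unpacks it again. The function names kv_tuple_to_tensor and kv_tensor_to_tuple and the assumed cache layout are hypothetical and are not taken from the CacheGen source; treat this as a minimal sketch rather than the repository's implementation.

```python
import torch


def kv_tuple_to_tensor(kv):
    """Stack a tuple of per-layer (key, value) pairs into one tensor.

    Hypothetical helper: assumes every key and value tensor has the same shape.
    Result shape: (num_layers, 2, *per_tensor_shape).
    """
    return torch.stack([torch.stack((k, v), dim=0) for k, v in kv], dim=0)


def kv_tensor_to_tuple(packed):
    """Inverse of kv_tuple_to_tensor: split back into per-layer (key, value) pairs."""
    return tuple((layer[0], layer[1]) for layer in packed)


# Toy KV cache: 2 layers, batch 1, 4 heads, 3 tokens, head dimension 8 (assumed sizes).
num_layers, heads, tokens, dim = 2, 4, 3, 8
kv = tuple(
    (torch.randn(1, heads, tokens, dim), torch.randn(1, heads, tokens, dim))
    for _ in range(num_layers)
)

packed = kv_tuple_to_tensor(kv)
restored = kv_tensor_to_tuple(packed)
```

A single packed tensor is convenient to hand to a codec or to send over the network, while the tuple form is what many model implementations expect as their cached keys and values.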

Installation

To install the Python packages required to run CacheGen with conda:

conda env create -f env.yaml
conda activate cachegen
pip install -e LMCache
cd LMCache/third_party/torchac_cuda 
python setup.py install

Please refer to sigcomm_ae.md for examples of running CacheGen.
