For the latest update and integration, please check out the LMCache project!
This is the code repo for CacheGen: Fast Context Loading for Language Model Applications via KV Cache Streaming (SIGCOMM'24). The code structure is organized as follows:
LMCache
: The modules for KV cache encoding / decoding with CacheGen's customized codectest_data
: The example testing cases for CacheGen.src
: Some helper functions used by CacheGen (e.g., transforming tensor to tuple, transforming tuple to tensor etc.)
To install the required python packages to run CacheGen with conda
conda env create -f env.yaml
conda activate cachegen
pip install -e LMCache
cd LMCache/third_party/torchac_cuda
python setup.py install
Please refer to the page sigcomm_ae.md for running examples for CacheGen.
Yuhan Liu (yuhanl@uchicago.edu)