- [Paper Release] April 2023: Inference with Reference: Lossless Acceleration of Large Language Models
- Outputs of LLMs often overlap significantly with reference text (e.g., retrieved documents).
- Lossless acceleration of LLM inference by copying from references.
- Applicable to important LLM scenarios such as retrieval-augmented generation and multi-turn conversations.
- 2–3x speed-up; no additional model required!
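The copy-and-verify idea above can be sketched as a toy (a minimal illustration, not the paper's code; names like `find_copy_span` and `next_token` are assumptions for the sketch). When the recent output matches a span in the reference, candidate tokens are copied from it and each is checked against the model, so the output is identical to plain greedy decoding:

```python
def find_copy_span(output, reference, match_len=2, copy_len=4):
    """Find a span in `reference` preceded by the last `match_len`
    output tokens; return up to `copy_len` candidate tokens to copy."""
    if len(output) < match_len:
        return []
    tail = output[-match_len:]
    for i in range(len(reference) - match_len + 1):
        if reference[i:i + match_len] == tail:
            return reference[i + match_len:i + match_len + copy_len]
    return []

def generate(next_token, prompt, reference, max_tokens=20):
    """Greedy decoding with reference copying. `next_token` is any
    function prefix -> token standing in for the LLM. Lossless: the
    model's token is always the one emitted, so the output matches
    plain greedy decoding exactly."""
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        candidates = find_copy_span(out, reference)
        if not candidates:
            out.append(next_token(out))  # normal step-by-step decoding
            continue
        for tok in candidates:
            if len(out) - len(prompt) >= max_tokens:
                break
            model_tok = next_token(out)
            out.append(model_tok)
            if model_tok != tok:
                break  # copied span diverged; stop accepting from it
    return out[len(prompt):]
```

In a real LLM the candidate tokens are verified in a single batched forward pass rather than one call each, which is where the speed-up comes from; this toy only demonstrates the losslessness of the accept/reject rule.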