LLMA: Large Language Model Accelerator

Overview

  • Outputs of LLMs often overlap significantly with available references (e.g., retrieved documents).
  • LLMA losslessly accelerates LLM inference by copying text spans from these references and verifying them with the LLM.
  • It applies to important LLM scenarios such as retrieval-augmented generation and multi-turn conversations.
  • It delivers a 2–3× speed-up with no additional model required!
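The copy-and-verify idea behind the bullets above can be sketched in a few lines. The snippet below is a toy illustration, not the repository's implementation: `greedy_next` is a hypothetical stand-in for an LLM's greedy next-token prediction, and in a real system the whole copied block would be verified in a single batched forward pass, which is where the speed-up comes from.

```python
def greedy_next(prefix):
    """Toy stand-in for an LLM: it 'knows' one target sentence
    and always greedily continues it."""
    TARGET = "the cat sat on the mat and purred".split()
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else None

def find_match(output, reference, n=2):
    """Return the reference position right after a span matching
    the last n generated tokens, or None if there is no match."""
    if len(output) < n:
        return None
    tail = output[-n:]
    for i in range(len(reference) - n + 1):
        if reference[i:i + n] == tail:
            return i + n
    return None

def decode_with_reference(reference, copy_len=4, max_tokens=16):
    """Generate tokens, copying blocks from the reference when the
    recent output matches it; each block costs one model 'step'."""
    output, steps = [], 0
    while len(output) < max_tokens:
        pos = find_match(output, reference)
        if pos is not None:
            candidates = reference[pos:pos + copy_len]
            if candidates:
                accepted = 0
                for tok in candidates:
                    # Keep copied tokens only while the model agrees.
                    if greedy_next(output) == tok:
                        output.append(tok)
                        accepted += 1
                    else:
                        break
                steps += 1  # one call verifies the whole block
                if accepted > 0:
                    continue
        # Fall back to ordinary one-token-at-a-time decoding.
        nxt = greedy_next(output)
        if nxt is None:
            break
        output.append(nxt)
        steps += 1
    return output, steps

ref = "the cat sat on the mat".split()
out, steps = decode_with_reference(ref)
print(" ".join(out), "| decoding steps:", steps)
# prints: the cat sat on the mat and purred | decoding steps: 5
```

Here 8 tokens are produced in 5 decoding steps because 4 consecutive tokens are copied from the reference and verified together; plain greedy decoding would need 8 steps.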
