1.15.0
New features
downloadModel()
Downloads a model to the cache without loading it. This lets an application offer a "model manager" screen that can:
- Download a model via `downloadModel()`
- List all downloaded models using `CacheManager.list()`
- Delete a downloaded model using `CacheManager.delete()`
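
The three operations above could be wired into a small manager object along the following lines. This is only a sketch: `createModelManager` and `createFakeBackend` are hypothetical names, the argument and return shapes of `downloadModel()`, `list()` and `delete()` are assumptions rather than the documented wllama signatures, and the in-memory backend exists only so the example runs without wllama.

```javascript
// Sketch of a "model manager" backend. `downloader` and `cache` are injected;
// in a real app they would presumably be a wllama instance and CacheManager
// (assumption: the real signatures may differ from the ones used here).
function createModelManager(downloader, cache) {
  return {
    // Fetch the model into the cache without loading it.
    async add(url) {
      await downloader.downloadModel(url);
    },
    // Return the cached entries so a UI can display them.
    async list() {
      return cache.list();
    },
    // Remove one cached model, identified by `key`.
    async remove(key) {
      await cache.delete(key);
    },
  };
}

// Hypothetical in-memory stand-ins so the sketch is runnable on its own.
function createFakeBackend() {
  const store = new Map();
  return {
    downloader: {
      async downloadModel(url) {
        store.set(url, { name: url });
      },
    },
    cache: {
      async list() {
        return [...store.values()];
      },
      async delete(key) {
        store.delete(key);
      },
    },
  };
}

// Usage: download one model, then print how many are cached.
const backend = createFakeBackend();
const manager = createModelManager(backend.downloader, backend.cache);
manager
  .add('https://example.com/model.gguf') // placeholder URL
  .then(() => manager.list())
  .then((models) => console.log('cached models:', models.length));
```

Injecting the two dependencies keeps the screen logic testable and independent of how the cache is actually stored.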
KV cache reuse in createCompletion
When calling `createCompletion`, you can pass `useCache: true` as an option. It will reuse the KV cache from the last `createCompletion` call. It is equivalent to the `cache_prompt` option on the llama.cpp server.
```javascript
wllama.createCompletion(input, {
  useCache: true,
  // ...
});
```
For example:
- On the first call, you have 2 messages: `user: hello`, `assistant: hi`
- On the second call, you add one message: `user: hello`, `assistant: hi`, `user: who are you?`

Then, only the added message `user: who are you?` needs to be evaluated.
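
The idea behind this reuse can be illustrated with a longest-common-prefix comparison over token sequences. The sketch below is conceptual and is not wllama's actual implementation: `planCacheReuse` is a hypothetical helper, and the token IDs are made up to stand in for the two calls in the example.

```javascript
// Conceptual sketch of prefix-based KV cache reuse (not wllama's real code).
// `cachedTokens`: tokens left in the KV cache by the previous call.
// `newTokens`: the tokenized prompt of the current call.
function planCacheReuse(cachedTokens, newTokens) {
  // Count how many leading tokens are identical in both sequences.
  let shared = 0;
  const limit = Math.min(cachedTokens.length, newTokens.length);
  while (shared < limit && cachedTokens[shared] === newTokens[shared]) {
    shared++;
  }
  return {
    reusedCount: shared, // KV entries that can stay as they are
    discardedCount: cachedTokens.length - shared, // stale entries to drop
    toEvaluate: newTokens.slice(shared), // only these tokens need evaluation
  };
}

// Made-up token IDs standing in for the two chat turns.
const firstCall = [1, 15, 27, 2, 31, 44]; // user: hello, assistant: hi
const secondCall = [...firstCall, 1, 52, 63, 70]; // plus user: who are you?

const plan = planCacheReuse(firstCall, secondCall);
console.log(plan.reusedCount, plan.toEvaluate);
```

If the new prompt diverges from the cached one partway through (for example, after an earlier message is edited), only the shared prefix is reused and everything after the first differing token is evaluated again.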
What's Changed
- Add `downloadModel` function by @ngxson in #95
- fix log print and `downloadModel` by @ngxson in #100
- Add `main` example (chat UI) by @ngxson in #99
- Improve main UI example by @ngxson in #102
- implement KV cache reuse by @ngxson in #103
Full Changelog: 1.14.2...1.15.0