Release 1.15.0 · ngxson/wllama

New features

downloadModel()

Downloads a model to the cache without loading it. This lets an application offer a "model manager" screen where users can:

Download a model via downloadModel()
List all downloaded models using CacheManager.list()
Delete a downloaded model using CacheManager.delete()
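The "model manager" flow above can be illustrated with a small sketch. It uses a hypothetical in-memory class, HypotheticalModelCache, whose methods only stand in for downloadModel(), CacheManager.list() and CacheManager.delete(). The real wllama signatures and return types are not reproduced, and this stand-in is kept synchronous for brevity.

```typescript
// Hypothetical in-memory stand-in for wllama's model cache, used only to
// illustrate the "model manager" flow. The real operations are
// downloadModel(), CacheManager.list() and CacheManager.delete(); their
// exact signatures and return types are NOT reproduced here.

interface CachedModel {
  url: string;       // where the model was downloaded from
  sizeBytes: number; // size of the stored file
}

class HypotheticalModelCache {
  // Map keeps insertion order, so list() returns models in download order.
  private readonly entries = new Map<string, CachedModel>();

  /** Stand-in for downloadModel(): store the file, but do not load it. */
  download(url: string, sizeBytes: number): void {
    this.entries.set(url, { url, sizeBytes });
  }

  /** Stand-in for CacheManager.list(): everything currently downloaded. */
  list(): CachedModel[] {
    return Array.from(this.entries.values());
  }

  /** Stand-in for CacheManager.delete(): returns false if nothing was removed. */
  delete(url: string): boolean {
    return this.entries.delete(url);
  }
}

// A "model manager" screen would render these lines and offer
// download/delete actions next to each entry.
function describeCache(cache: HypotheticalModelCache): string[] {
  return cache
    .list()
    .map((m) => `${m.url} (${(m.sizeBytes / 1e6).toFixed(1)} MB)`);
}
```

The point of the design is that downloading and loading are separate steps: the manager screen only touches the cache, and a model is loaded later, when the user actually picks it.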

KV cache reuse in createCompletion

When calling createCompletion, you can pass useCache: true as an option to reuse the KV cache from the previous createCompletion call. This is equivalent to the cache_prompt option of the llama.cpp server.

wllama.createCompletion(input, {
  useCache: true,
  // ...other completion options
});

For example:

On the first call, the input contains 2 messages: user: hello, assistant: hi
On the second call, you append one message, so the input becomes: user: hello, assistant: hi, user: who are you?

Only the newly added message, user: who are you?, needs to be evaluated, because the first two messages are already in the KV cache.
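The example above boils down to finding the longest common prefix between the tokens already in the KV cache and the new prompt. The sketch below shows that idea on plain number arrays. It is not wllama's implementation: the function names, the token ids and the "evaluate at least one token" rule are assumptions of this sketch.

```typescript
// Minimal sketch of prompt-prefix reuse (the idea behind useCache / cache_prompt).
// Assumption: tokens are plain numbers; real tokenization is done by the model.

/** Length of the longest common prefix of two token sequences. */
function commonPrefixLength(cached: readonly number[], input: readonly number[]): number {
  const limit = Math.min(cached.length, input.length);
  let i = 0;
  while (i < limit && cached[i] === input[i]) {
    i++;
  }
  return i;
}

/**
 * Decide how many cached tokens can be kept and which prompt tokens still
 * need to be evaluated. Cache entries after `reused` would be discarded.
 */
function planEvaluation(
  cached: readonly number[],
  input: readonly number[],
): { reused: number; toEvaluate: number[] } {
  let reused = commonPrefixLength(cached, input);
  // Design choice of this sketch: if the whole prompt is already cached,
  // re-evaluate its last token so fresh logits exist for sampling.
  if (reused === input.length && reused > 0) {
    reused -= 1;
  }
  return { reused, toEvaluate: input.slice(reused) };
}
```

With hypothetical token ids where the first call's prompt is [10, 11, 20, 21] and the second call appends [10, 30, 31, 32], planEvaluation keeps 4 cached tokens and only evaluates the 4 appended ones.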

What's Changed

Add downloadModel function by @ngxson in #95
fix log print and downloadModel by @ngxson in #100
Add main example (chat UI) by @ngxson in #99
Improve main UI example by @ngxson in #102
implement KV cache reuse by @ngxson in #103

Full Changelog: 1.14.2...1.15.0
