
Releases: ngxson/wllama

1.16.2

23 Sep 16:15
d9b849e

What's Changed

  • decode/encode : do not fail on empty batch by @ngxson in #118
  • Update to latest llama.cpp source code by @ngxson in #119

Full Changelog: 1.16.1...1.16.2

1.16.1

06 Sep 14:29
7beefeb

What's Changed

Full Changelog: 1.16.0...1.16.1

1.16.0

19 Aug 10:04

SmolLM-360m has been added as a model in the main example. Try it now --> https://huggingface.co/spaces/ngxson/wllama

Special thanks to the @huggingface team for providing such a powerful model in such a small size!


What's Changed

  • ability to use custom cacheManager by @ngxson in #109

Full Changelog: 1.15.0...1.16.0

1.15.0

03 Aug 20:34
667dd91

New features

downloadModel()

Download a model into the cache without loading it. A typical use case is a "model manager" screen in the application (sketched below) that lets the user:

  • Download a model via downloadModel()
  • List all downloaded models using CacheManager.list()
  • Delete a downloaded model using CacheManager.delete()
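
A minimal sketch of that flow, assuming the asset paths, that the cache is reachable via wllama.cacheManager, and the shape of the entries returned by CacheManager.list(); the model URL is hypothetical:

import { Wllama } from '@wllama/wllama';

// Paths to wllama's WASM assets; adjust keys/paths to your setup and wllama version.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};
const wllama = new Wllama(CONFIG_PATHS);

const MODEL_URL = 'https://example.com/models/my-model.Q4_K_M.gguf'; // hypothetical

// 1. Download the model into the cache without loading it into memory.
await wllama.downloadModel(MODEL_URL);

// 2. List all downloaded models (entry fields such as "name" and "size" are assumed).
const entries = await wllama.cacheManager.list();
for (const entry of entries) {
  console.log(entry.name, entry.size);
}

// 3. Delete a downloaded model (parameter assumed to be the cache entry's name).
await wllama.cacheManager.delete(entries[0].name);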

KV cache reuse in createCompletion

When calling createCompletion, you can pass useCache: true as an option. It reuses the KV cache from the last createCompletion call, equivalent to the cache_prompt option on the llama.cpp server.

wllama.createCompletion(input, {
  useCache: true,
  ...
});

For example:

  • On the first call, you have 2 messages: user: hello, assistant: hi
  • On the second call, you add one message: user: hello, assistant: hi, user: who are you?

Then, only the added message user: who are you? will need to be evaluated.
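
A sketch of those two calls (nPredict and the plain-text chat formatting are illustrative only):

// First call: the whole prompt is evaluated and its KV cache is kept.
const history = 'user: hello\nassistant: hi\n';
await wllama.createCompletion(history, {
  nPredict: 64,
  useCache: true,
});

// Second call: the prompt shares the prefix above, so only the newly
// appended "user: who are you?" part needs to be evaluated.
await wllama.createCompletion(history + 'user: who are you?\n', {
  nPredict: 64,
  useCache: true,
});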

What's Changed

Full Changelog: 1.14.2...1.15.0

1.14.2

28 Jul 11:39
d15748b

Update to latest upstream llama.cpp source code:

  • Fix support for llama-3.1, phi 3 and SmolLM

Full Changelog: 1.14.0...1.14.2

1.14.0

10 Jul 11:51
94ebb81

What's Changed

  • save ETag metadata, add allowOffline option in #90
  • Added experimental support for encoder-decoder architecture #91

Full Changelog: 1.13.0...1.14.0

1.13.0

03 Jul 15:13
44a4de5

What's Changed

New Contributors

Full Changelog: 1.12.1...1.13.0

1.12.1

27 Jun 20:49
b847495

What's Changed

  • Sync with latest upstream source code + adapt to project structure change by @ngxson in #77

Full Changelog: 1.12.0...1.12.1

1.12.0

24 Jun 15:29
896c160

Important

In prior versions, if you initialized wllama with embeddings: true, you were still able to generate completions.

From v1.12.0, if you start wllama with embeddings: true, it will throw an error when you try to use createCompletion. You must call wllama.setOptions({ embeddings: false }) to turn off embeddings first.

More details: this behavior was introduced in ggerganov/llama.cpp#7477, which allows models like GritLM to be used for both embeddings and text generation.
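
A short sketch of the new behavior (the embeddings flag on loadModelFromUrl and the createEmbedding call are assumptions about the surrounding API; only setOptions({ embeddings: false }) and the thrown error are taken from this note):

const MODEL_URL = 'https://example.com/models/gritlm.Q4_K_M.gguf'; // hypothetical

// Load the model with embeddings enabled: embedding calls work,
// but createCompletion will now throw.
await wllama.loadModelFromUrl(MODEL_URL, { embeddings: true });
const vector = await wllama.createEmbedding('hello world');

// Switch embeddings off before generating text.
await wllama.setOptions({ embeddings: false });
const answer = await wllama.createCompletion('who are you?', { nPredict: 32 });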

What's Changed

Full Changelog: 1.11.0...1.12.0

1.11.0

11 Jun 18:47
a5e919b

What's Changed

  • Internally generate the model URL array when the URL provided to the loadModelFromUrl method is a single shard of a model split with the gguf-split tool (see the sketch after this list) by @felladrin in #61
  • Allow loading a model using relative path by @felladrin in #64
  • Git ignore also .DS_Store which are created by MacOS Finder by @felladrin in #65
  • v1.11.0 by @ngxson in #68
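
A sketch of the split-model loading from #61 (the shard URL is hypothetical and follows the gguf-split naming convention):

// Pass the URL of a single shard produced by gguf-split;
// wllama internally generates the URLs of the remaining shards.
await wllama.loadModelFromUrl(
  'https://example.com/models/my-model-00001-of-00003.gguf'
);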

Full Changelog: 1.10.0...1.11.0