Skip to content

Releases: YellowRoseCx/koboldcpp-rocm

KoboldCPP-v1.71.yr0-ROCm

26 Jul 08:31
Compare
Choose a tag to compare
Update koboldcpp.py

KoboldCPP-v1.70.yr0-ROCm

15 Jul 22:13
24bc828
Compare
Choose a tag to compare

koboldcpp-1.70

mom: we have ChatGPT at home edition

image

  • Updated Kobold Lite:
    • Introducting Corpo Mode: A new beginner friendly UI theme that aims to emulate the ChatGPT look and feel closely, providing a clean, simple and minimalistic interface. It has a limited feature set compared to other UI themes, but should feel very familiar and intuitive for new users. Now available for instruct mode!
    • Settings Menu Rework: The settings menu has also been completely overhauled into 4 distinct panels, and should feel a lot less cramped now, especially on desktop.
    • Sampler Presets and Instruct Presets have been updated and modernized.
    • Added support for importing character cards from aicharactercards.com
    • Added copy for code blocks
    • Added support for dedicated System Tag and System Prompt (you are still encouraged to use the Memory feature instead)
    • Improved accessibility, keyboard tab navigation and screen reader support
  • NEW: DRY dynamic N-gram anti-repetition sampler support has been added (credits @pi6am)
  • Added --unpack, a new self-extraction feature that allows KoboldCpp binary releases to be unpacked into an empty directory. This allows easy modification and access to the files and contents embedded inside the PyInstaller. Can also be used in the GUI launcher.
  • Fix for a Vulkan regression in Q4_K_S mistral models when offloading to GPU (thanks @0cc4m).
  • Experimental support for OpenAI tools and function calling API (credits @teddybear082)
  • Added a workaround for Deepseek crashing due to unicode decoding issues.
  • --chatcompletionsadapter can now be selected on included pre-bundled templates by filename, e.g. Llama-3.json, pre-bundled templates have also been updated for correctness (thanks @xzuyn).
  • Default --contextsize is finally increased to 4096, default Chat Completions API output length is also increased.
  • Merged fixes and improvements from upstream, including multiple Gemma fixes.

To use on Windows, download and run the koboldcpp_rocm.exe OR download koboldcpp_rocm_files.zip and run python koboldcpp.py from Window Terminal or CMD (additional python pip modules might need installed, like customtkinter and tk or python-tk.

To use on Linux, clone the repo or download Source Code (tar.gz or zip) and build with make LLAMA_HIPBLAS=1 -j4 (-j4 can be adjusted to your number of CPU threads for faster build times)

Run it from the command line with the desired launch parameters (see --help), or use the GUI by launching with python koboldcpp.py ((additional python pip modules might need installed, like customtkinter and tk or python-tk).

Once loaded, you can visit the following URL or use it as the API URL for other front-ends like Silly Tavern: http://localhost:5001/

For more information, be sure to run the program from command line with the --help flag.

KoboldCPP-v1.69.1.yr0-ROCm

04 Jul 17:11
Compare
Choose a tag to compare
Merge remote-tracking branch 'upstream/concedo'

KoboldCPP-v1.69.yr0-ROCm

02 Jul 11:02
440b8b4
Compare
Choose a tag to compare

Nice

Gemma2 support

KoboldCPP-v1.68.yr0-ROCm

22 Jun 07:27
b9e1db8
Compare
Choose a tag to compare

KoboldCPP-v1.67.yr0-ROCm

05 Jun 18:05
4aa091e
Compare
Choose a tag to compare

Ok, so there are 4 different EXE builds here, the one named "koboldcpp_rocm.exe" has been built for RX6000 and RX7000 series GPUs
The other 3 have been built for the following GPU targets: "gfx803;gfx900;gfx906;gfx1010;gfx1011;gfx1012;gfx1030;gfx1031;gfx1032;gfx1100;gfx1101;gfx1102"

The 3 of them have been built in slightly different ways as I do not yet know which offers best performance yet, but after some testing, if everything works out okay and it improves koboldcpp-rocm, I'll move it back to 1 or 2 exe files.

koboldcpp_rocm.exe: has been built using the AMD ROCm 5.7.1 provided "Tensile Libraries"/GPU code.

koboldcpp_rocm4allV1.exe: has been built using ROCm-4-All-5.7.1 Tensile Libraries and then added to the AMD ROCm folder with the other provided GPU code before compiling.

koboldcpp_rocm4allV2.exe: has been built by using the AMD ROCm 5.7.1 provided "Tensile Libraries"/GPU code for compiling but then adding only the ROCm-4-All-5.7.1 Tensile Libraries while generating the .exe.

koboldcpp_rocm4allV3.exe: has been built by deleting the entire stock AMD ROCm 5.7.1 GPU code folder and replacing it with only ROCm-4-All-5.7.1 Tensile Library files before compiling.

My gut says koboldcpp_rocm4allV3.exe will probably perform best of the 3 versions. If you have a RX6000 or RX7000 series gpu, I would compare koboldcpp_rocm.exe and koboldcpp_rocm4allV3.exe, there might be a noticeable speed difference.

koboldcpp_rocm4allV1.exe and koboldcpp_rocm4allV2.exe may change generation and processing performance, but I would stick with the original and V3 files as the first ones to try.

Sorry for the whole mess of different .EXEs but hopefully it brings improvement to KoboldCpp-ROCm for Windows!

ROCm-4-All-5.7.1 Tensile Libraries were obtained from https://github.com/brknsoul/ROCmLibs


The full Changelog for this version can be read at https://github.com/LostRuins/koboldcpp/releases/tag/v1.67
The biggest changes being the integration of Whisper.cpp into KoboldCpp and Quantized KV Cache

KoboldCPP-v1.66.1.yr1-ROCm

25 May 22:19
Compare
Choose a tag to compare

Windows-ROCm users, this build should hopefully fix any errors you were receiving the past few updates

KoboldCPP-v1.66.1.yr0-ROCm

25 May 02:41
Compare
Choose a tag to compare

https://github.com/LostRuins/koboldcpp/releases/tag/v1.66

Made FlashAttention on by default in Windows because it supposedly prevents the "access violation reading" error. Not sure if there are performance drawbacks, if so you can turn it off in the Hardware tab of the GUI

Full Changelog: v1.65.yr0-ROCm...v1.66.1.yr0-ROCm

KoboldCPP-v1.65.yr0-ROCm

16 May 18:54
Compare
Choose a tag to compare

koboldcpp-1.65

  • NEW: Added a new standalone UI for Image Generation, thanks to @ayunami2000 for porting StableUI (original by @aqualxx) to KoboldCpp! Now you have a powerful dedicated A1111 compatible GUI for generating images locally, with a similar look and feel to Automatic1111. And it runs in your browser, launching straight from KoboldCpp, simply load a Stable Diffusion model and visit http://localhost:5001/sdui/
  • Added a new API field bypass_eos to skip EOS tokens while still allowing them to be generated.
  • Hopefully fixed tk window resizing issues
  • Increased interrogate mode token amount by 30%, and increased default chat completions token amount by 250%
  • Merged improvements and fixes from upstream
  • Updated Kobold Lite:
    • Added option to insert Instruct System Prompt
    • Added option to bypass (skip) EOS
    • Added toggle to return special tokens
    • Added Chat Names insertion for instruct mode
    • Added button to launch StableUI
    • Various minor fixes, support importing cards from CharacterHub urls.

Important Deprecation Notice:

The flags --smartcontext, --hordeconfig and --sdconfig are being deprecated.

--smartcontext is no longer as useful nowadays with context shifting, and just adds clutter and confusion. With it's removal, if contextshift is enabled, smartcontext will be used as a fallback if contextshift is unavailable, such as with old models. --noshift can still be used to turn both behaviors off.

--hordeconfig and --sdconfig are being replaced, as the number of configurations for these arguments grow, the order of these positional arguments confuses people, and makes it very difficult to add new flags and toggles as well, since a misplaced new parameter breaks existing parameters. Additionally, it also prevented me from properly validating each input for data type and range.

As this is a large change, these deprecated flags will remain functional for now. However, you are strongly advised to switch over to the new replacement flags below:

Replacement Flags:

--hordemodelname  Sets your AI Horde display model name.
--hordeworkername Sets your AI Horde worker name.
--hordekey        Sets your AI Horde API key.
--hordemaxctx     Sets the maximum context length your worker will accept.
--hordegenlen     Sets the maximum number of tokens your worker will generate.

--sdmodel     Specify a stable diffusion model to enable image generation.
--sdthreads   Use a different number of threads for image generation if specified. 
--sdquant     If specified, loads the model quantized to save memory.
--sdclamped   If specified, limit generation steps and resolution settings for shared use.

To use on Windows, download and run the koboldcpp_rocm.exe, which is a one-file pyinstaller OR download koboldcpp_rocm_files.zip and run python koboldcpp.py (additional python pip modules might need installed, like customtkinter and tk or python-tk.
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4 (-j4 can be adjusted to your number of CPU threads for faster build times)

For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: Install cblas openblas and clblast.
For Debian: Install libclblast-dev and libopenblas-dev.
then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4

If you're using NVIDIA, you can try koboldcpp.exe at LostRuin's upstream repo here
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller, also at LostRuin's repo.
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag.

KoboldCPP-v1.64.1.yr0-ROCm

08 May 19:44
Compare
Choose a tag to compare
Merge remote-tracking branch 'upstream/concedo'