Releases: YellowRoseCx/koboldcpp-rocm
KoboldCPP-v1.56.yr0-ROCm
Note: this fork's Windows build does not contain the Vulkan backend yet.
- NEW: Added early support for new Vulkan GPU backend by @0cc4m. You can try it out with the command --usevulkan (gpu id) or via the GUI launcher. Now included with the Windows and Linux prebuilt binaries.
- Updated and merged the new GGML backend rework from upstream. This update includes many extensive fixes, improvements and changes across over a hundred commits. Support for earlier non-gguf models has been preserved via a fossilized earlier version of the library. Please open an issue if you encounter problems. The Wiki and Readme have been updated too.
- Added support for setting dynatemp_exponent, which previously defaulted to 1.0. Support added over the API and in Lite (see the sketch after this list).
- Fixed issues with Linux CUDA on Pascal, added more flags to handle conda and colab builds correctly.
- Added support for old-CPU fallbacks (NoAVX2 and Failsafe modes) as build targets in the Linux prebuilt binary (and koboldcpp.sh)
- Added missing 48k context option, fixed clearing file selection, better abort handling support, fixed aarch64 termux builds, various other fixes.
- Updated Kobold Lite with many improvements and new features:
- NEW: Added XTTS API Server support (Local AI powered text-to-speech).
- Added option to let AI impersonate you for a turn in a chat.
- HD image generation options.
- Added popup-on-complete browser notification options.
- Improved DynaTemp wizard, added options to set the exponent.
- Bugfixes, padding adjustments, A1111 parameter fixes, image color fixes for invert color mode.
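As a quick reference for the dynatemp_exponent item above, here is a minimal sketch (Python standard library only, not code from this repo) of passing it over the Kobold generate API; the endpoint path and payload field names follow the parameters described in these notes and may differ between versions:
import json
import urllib.request

# DynaTemp midpoint, swing, and the newly exposed exponent (previously fixed at 1.0)
payload = {
    "prompt": "Once upon a time,",
    "max_length": 64,
    "temperature": 0.4,
    "dynatemp_range": 0.1,
    "dynatemp_exponent": 1.5,
}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["results"][0]["text"])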
To use on Windows, download and run the koboldcpp_rocm.exe, which is a one-file PyInstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py
(additional Python pip modules may need to be installed, such as customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4
(-j4 can be adjusted to your number of CPU threads for faster build times)
For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo here.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.
KoboldCPP-v1.55.yr0-ROCm
- Added Dynamic Temperature (DynaTemp), which is specified by a temperature value and a temperature range (credits: @kalomaze). When used, the actual temperature is allowed to vary dynamically between DynaTemp ± DynaTempRange. For example, setting temperature=0.4 and dynatemp_range=0.1 results in a minimum temp of 0.3 and a maximum of 0.5. For ease of use, Lite also provides a UI to select the min and max temperature for DynaTemp directly; both inputs work and automatically update each other. (A small sketch follows this list.)
- Try to reuse the cloudflared file when running a remote tunnel, but also handle cases where cloudflared fails to download correctly.
- Added a field showing the most recently used seed to the perf endpoint
- Switched CUDA pool malloc back to the old implementation
- Updated Lite, added support for DynaTemp
- Merged new improvements and fixes from upstream llama.cpp
- Various minor fixes.
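To illustrate the DynaTemp and perf-endpoint items above, a small sketch (standard library only, assuming a local instance on port 5001) that works out the effective temperature bounds from the example values and then queries the perf endpoint; the exact response keys are not spelled out in these notes, so the JSON is printed verbatim:
import json
import urllib.request

# DynaTemp: the actual temperature varies between temperature - range and temperature + range
temperature, dynatemp_range = 0.4, 0.1
print("min temp:", temperature - dynatemp_range)  # 0.3
print("max temp:", temperature + dynatemp_range)  # 0.5

# The perf endpoint now also reports the most recently used seed (among other stats)
with urllib.request.urlopen("http://localhost:5001/api/extra/perf") as resp:
    print(json.dumps(json.load(resp), indent=2))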
To use on Windows, download and run the koboldcpp_rocm.exe, which is a one-file PyInstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py
(additional Python pip modules may need to be installed, such as customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4
(-j4 can be adjusted to your number of CPU threads for faster build times)
For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo here.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.
KoboldCPP-v1.54.yr0-ROCm
koboldcpp-1.54-ROCm
Merge with @LostRuins' latest upstream update
welcome to 2024 edition
- Added logit_bias support (for both the OpenAI and Kobold APIs). Accepts a dictionary of key-value pairs indicating the token IDs (int) and the logit bias (float) to apply for each token. The object format is the same as, and compatible with, the official OpenAI implementation, though token IDs are model specific (thanks @DebuggingLife46). See the sketch after this list.
- Updated Lite, added support for custom background images (thanks @Ar57m), and added customizable settings for stepcount and cfgscale for Horde/A1111 image generation.
- Added mouseover tooltips for all labels in the GUI launcher.
- Cleaned up and simplified the UI of the quick launch tab in the GUI launcher, some advanced options moved to other tabs.
- Bug fixes for garbled output in Termux with q5k Phi
- Fixed paged memory fallback when pinned memory alloc fails while not using mmap.
- Attempt to fix on-exit segfault on some Linux systems.
- Updated KAI United class.py, added new parameters.
- Makefile fix for Linux CI build using conda (thanks @henk717)
- Merged new improvements and fixes from upstream llama.cpp (includes VMM pool support)
- Included prebuilt binary for no-cuda Linux as well.
- Various minor fixes.
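As a reference for the logit_bias item above, a minimal sketch (standard library only) of sending a bias dictionary to KoboldCpp's OpenAI-compatible completions endpoint; the token ID below is just a placeholder (IDs are model specific) and the endpoint path is an assumption:
import json
import urllib.request

payload = {
    "prompt": "My favourite word is",
    "max_tokens": 32,
    # token ID (string key) -> bias; a large negative bias strongly discourages that token
    "logit_bias": {"15043": -100.0},
}

req = urllib.request.Request(
    "http://localhost:5001/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["text"])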
To use on Windows, download and run the koboldcpp_rocm.exe, which is a one-file pyinstaller OR download koboldcpp_rocm_files.zip and run python koboldcpp.py
If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo here.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4
KoboldCPP-v1.53.yr0-ROCm
koboldcpp-1.53-ROCm
Merge with @LostRuins' latest upstream update
- Added support for SSL. You can now import your own SSL cert to use with KoboldCpp and serve it over HTTPS with --ssl [cert.pem] [key.pem] or via the GUI. The .pem files must be unencrypted. You can also generate your own self-signed certificate with OpenSSL, e.g. openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -config openssl.cnf -nodes (the location of openssl.cnf may differ between Linux distros; try searching for it with locate openssl.cnf).
- Added support for presence penalty (alternative rep pen) over the KAI API and in Lite. If Presence Penalty is set over the OpenAI API and rep_pen is not set, then rep_pen will default to 1.0 instead of 1.1. Both penalties can be used together, although this is probably not a good idea. (A small sketch covering SSL and presence penalty follows this list.)
- Added fixes for Broken Pipe error, thanks @mahou-shoujo.
- Added fixes for aborting ongoing connections while streaming in SillyTavern.
- Merged upstream support for Phi models and speedups for Mixtral
- The default non-blas batch size for GGUF models is now increased from 8 to 32.
- Merged HIPBlas fixes from @YellowRoseCx
- Fixed an issue with building convert tools in 1.52
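To illustrate the SSL and presence penalty items above, a small sketch (standard library only) that connects to a KoboldCpp instance served over HTTPS with a self-signed certificate and sets presence_penalty over the OpenAI-compatible endpoint; relaxing certificate verification is only appropriate for the self-signed case, and the endpoint path is an assumption:
import json
import ssl
import urllib.request

# Accept the self-signed cert.pem passed to --ssl (do not do this for CA-signed certs)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

payload = {
    "prompt": "The quick brown fox",
    "max_tokens": 32,
    "presence_penalty": 0.5,  # per the note, rep_pen then defaults to 1.0 if unset
}

req = urllib.request.Request(
    "https://localhost:5001/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, context=ctx) as resp:
    print(json.load(resp)["choices"][0]["text"])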
To use, download and run the koboldcpp_rocm.exe, which is a one-file pyinstaller.
If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo here.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001/
For more information, be sure to run the program from the command line with the --help flag.
KoboldCPP-v1.52.2.yr1-ROCm
- Added the --checkforupdates argument. If enabled, it will fetch the KoboldCpp-ROCm release page (via the GitHub API) one time on startup over HTTPS, compare the latest version number with the current version number, and notify the user if a new version is available. A GUI button for it is shown on the Network tab. Disabled by default. (See the sketch after this list.)
- hipBLAS autopicking and hipBLAS .kcpps bug fixes:
  - Fixed a mistake preventing hipBLAS from being autopicked on startup.
  - Fixed a bug where importing a .kcpps file with the backend "Use hipBLAS (ROCm)" would not select "Use hipBLAS (ROCm)".
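For illustration only (this is not the fork's actual implementation), a minimal sketch of the idea behind --checkforupdates: fetch the latest release tag over HTTPS via the GitHub API and compare it with the running version string.
import json
import urllib.request

CURRENT_VERSION = "v1.52.2.yr1-ROCm"  # placeholder for the running build's version string
url = "https://api.github.com/repos/YellowRoseCx/koboldcpp-rocm/releases/latest"

with urllib.request.urlopen(url) as resp:
    latest_tag = json.load(resp)["tag_name"]

if latest_tag != CURRENT_VERSION:
    print(f"A new version is available: {latest_tag} (currently running {CURRENT_VERSION})")
else:
    print("KoboldCpp-ROCm is up to date.")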
KoboldCPP-v1.52.2.yr0-ROCm
https://github.com/LostRuins/koboldcpp/releases/tag/v1.52.2
- NEW: Added a new bare-bones KoboldCpp NoScript WebUI, which does not require Javascript to work. It should be W3C HTML compliant and should run on every browser in the last 20 years, even text-based ones like Lynx (e.g. in the terminal over SSH). It is accessible by default at /noscript e.g. http://localhost:5001/noscript . This can be helpful when running KoboldCpp from systems which do not support a modern browser with Javascript.
- Partial per-layer KV offloading is now merged for CUDA. Important: this means that the number of layers you can offload to GPU might be reduced, as each layer now takes up more space. To avoid per-layer KV offloading, use the --usecublas lowvram option (equivalent to -nkvo in llama.cpp). Fully offloaded models should behave the same as before.
- The /api/extra/tokencount endpoint now also returns an array of token ids from the tokenizer in the response body (see the sketch after this list).
- Merged support for QWEN and Mixtral from upstream. Note: Mixtral seems to perform large batch prompt processing extremely slowly. This is probably an implementation issue. For now, you might have better luck using --noblas or setting --blasbatchsize -1 when using Mixtral
- Selecting a .kcpps in the GUI when choosing a model will load the model specified inside that config file instead.
- Added the Mamba Multitool script (from @henk717). This is a shell script that can be used in Linux to setup an environment with all dependencies required for building and running KoboldCpp on Linux.
- Improved KCPP Embedded Horde Worker fault tolerance, should now gracefully backoff for increasing durations whenever encountering errors polling from AI Horde, and will automatically recover from up to 24 hours of Horde downtime.
- Added a new parameter that shows number of Horde Worker errors in the /api/extra/perf endpoint, this can be used to monitor your embedded horde worker if it goes down.
- Pulled other fixes and improvements from upstream, updated Kobold Lite, added asynchronous file autosaves (thanks @aleksusklim), various other improvements.
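As a reference for the tokencount item above, a minimal sketch (standard library only) of posting a prompt to the endpoint; the request field name "prompt" is an assumption and the exact response keys are not documented in these notes, so the response JSON (which now includes the array of token ids) is printed verbatim:
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:5001/api/extra/tokencount",
    data=json.dumps({"prompt": "Hello world"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # The response now also includes an array of token ids alongside the count
    print(json.dumps(json.load(resp), indent=2))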
Hotfix 1.52.1: Fixed 'not enough memory' loading errors for large (20B+) models. See #563
NEW: Added Linux PyInstaller binaries
Hotfix 1.52.2: Merged fixes for Mixtral prompt processing.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001/
For more information, be sure to run the program from command line with the --help flag.
Windows KoboldCPP-ROCm v1.43 .exe
Windows Compiled KoboldCPP with ROCm support!
I want to thank @LostRuins for making KoboldCPP and for general guidance, @henk717 for all his dedication to KoboldAI that brought us here in the first place, and @SlyEcho, who originally started the ROCm port of llama.cpp.
You need ROCm to build it, but not to run it: https://rocm.docs.amd.com/en/latest/deploy/windows/quick_start.html
Compiled for the GPUs that have Tensile libraries / are marked as supported: gfx906, gfx1030, gfx1100, gfx1101, gfx1102
To run, open it, or start it via the command line.
Example:
./koboldcpp_rocm.exe --usecublas normal mmq --threads 1 --stream --contextsize 4096 --usemirostat 2 6 0.1 --gpulayers 45 C:\Users\YellowRose\llama-2-7b-chat.Q8_0.gguf
This site may be useful; it has some patches for Windows ROCm that help with compilation, which I used, but I'm not sure if they're necessary: https://streamhpc.com/blog/2023-08-01/how-to-get-full-cmake-support-for-amd-hip-sdk-on-windows-including-patches/
Build command used (ROCm Required):
cd koboldcpp-rocm
mkdir build && cd build
cmake .. -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER="C:/Program Files/AMD/ROCm/5.5/bin/clang.exe" -DCMAKE_CXX_COMPILER="C:/Program Files/AMD/ROCm/5.5/bin/clang++.exe" -DAMDGPU_TARGETS="gfx906;gfx1030;gfx1100;gfx1101;gfx1102"
cmake --build . -j 6
That puts koboldcpp_cublas.dll inside of .\koboldcpp-rocm\build\bin
copy koboldcpp_cublas.dll to the main koboldcpp-rocm folder
(You can run koboldcpp.py like this right away)
To make it into an exe, we use make_pyinst_rocm_hybrid_henk_yellow.bat
But that file is set up to add CLBlast and OpenBLAS too; you can either remove those lines so it's just this code:
cd /d "%~dp0"
copy "C:\Program Files\AMD\ROCm\5.5\bin\hipblas.dll" .\ /Y
copy "C:\Program Files\AMD\ROCm\5.5\bin\rocblas.dll" .\ /Y
xcopy /E /I "C:\Program Files\AMD\ROCm\5.5\bin\rocblas" .\rocblas\
PyInstaller --noconfirm --onefile --collect-all customtkinter --clean --console --icon ".\niko.ico" --add-data "./klite.embd;." --add-data "./koboldcpp_cublas.dll;." --add-data "./hipblas.dll;." --add-data "./rocblas.dll;." --add-data "./rwkv_vocab.embd;." --add-data "./rocblas;." --add-data "C:/Windows/System32/msvcp140.dll;." --add-data "C:/Windows/System32/vcruntime140_1.dll;." "./koboldcpp.py" -n "koboldcppRocm.exe"
or you can download w64devkit, cd into the folder, and run make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
then it will build the rest of the backend files.
Once they're all built, you should be able to run make_pyinst_rocm_hybrid_henk_yellow.bat as-is, and it will bundle the files together into koboldcppRocm.exe in the \koboldcpp-rocm\dists folder.
KoboldCPP-v1.52.1.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.52.yr0-ROCm
Various new features including new model Mixtral support
Mixtral-Kcpp-v1.52.RC1.yr1-ROCm FanService Ed.
Unofficial release candidate build containing experimental features and Mixtral Model support