Releases: YellowRoseCx/koboldcpp-rocm
KoboldCPP-v1.56.yr0-ROCm
Note: this fork's Windows build does not contain the Vulkan backend yet.
- NEW: Added early support for new Vulkan GPU backend by @0cc4m. You can try it out with the command --usevulkan (gpu id) or via the GUI launcher. Now included with the Windows and Linux prebuilt binaries.
- Updated and merged the new GGML backend rework from upstream. This update includes many extensive fixes, improvements and changes across over a hundred commits. Support for earlier non-gguf models has been preserved via a fossilized earlier version of the library. Please open an issue if you encounter problems. The Wiki and Readme have been updated too.
- Added support for setting dynatemp_exponent, which previously defaulted to 1.0. Support added over the API and in Lite (see the sketch after this list).
- Fixed issues with Linux CUDA on Pascal, added more flags to handle conda and colab builds correctly.
- Added support for old-CPU fallbacks (NoAVX2 and Failsafe modes) as build targets in the Linux prebuilt binary (and koboldcpp.sh)
- Added missing 48k context option, fixed clearing file selection, better abort handling support, fixed aarch64 termux builds, various other fixes.
- Updated Kobold Lite with many improvements and new features:
- NEW: Added XTTS API Server support (Local AI powered text-to-speech).
- Added option to let AI impersonate you for a turn in a chat.
- HD image generation options.
- Added popup-on-complete browser notification options.
- Improved DynaTemp wizard, added options to set the exponent.
- Bugfixes, padding adjustments, A1111 parameter fixes, image color fixes for invert color mode.
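As a quick reference for the dynatemp_exponent item above, here is a minimal sketch (Python standard library only, not code from this repo) of passing it over the Kobold generate API; the endpoint path and payload field names follow the parameters described in these notes and may differ between versions:
import json
import urllib.request

# DynaTemp midpoint, swing, and the newly exposed exponent (previously fixed at 1.0)
payload = {
    "prompt": "Once upon a time,",
    "max_length": 64,
    "temperature": 0.4,
    "dynatemp_range": 0.1,
    "dynatemp_exponent": 1.5,
}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["results"][0]["text"])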
To use on Windows, download and run the koboldcpp_rocm.exe, which is a one-file PyInstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py
(additional Python pip modules may need to be installed, such as customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4
(-j4 can be adjusted to your number of CPU threads for faster build times)
For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo here.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.
KoboldCPP-v1.55.yr0-ROCm
- Added Dynamic Temperature (DynaTemp), which is specified by a temperature value and a temperature range (credits: @kalomaze). When used, the actual temperature is allowed to vary dynamically between DynaTemp ± DynaTempRange. For example, setting temperature=0.4 and dynatemp_range=0.1 results in a minimum temp of 0.3 and a maximum of 0.5. For ease of use, Lite also provides a UI to select the min and max temperature for DynaTemp directly; both inputs work and automatically update each other. (A small sketch follows this list.)
- Try to reuse the cloudflared file when running a remote tunnel, but also handle cases where cloudflared fails to download correctly.
- Added a field showing the most recently used seed to the perf endpoint
- Switched CUDA pool malloc back to the old implementation
- Updated Lite, added support for DynaTemp
- Merged new improvements and fixes from upstream llama.cpp
- Various minor fixes.
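To illustrate the DynaTemp and perf-endpoint items above, a small sketch (standard library only, assuming a local instance on port 5001) that works out the effective temperature bounds from the example values and then queries the perf endpoint; the exact response keys are not spelled out in these notes, so the JSON is printed verbatim:
import json
import urllib.request

# DynaTemp: the actual temperature varies between temperature - range and temperature + range
temperature, dynatemp_range = 0.4, 0.1
print("min temp:", temperature - dynatemp_range)  # 0.3
print("max temp:", temperature + dynatemp_range)  # 0.5

# The perf endpoint now also reports the most recently used seed (among other stats)
with urllib.request.urlopen("http://localhost:5001/api/extra/perf") as resp:
    print(json.dumps(json.load(resp), indent=2))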
To use on Windows, download and run the koboldcpp_rocm.exe, which is a one-file PyInstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py
(additional Python pip modules may need to be installed, such as customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4
(-j4 can be adjusted to your number of CPU threads for faster build times)
For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo here.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.
KoboldCPP-v1.54.yr0-ROCm
koboldcpp-1.54-ROCm
Merge with @LostRuins' latest upstream update
welcome to 2024 edition
- Added logit_bias support (for both the OpenAI and Kobold APIs). Accepts a dictionary of key-value pairs indicating the token IDs (int) and the logit bias (float) to apply for each token. The object format is the same as, and compatible with, the official OpenAI implementation, though token IDs are model specific (thanks @DebuggingLife46). See the sketch after this list.
- Updated Lite, added support for custom background images (thanks @Ar57m), and added customizable settings for stepcount and cfgscale for Horde/A1111 image generation.
- Added mouseover tooltips for all labels in the GUI launcher.
- Cleaned up and simplified the UI of the quick launch tab in the GUI launcher, some advanced options moved to other tabs.
- Bug fixes for garbled output in Termux with q5k Phi
- Fixed paged memory fallback when pinned memory alloc fails while not using mmap.
- Attempt to fix on-exit segfault on some Linux systems.
- Updated KAI United class.py, added new parameters.
- Makefile fix for Linux CI build using conda (thanks @henk717)
- Merged new improvements and fixes from upstream llama.cpp (includes VMM pool support)
- Included prebuilt binary for no-cuda Linux as well.
- Various minor fixes.
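As a reference for the logit_bias item above, a minimal sketch (standard library only) of sending a bias dictionary to KoboldCpp's OpenAI-compatible completions endpoint; the token ID below is just a placeholder (IDs are model specific) and the endpoint path is an assumption:
import json
import urllib.request

payload = {
    "prompt": "My favourite word is",
    "max_tokens": 32,
    # token ID (string key) -> bias; a large negative bias strongly discourages that token
    "logit_bias": {"15043": -100.0},
}

req = urllib.request.Request(
    "http://localhost:5001/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["text"])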
To use on Windows, download and run the koboldcpp_rocm.exe, which is a one-file pyinstaller OR download koboldcpp_rocm_files.zip and run python koboldcpp.py
If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo here.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4
KoboldCPP-v1.53.yr0-ROCm
koboldcpp-1.53-ROCm
Merge with @LostRuins' latest upstream update
- Added support for SSL. You can now import your own SSL cert to use with KoboldCpp and serve it over HTTPS with --ssl [cert.pem] [key.pem] or via the GUI. The .pem files must be unencrypted. You can also generate your own self-signed certificate with OpenSSL, e.g. openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -config openssl.cnf -nodes (the location of openssl.cnf may differ between Linux distros; try searching for it with locate openssl.cnf).
- Added support for presence penalty (alternative rep pen) over the KAI API and in Lite. If Presence Penalty is set over the OpenAI API and rep_pen is not set, then rep_pen will default to 1.0 instead of 1.1. Both penalties can be used together, although this is probably not a good idea. (A small sketch covering SSL and presence penalty follows this list.)
- Added fixes for Broken Pipe error, thanks @mahou-shoujo.
- Added fixes for aborting ongoing connections while streaming in SillyTavern.
- Merged upstream support for Phi models and speedups for Mixtral
- The default non-blas batch size for GGUF models is now increased from 8 to 32.
- Merged HIPBlas fixes from @YellowRoseCx
- Fixed an issue with building convert tools in 1.52
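To illustrate the SSL and presence penalty items above, a small sketch (standard library only) that connects to a KoboldCpp instance served over HTTPS with a self-signed certificate and sets presence_penalty over the OpenAI-compatible endpoint; relaxing certificate verification is only appropriate for the self-signed case, and the endpoint path is an assumption:
import json
import ssl
import urllib.request

# Accept the self-signed cert.pem passed to --ssl (do not do this for CA-signed certs)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

payload = {
    "prompt": "The quick brown fox",
    "max_tokens": 32,
    "presence_penalty": 0.5,  # per the note, rep_pen then defaults to 1.0 if unset
}

req = urllib.request.Request(
    "https://localhost:5001/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, context=ctx) as resp:
    print(json.load(resp)["choices"][0]["text"])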
To use, download and run the koboldcpp_rocm.exe, which is a one-file pyinstaller.
If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo here.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001/
For more information, be sure to run the program from the command line with the --help flag.
KoboldCPP-v1.52.2.yr1-ROCm
- Added the --checkforupdates argument. If enabled, it will fetch the KoboldCpp-ROCm release page (via the GitHub API) one time on startup over HTTPS, compare the latest version number with the current version number, and notify the user if a new version is available. A GUI button for it is shown on the Network tab. Disabled by default. (See the sketch after this list.)
- hipBLAS autopicking and hipBLAS .kcpps bug fixes:
  - Fixed a mistake preventing hipBLAS from being autopicked on startup.
  - Fixed a bug where importing a .kcpps file with the backend "Use hipBLAS (ROCm)" would not select "Use hipBLAS (ROCm)".
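For illustration only (this is not the fork's actual implementation), a minimal sketch of the idea behind --checkforupdates: fetch the latest release tag over HTTPS via the GitHub API and compare it with the running version string.
import json
import urllib.request

CURRENT_VERSION = "v1.52.2.yr1-ROCm"  # placeholder for the running build's version string
url = "https://api.github.com/repos/YellowRoseCx/koboldcpp-rocm/releases/latest"

with urllib.request.urlopen(url) as resp:
    latest_tag = json.load(resp)["tag_name"]

if latest_tag != CURRENT_VERSION:
    print(f"A new version is available: {latest_tag} (currently running {CURRENT_VERSION})")
else:
    print("KoboldCpp-ROCm is up to date.")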
KoboldCPP-v1.52.2.yr0-ROCm
https://github.com/LostRuins/koboldcpp/releases/tag/v1.52.2
- NEW: Added a new bare-bones KoboldCpp NoScript WebUI, which does not require Javascript to work. It should be W3C HTML compliant and should run on every browser in the last 20 years, even text-based ones like Lynx (e.g. in the terminal over SSH). It is accessible by default at /noscript e.g. http://localhost:5001/noscript . This can be helpful when running KoboldCpp from systems which do not support a modern browser with Javascript.
- Partial per-layer KV offloading is now merged for CUDA. Important: this means that the number of layers you can offload to GPU might be reduced, as each layer now takes up more space. To avoid per-layer KV offloading, use the --usecublas lowvram option (equivalent to -nkvo in llama.cpp). Fully offloaded models should behave the same as before.
- The /api/extra/tokencount endpoint now also returns an array of token ids from the tokenizer in the response body (see the sketch after this list).
- Merged support for QWEN and Mixtral from upstream. Note: Mixtral seems to perform large batch prompt processing extremely slowly. This is probably an implementation issue. For now, you might have better luck using --noblas or setting --blasbatchsize -1 when using Mixtral
- Selecting a .kcpps in the GUI when choosing a model will load the model specified inside that config file instead.
- Added the Mamba Multitool script (from @henk717). This is a shell script that can be used in Linux to setup an environment with all dependencies required for building and running KoboldCpp on Linux.
- Improved KCPP Embedded Horde Worker fault tolerance, should now gracefully backoff for increasing durations whenever encountering errors polling from AI Horde, and will automatically recover from up to 24 hours of Horde downtime.
- Added a new parameter that shows number of Horde Worker errors in the /api/extra/perf endpoint, this can be used to monitor your embedded horde worker if it goes down.
- Pulled other fixes and improvements from upstream, updated Kobold Lite, added asynchronous file autosaves (thanks @aleksusklim), various other improvements.
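As a reference for the tokencount item above, a minimal sketch (standard library only) of posting a prompt to the endpoint; the request field name "prompt" is an assumption and the exact response keys are not documented in these notes, so the response JSON (which now includes the array of token ids) is printed verbatim:
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:5001/api/extra/tokencount",
    data=json.dumps({"prompt": "Hello world"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # The response now also includes an array of token ids alongside the count
    print(json.dumps(json.load(resp), indent=2))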
Hotfix 1.52.1: Fixed 'not enough memory' loading errors for large (20B+) models. See #563
NEW: Added Linux PyInstaller binaries
Hotfix 1.52.2: Merged fixes for Mixtral prompt processing.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001/
For more information, be sure to run the program from command line with the --help flag.
Windows KoboldCPP-ROCm v1.43 .exe
Windows Compiled KoboldCPP with ROCm support!
I want to thank @LostRuins for making KoboldCPP and for general guidance, @henk717 for all his dedication to KoboldAI that brought us here in the first place, and @SlyEcho, who originally started the ROCm port of llama.cpp.
You need ROCm to build it, but not to run it: https://rocm.docs.amd.com/en/latest/deploy/windows/quick_start.html
Compiled for the GPUs that have Tensile libraries / are marked as supported: gfx906, gfx1030, gfx1100, gfx1101, gfx1102
To run, open it, or start it via the command line.
Example:
./koboldcpp_rocm.exe --usecublas normal mmq --threads 1 --stream --contextsize 4096 --usemirostat 2 6 0.1 --gpulayers 45 C:\Users\YellowRose\llama-2-7b-chat.Q8_0.gguf
This site may be useful; it has some patches for Windows ROCm that help with compilation, which I used, but I'm not sure if they're necessary: https://streamhpc.com/blog/2023-08-01/how-to-get-full-cmake-support-for-amd-hip-sdk-on-windows-including-patches/
Build command used (ROCm Required):
cd koboldcpp-rocm
mkdir build && cd build
cmake .. -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER="C:/Program Files/AMD/ROCm/5.5/bin/clang.exe" -DCMAKE_CXX_COMPILER="C:/Program Files/AMD/ROCm/5.5/bin/clang++.exe" -DAMDGPU_TARGETS="gfx906;gfx1030;gfx1100;gfx1101;gfx1102"
cmake --build . -j 6
That puts koboldcpp_cublas.dll inside of .\koboldcpp-rocm\build\bin
copy koboldcpp_cublas.dll to the main koboldcpp-rocm folder
(You can run koboldcpp.py like this right away)
To make it into an exe, we use make_pyinst_rocm_hybrid_henk_yellow.bat
But that file is set up to add CLBlast and OpenBLAS too; you can either remove those lines so it's just this code:
cd /d "%~dp0"
copy "C:\Program Files\AMD\ROCm\5.5\bin\hipblas.dll" .\ /Y
copy "C:\Program Files\AMD\ROCm\5.5\bin\rocblas.dll" .\ /Y
xcopy /E /I "C:\Program Files\AMD\ROCm\5.5\bin\rocblas" .\rocblas\
PyInstaller --noconfirm --onefile --collect-all customtkinter --clean --console --icon ".\niko.ico" --add-data "./klite.embd;." --add-data "./koboldcpp_cublas.dll;." --add-data "./hipblas.dll;." --add-data "./rocblas.dll;." --add-data "./rwkv_vocab.embd;." --add-data "./rocblas;." --add-data "C:/Windows/System32/msvcp140.dll;." --add-data "C:/Windows/System32/vcruntime140_1.dll;." "./koboldcpp.py" -n "koboldcppRocm.exe"
or you can download w64devkit, cd into the folder, and run make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
then it will build the rest of the backend files.
Once they're all built, you should be able to run make_pyinst_rocm_hybrid_henk_yellow.bat as-is, and it will bundle the files together into koboldcppRocm.exe in the \koboldcpp-rocm\dists folder.
KoboldCPP-v1.52.1.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.52.yr0-ROCm
Various new features including new model Mixtral support
Mixtral-Kcpp-v1.52.RC1.yr1-ROCm FanService Ed.
Unofficial release candidate build containing experimental features and Mixtral Model support