
Releases: YellowRoseCx/koboldcpp-rocm

KoboldCPP-v1.64.yr0-ROCm

02 May 02:37
Merge remote-tracking branch 'upstream/concedo'

KoboldCPP-v1.63.yr1-ROCm

24 Apr 02:19
f123ad3

TURN MMQ OFF

There were some big changes upstream, which is why it has taken a while to update kcpp-rocm and get it working.

This seems to work with MMQ DISABLED. I've also had reports of Llama 3 not working with this version, but Llama 3 8B Instruct DID work for me.

KoboldCPP-v1.61.2.yr1-ROCm

20 Mar 17:25
9c1707d
Pre-release
set pyinstaller version to 6.4.0

KoboldCPP-v1.61.2.yr0-ROCm

15 Mar 12:56

Release notes coming soon

KoboldCPP-v1.60.1.yr0-ROCm

06 Mar 22:39

Upstream Changelog:

KoboldCpp is just a 'Dirty Fork' edition 😩


  • KoboldCpp now natively supports Local Image Generation, thanks to the phenomenal work done by @leejet in stable-diffusion.cpp! It provides an A1111 compatible txt2img endpoint which you can use within the embedded Kobold Lite, or in many other compatible frontends such as SillyTavern.
    • Just select a compatible SD1.5 or SDXL .safetensors fp16 model to load, either through the GUI launcher or with --sdconfig (see the example command after this list).
    • Enjoy zero-install, portable, lightweight, and hassle-free image generation directly from KoboldCpp, without installing multiple GB worth of ComfyUI, A1111, Fooocus, or others.
    • With just an 8GB VRAM GPU, you can run both a 7B q4 GGUF (lowvram) alongside any SD1.5 image model at the same time, as a single instance, fully offloaded. If you run out of VRAM, select Compress Weights (quant) to quantize the image model so it takes less memory.
    • KoboldCpp allows you to run in text-gen-only, image-gen-only or hybrid modes, simply set the appropriate launcher configs.
    • Known to not work correctly in Vulkan (for now).
  • When running from command line, --contextsize can now be set to any arbitrary number in range instead of locked to fixed values. However, using a non-recommended value may result in incoherent output depending on your settings. The GUI launcher for this remains unchanged.
  • Added new quant types, pulled and merged improvements and fixes from upstream.
  • Fixed some issues loading older GGUFv1 models; they should be working again.
  • Added Cloudflare tunnel support for macOS (via --remotetunnel; however, it probably won't work on M1, only amd64).
  • Updated API docs and Colab for image gen.
  • Updated Kobold Lite:
    • Integrated support for AllTalk TTS
    • Added "Auto Jailbreak" for instruct mode, useful to wrangle stubborn or censored models.
    • Auto enable image gen button if KCPP loads image model
    • Improved Autoscroll and layout, defaults to SSE streaming mode
    • Added option to import and export story via clipboard
    • Added option to set personal notes/comments in story
  • Update v1.60.1: Ported fix for CVE-2024-21836 for GGUFv1, enabled the LCM sampler, allowed loading gguf SD models, and fixed SD for Metal.
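As a rough illustration of the image-generation and arbitrary context-size options above, a combined launch from the command line might look like this (the model filenames are placeholders, and the exact --sdconfig arguments may differ between versions; check --help for the current syntax):

    python koboldcpp.py --model llama-2-7b-chat.Q4_K_M.gguf --contextsize 3072 --sdconfig sd-v1-5-fp16.safetensors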

To use on Windows, download and run koboldcpp_rocm.exe, which is a one-file PyInstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py (additional Python pip modules might need to be installed, such as customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4 (-j4 can be adjusted to your number of CPU threads for faster build times)

For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
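Putting the above together, a full Linux build from a fresh clone might look like this (assuming the repository URL matches the project name; adjust -j4 to your CPU thread count):

    git clone https://github.com/YellowRoseCx/koboldcpp-rocm
    cd koboldcpp-rocm
    make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4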

If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
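As a minimal sketch of talking to the server from code once it is running (this assumes the standard KoboldAI-compatible /api/v1/generate endpoint on the default port 5001; see the API docs for the full set of fields):

    import requests

    # Ask the running KoboldCpp server for a short completion.
    payload = {
        "prompt": "Once upon a time,",
        "max_length": 64,      # number of tokens to generate
        "temperature": 0.7,
    }
    resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
    print(resp.json()["results"][0]["text"])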

For more information, be sure to run the program from the command line with the --help flag.

KoboldCPP-v1.59.1.yr1-ROCm

26 Feb 08:15

Upstream Changelog:

  • Added --nocertify mode, which allows you to disable SSL certificate checking on your embedded Horde worker. This can help bypass some SSL certificate errors (see the example command after this list).
  • Fixed pre-gguf models loading with incorrect thread counts. This issue affected the past 2 versions.
  • Added build target for Old CPU (NoAVX2) Vulkan support.
  • Fixed cloudflare remotetunnel URLs not displaying on runpod.
  • Reverted CLBlast back to 1.6.0, pending CNugteren/CLBlast#533 and other correctness fixes.
  • Smartcontext toggle is now hidden when contextshift toggle is on.
  • Various improvements and bugfixes merged from upstream, including Google Gemma support.
  • Bugfixes and updates for Kobold Lite
  • Changed makefile build flags, fix for tooltips, merged IQ3_S support
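For example, a launch that disables certificate checking for the embedded Horde worker could look like this (the model filename is a placeholder; any other worker flags you normally pass stay the same):

    python koboldcpp.py --model model.Q4_K_M.gguf --nocertify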

To use on Windows, download and run koboldcpp_rocm.exe, which is a one-file PyInstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py (additional Python pip modules might need to be installed, such as customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4 (-j4 can be adjusted to your number of CPU threads for faster build times)

For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4

If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).

For more information, be sure to run the program from the command line with the --help flag.

KoboldCPP-v1.59.yr1-ROCm

25 Feb 21:27
update build file to add vulkan

KoboldCPP-v1.58.yr0-ROCm

18 Feb 05:53


Upstream Changelog:

  • Added a toggle for row split mode with CUDA multi-GPU. Split mode changed to layer split by default. If using the command line, add rowsplit to --usecublas to enable row split mode (see the example command after this list). With the GUI launcher, it's a checkbox toggle.
  • Multiple bugfixes: fixed benchmark command, fixed SSL streaming issues, fixed some SSE formatting with OAI endpoints.
  • Make context shifting more forgiving when determining eligibility.
  • Upgraded CLBlast to latest version, should result in a modest prompt processing speedup when using CL.
  • Various improvements and bugfixes merged from upstream.
  • Updated Kobold Lite with many improvements and new features:
    • New: Integrated 'AI Vision' for images, this uses AI Horde or a local A1111 endpoint to perform image interrogation, allowing the AI to recognize and interpret uploaded or generated images. This should provide an option for multimodality similar to llava, although not as precise. Click on any image and you can enable it within Lite. This functionality is not provided by KCPP itself.
    • New: Importing characters from Pygmalion.Chat is now supported in Lite, select it from scenarios.
    • Added option to run Lite in the background. It plays a dynamically generated silent audio track, which should prevent the browser tab from hibernating.
    • Fixed printable view, persisted streaming text on error, fixed instruct timestamps.
    • Added "Auto" option for idle responses.
    • Allow importing images into story from local disk
    • Multiple minor formatting and bug fixes.
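As an illustrative multi-GPU launch with row split enabled (the model filename, layer count, and split ratio are placeholders):

    python koboldcpp.py --model model.Q4_K_M.gguf --usecublas rowsplit --gpulayers 99 --tensor_split 1 1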

To use on Windows, download and run koboldcpp_rocm.exe, which is a one-file PyInstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py (additional Python pip modules might need to be installed, such as customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4 (-j4 can be adjusted to your number of CPU threads for faster build times)

For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4

If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).

For more information, be sure to run the program from the command line with the --help flag.

KoboldCPP-v1.57.1.yr1-ROCm

11 Feb 03:57
ae6ece1


The Windows build does not contain the Vulkan backend yet.

  • Experimental ROCm Support for Windows was added for the following GPUs thanks to @harish0201 and @jasyuiop:
    • Desktop GPUs: AMD Radeon PRO W6600, AMD Radeon PRO W6600X, AMD Radeon RX 6600, AMD Radeon RX 6600 XT, AMD Radeon RX 6650 XT, AMD Radeon RX 6700, AMD Radeon RX 6700 XT, AMD Radeon RX 6750 XT, AMD Radeon RX 6750 GRE 10 GB, AMD Radeon RX 6750 GRE 12 GB
    • Laptop GPUs: AMD Radeon PRO W6600M, AMD Radeon PRO W6600X, AMD Radeon RX 6600S, AMD Radeon RX 6700S, AMD Radeon RX 6800S, AMD Radeon RX 6650M, AMD Radeon RX 6650M XT, AMD Radeon RX 6700M, AMD Radeon RX 6800M, AMD Radeon RX 6850M XT

Upstream Changelog:

  • Added a benchmarking feature with --benchmark, which automatically runs a benchmark with your provided settings, outputting run parameters, timing and speed information, as well as testing for coherence, and exiting on completion. You can provide a filename, e.g. --benchmark result.csv, and it will append CSV-formatted data to that file (see the example command after this list).
  • Added temperature Quad-Sampling (set via the API parameter smoothing_factor), a PR from @AAbushady (credits @kalomaze).
  • Improved timing displays. It also displays the seed used and shows llama.cpp-styled timings when run in --debugmode. The timings will appear faster as they do not include overheads, measuring only specific eval functions.
  • Improved abort generation behavior (allows second user aborting while in queue)
  • Vulkan enhancements from @0cc4m merged: APU memory handling and multigpu. To use multigpu, you can now specify additional IDs, for example --usevulkan 0 2 3 which will use GPUs with IDs 0,2, and 3. Allocation is determined by --tensor_split. Multigpu for Vulkan is currently configurable via commandline only, the GUI launcher does not allow selecting multiple devices for Vulkan.
  • Various improvements and bugfixes merged from upstream.
  • Updated Kobold Lite with many improvements and new features:
    • NEW: The Aesthetic UI is now available for Story and Adventure modes as well!
    • Added "AI Impersonate" feature for Instruct mode.
    • Smoothing factor added, can be configured in dynamic temperature panel.
    • Added a toggle to enable printable view (unlock vertical scrolling).
    • Added a toggle to inject timestamps, allowing the AI to be aware of time passing.
    • Persist API info for A1111 and XTTS, allows specifying custom negative prompts for image gen, allows specifying custom horde keys in KCPP mode.
    • Fixes for XTTS to handle devices with over 100 voices, and also adds an option to narrate dialogue only.
    • Toggle to request A1111 backend to save generated images to disk.
    • Fix for chub.ai card fetching.
    • Hotfix1.57.1: Fixed some crashes and fixed multigpu for vulkan.
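As an illustrative command combining the new benchmarking and Vulkan multi-GPU options (the model filename, device IDs, and split ratios are placeholders):

    python koboldcpp.py --model model.Q4_K_M.gguf --usevulkan 0 2 3 --tensor_split 3 1 1 --benchmark result.csv

The smoothing_factor value mentioned above is an extra field in the JSON payload of a generate request rather than a launch flag.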

To use on Windows, download and run koboldcpp_rocm.exe, which is a one-file PyInstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py (additional Python pip modules might need to be installed, such as customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4 (-j4 can be adjusted to your number of CPU threads for faster build times)

For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4

If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).

For more information, be sure to run the program from the command line with the --help flag.

KoboldCPP-v1.56.yr1-ROCm | Test Build

31 Jan 05:55
d0d4c80

Test build to try adding AMD Radeon™ RX 6700XT, 6750XT, 6700M, and 6800M support for Windows