KoboldCPP-v1.57.1.yr1-ROCm

The Windows build does not contain the Vulkan backend yet.

  • Experimental ROCm support for Windows was added for the following GPUs, thanks to @harish0201 and @jasyuiop:
Desktop GPUs                          Laptop GPUs
AMD Radeon PRO W6600                  AMD Radeon PRO W6600M
AMD Radeon PRO W6600X                 AMD Radeon PRO W6600X
AMD Radeon RX 6600                    AMD Radeon RX 6600S
AMD Radeon RX 6600 XT                 AMD Radeon RX 6700S
AMD Radeon RX 6650 XT                 AMD Radeon RX 6800S
AMD Radeon RX 6700                    AMD Radeon RX 6650M
AMD Radeon RX 6700 XT                 AMD Radeon RX 6650M XT
AMD Radeon RX 6750 XT                 AMD Radeon RX 6700M
AMD Radeon RX 6750 GRE 10 GB          AMD Radeon RX 6800M
AMD Radeon RX 6750 GRE 12 GB          AMD Radeon RX 6850M XT

Upstream Changelog:

  • Added a benchmarking feature with --benchmark, which automatically runs a benchmark with your provided settings, outputs run parameters, timing and speed information, tests for coherence, and exits on completion. You can provide a filename, e.g. --benchmark result.csv, and CSV-formatted data will be appended to that file (see the example after this list).
  • Added temperature Quad-Sampling, set via the API with the smoothing_factor parameter; PR from @AAbushady (credits: @kalomaze). An API example follows this list.
  • Improved timing displays: the seed used is now shown, and llama.cpp-style timings are printed when run with --debugmode. These timings will appear faster because they exclude overheads and measure only the specific eval functions.
  • Improved abort-generation behavior (a second user can now abort while a request is queued).
  • Vulkan enhancements from @0cc4m merged: APU memory handling and multi-GPU support. To use multiple GPUs, you can now specify additional IDs, for example --usevulkan 0 2 3, which will use the GPUs with IDs 0, 2, and 3. Allocation across devices is determined by --tensor_split. Multi-GPU for Vulkan is currently configurable via the command line only; the GUI launcher does not allow selecting multiple Vulkan devices. A launch example follows this list.
  • Various improvements and bugfixes merged from upstream.
  • Updated Kobold Lite with many improvements and new features:
    • NEW: The Aesthetic UI is now available for Story and Adventure modes as well!
    • Added "AI Impersonate" feature for Instruct mode.
    • Smoothing factor added; it can be configured in the dynamic temperature panel.
    • Added a toggle to enable printable view (unlocks vertical scrolling).
    • Added a toggle to inject timestamps, allowing the AI to be aware of time passing.
    • Persists API info for A1111 and XTTS, allows specifying custom negative prompts for image generation, and allows custom Horde keys in KCPP mode.
    • Fixes for XTTS to handle devices with over 100 voices, plus an option to narrate dialogue only.
    • Added a toggle to request that the A1111 backend save generated images to disk.
    • Fix for chub.ai card fetching.
    • Hotfix 1.57.1: fixed some crashes and fixed multi-GPU for Vulkan.
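
A minimal sketch of the new benchmark mode, assuming the Windows ROCm build and a local GGUF model; the model filename and --gpulayers value are placeholders to adjust for your setup:

    koboldcpp_rocm.exe --model yourmodel.gguf --gpulayers 35 --benchmark result.csv

On completion it prints the run parameters, timing and speed information, appends a CSV row to result.csv, and exits.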
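
A sketch of setting the smoothing factor over the API, assuming a server already running on the default port and the standard KoboldAI /api/v1/generate endpoint; the prompt and sampler values are purely illustrative:

    curl -X POST http://localhost:5001/api/v1/generate \
         -H "Content-Type: application/json" \
         -d '{"prompt": "Once upon a time", "max_length": 80, "temperature": 1.2, "smoothing_factor": 0.3}'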
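
A sketch of a multi-GPU Vulkan launch, assuming a build that includes the Vulkan backend (per the note above, the Windows ROCm build does not ship it yet) and three devices with IDs 0, 2, and 3; the model name and split ratios are placeholders:

    koboldcpp.exe --model yourmodel.gguf --usevulkan 0 2 3 --tensor_split 2 1 1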

To use on Windows, download and run koboldcpp_rocm.exe, which is a one-file pyinstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py (additional Python pip modules may need to be installed, such as customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4 (-j4 can be adjusted to your number of CPU threads for faster builds).

For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: Install cblas openblas and clblast.
For Debian: Install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4 (a full example is sketched below).
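
A full Linux build might look like the following sketch, assuming a Debian-based system; the clone URL is left as a placeholder for this repo's actual address:

    sudo apt install libclblast-dev libopenblas-dev
    git clone <this repo's clone URL> koboldcpp-rocm
    cd koboldcpp-rocm
    make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4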

If you're using NVIDIA, you can try koboldcpp.exe from LostRuins' upstream repo.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also from LostRuins' repo.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client). A minimal launch example is sketched below.
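
A minimal launch sketch using the Python entry point; the model path, context size, and GPU layer count are placeholders to adjust for your hardware:

    python koboldcpp.py --model /path/to/yourmodel.gguf --contextsize 4096 --gpulayers 35 --port 5001

Once the model finishes loading, the bundled Kobold Lite UI is served at the address above.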

For more information, be sure to run the program from the command line with the --help flag.