- Avoid occupying VRAM when no layers are offloaded (`-ngl 0`);
- Fix rerank model loading errors for gpustack/gte-multilingual-reranker-base-GGUF and gpustack/jina-reranker-v2-base-multilingual-GGUF;
- Support tool calling for the ChatGLM4 series;
- Introduce the DDIM (`ddim_trailing`) sampling method;
- Support offloading image models across multiple devices.
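Some background on the `ddim_trailing` entry. In "trailing" timestep spacing, which Hugging Face diffusers exposes as `timestep_spacing="trailing"` on its DDIM scheduler, the inference timesteps count down from the last training step. Sampling therefore starts at the final, pure-noise step T - 1. The sketch below shows that schedule only. It assumes 1000 training timesteps and uses a hypothetical helper name. It is not necessarily how this release implements the sampler.

```python
def ddim_trailing_timesteps(num_train_timesteps: int = 1000,
                            num_inference_steps: int = 10) -> list[int]:
    """Hypothetical helper: trailing-spaced DDIM timesteps, highest first."""
    # Stride between consecutive inference timesteps (may be fractional).
    step_ratio = num_train_timesteps / num_inference_steps
    # Count down from num_train_timesteps in equal strides, then shift by -1
    # so the first timestep is the final training step (T - 1).
    return [round(num_train_timesteps - i * step_ratio) - 1
            for i in range(num_inference_steps)]


print(ddim_trailing_timesteps(1000, 10))
# → [999, 899, 799, 699, 599, 499, 399, 299, 199, 99]
```

The list always begins at T - 1, so the sampler's first denoising step sees the fully noised input.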