Releases: xorbitsai/inference
v0.11.2.post1
What's new in 0.11.2.post1 (2024-05-24)
These are the changes in inference v0.11.2.post1, a hotfix version of v0.11.2.
Bug fixes
Full Changelog: v0.11.2...v0.11.2.post1
v0.11.2
What's new in 0.11.2 (2024-05-24)
These are the changes in inference v0.11.2.
New features
- FEAT: Add command `cal-model-mem` by @frostyplanet in #1460 (usage sketch after this list)
- FEAT: add deepseek llm and coder base by @qinxuye in #1533
- FEAT: add codeqwen1.5 by @qinxuye in #1535
- FEAT: Auto detect rerank type for unknown rerank type by @codingl2k1 in #1538
- FEAT: Provide the ability to query information about the cached models hosted on a given node by @hainaweiben in #1522
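The new `cal-model-mem` command estimates how much memory a model will need before you launch it. Below is a minimal sketch of invoking it from Python; the flag names are assumptions modeled on `xinference launch`, so check `xinference cal-model-mem --help` for the authoritative list, and the model name is illustrative.

```python
# A minimal sketch, assuming the `xinference` CLI is installed on PATH.
# Flag names are assumptions mirroring `xinference launch`; verify them
# with `xinference cal-model-mem --help`.
import subprocess

subprocess.run(
    [
        "xinference", "cal-model-mem",
        "--model-name", "qwen1.5-chat",   # illustrative model
        "--size-in-billions", "7",
        "--model-format", "pytorch",
    ],
    check=True,
)
```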
Enhancements
- ENH: Compatible with `huggingface-hub` v0.23.0 by @ChengjieLi28 in #1514
- ENH: convert command-r to chat by @qinxuye in #1537
- ENH: Support Intern-VL-Chat model by @amumu96 in #1536
- BLD: adapt to langchain 0.2.x, which has breaking changes by @mikeshi80 in #1521
- BLD: Fix pre commit by @frostyplanet in #1527
- BLD: compatible with torch 2.3.0 by @qinxuye in #1534
Bug fixes
- BUG: Fix start worker failed due to None device name by @codingl2k1 in #1539
- BUG: Fix `gpu_idx` allocation error when replica > 1 by @amumu96 in #1528
Others
- CHORE: Basic benchmark/benchmark_rerank.py by @codingl2k1 in #1479
Full Changelog: v0.11.1...v0.11.2
v0.11.1
What's new in 0.11.1 (2024-05-17)
These are the changes in inference v0.11.1.
New features
- FEAT: support Yi-1.5 series by @qinxuye in #1489
- FEAT: [UI] embedding and rerank models support specifying GPU or CPU by @yiboyasss in #1491
Enhancements
- ENH: Refactor the LoRA adaptation method for LLM models by @hainaweiben in #1470
- ENH: Add `stream_options` support by @amumu96 in #1508 (see the sketch after this list)
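Since the server exposes an OpenAI-compatible endpoint, `stream_options` can be exercised with the standard `openai` client. A minimal sketch, assuming a server at localhost:9997 (the default port) and a launched chat model named `qwen1.5-chat`; both are illustrative:

```python
# A minimal sketch of OpenAI-style stream_options against an Xinference
# server; the base_url, port, and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="qwen1.5-chat",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    # include_usage asks for a final, choices-empty chunk carrying token usage
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    elif chunk.usage is not None:  # the trailing usage-only chunk
        print("\nusage:", chunk.usage)
```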
Bug fixes
- BUG: fix top_k for vllm backend by @sixsun10 in #1461
- BUG: Docker image issue due to `torchvision` by @ChengjieLi28 in #1485
- BUG: Docker image crash during startup due to `llama-cpp-python` by @ChengjieLi28 in #1507
- BUG: Fix prompt being required when the docker image builds by @ChengjieLi28 in #1512
- BUG: `llama.cpp` model failed when chatting due to `lora` by @ChengjieLi28 in #1513
Documentation
- DOC: update quick start ipynb by @qinxuye in #1482
- DOC: Update readme for being integrated by RAGFlow by @JinHai-CN in #1493
- DOC: Lora usage by @ChengjieLi28 in #1506
New Contributors
- @sixsun10 made their first contribution in #1461
- @JinHai-CN made their first contribution in #1493
Full Changelog: v0.11.0...v0.11.1
v0.11.0
What's new in 0.11.0 (2024-05-11)
These are the changes in inference v0.11.0.
Breaking Changes
v0.11.0 introduces a breaking change: when launching a model, `model_engine`
must now be specified. Refer to Model Engine for more information.
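A minimal sketch of a launch call under the new requirement, assuming a server at localhost:9997; the model name, engine, format, and size are illustrative, and the keyword names are assumed to follow the RESTful client as of v0.11.x:

```python
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="llama-3-instruct",   # illustrative
    model_engine="vllm",             # required since v0.11.0
    model_format="pytorch",
    model_size_in_billions=8,
)
print(model_uid)
```

Omitting `model_engine` on v0.11.0+ is expected to fail, whereas earlier versions could leave it unset.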
New features
- FEAT: Support Mixtral-8x22b-instruct-v0.1 by @qinxuye in #1340
- FEAT: add phi-3-mini series by @orangeclk in #1379
- FEAT: add Starling model by @boy-hack in #1384
- FEAT: support qwen1.5 110b by @qinxuye in #1388
- FEAT: Support query engine with cmdline by @Ago327 in #1380
- FEAT: Ascend support by @qinxuye in #1408
- FEAT: Audio support verbose_json and timestamp by @codingl2k1 in #1402
- FEAT: [UI] Add engine option when launching LLM by @yiboyasss in #1456
Enhancements
- ENH: add custom image model by @amumu96 in #1312
- ENH: Support more quantization with VLLM by @amumu96 in #1372
- ENH: Update chatglm3 6b model version by @codingl2k1 in #1401
- ENH: make qwen_vl support streaming output by @Minamiyama in #1425
- ENH: Remove the max tokens limitation and boost performance by avoiding unnecessary repeated CUDA device detection by @mikeshi80 in #1429
- ENH: Improve benchmark and add long context generate by @frostyplanet in #1423
- ENH: make yi_vl support streaming output by @Minamiyama in #1443
- ENH: Some minor changes by @frostyplanet in #1453
- ENH: make deepseek_vl support streaming output by @Minamiyama in #1444
- ENH: Rename `model_engine` for a clearer inference backend by @ChengjieLi28 in #1466
- BLD: Use self-hosted AWS machine to build docker image by @ChengjieLi28 in #1405
- CLN: Remove actor client by @ChengjieLi28 in #1436
- CLN: Remove all speculative-related codes by @ChengjieLi28 in #1435
- REF: Query for engine by @Ago327 in #1342
- REF: [UI] Refactor register model by @yiboyasss in #1368
- REF: Add the `model_engine` parameter for the launching process by @hainaweiben in #1367
Bug fixes
- BUG: Fix llama3-instruct 70B filename error by @ChengjieLi28 in #1370
- BUG: fix error when there is no role:user message or the content is empty by @liuzhenghua in #1378
- BUG: fix file template of andrewcanis/c4ai-command-r-v01-GGUF by @emulated24 in #1389
- BUG: Fix using extra GPUs due to match in `__init__` by @ChengjieLi28 in #1400
- BUG: Fix qwen tool call parameter empty issue by @codingl2k1 in #1381
- BUG: Fix tool calls return invalid usage by @codingl2k1 in #1420
- BUG: Fix tools ability by @mikeshi80 in #1447
- BUG: Install error on macOS due to `auto-gptq` by @ChengjieLi28 in #1457
- BUG: fix some issues in query engine interface by @Ago327 in #1442
Tests
- TST: Pin `huggingface-hub` to pass CI since it has some breaking changes by @ChengjieLi28 in #1427
Documentation
- DOC: update readme & fix Mac CI by @qinxuye in #1385
- DOC: worker address should be specified for `xinference-worker` by @amumu96 in #1397
- DOC: update docker doc on using xinference by @qinxuye in #1417
- DOC: add the missing backslash in shell command by @mikeshi80 in #1451
- DOC: Usage of `model_engine` by @ChengjieLi28 in #1468
New Contributors
- @liuzhenghua made their first contribution in #1378
- @emulated24 made their first contribution in #1389
- @orangeclk made their first contribution in #1379
- @boy-hack made their first contribution in #1384
- @frostyplanet made their first contribution in #1423
Full Changelog: v0.10.3...v0.11.0
v0.10.3
What's new in 0.10.3 (2024-04-24)
These are the changes in inference v0.10.3.
New features
- FEAT: support llama-3 family by @qinxuye in #1332
- FEAT: Add Belle-whisper-large-v3-zh by @codingl2k1 in #1351
Enhancements
- ENH: fix the max length of codeqwen-7B-chat by @mikeshi80 in #1354
- ENH: Clear cache for embedding and rerank by @codingl2k1 in #1360
Bug fixes
- BUG: Fix launching embedding or reranking models from the command line failing due to PEFT by @hainaweiben in #1343
- BUG: Fix extra parameters issue when auto-recovering models by @ChengjieLi28 in #1348
- BUG: Fix old rerank models use flag rerank issue by @codingl2k1 in #1350
Documentation
- DOC: Add new models to README by @qinxuye in #1346
- DOC: Update README, add FastGPT to integrations by @yangchuansheng in #1355
New Contributors
- @yangchuansheng made their first contribution in #1355
Full Changelog: v0.10.2.post1...v0.10.3
v0.10.2.post1
What's new in 0.10.2.post1 (2024-04-19)
These are the changes in inference v0.10.2.post1.
Bug fixes
- BUG: Fix `xinference-client` package depending on internal code by @ChengjieLi28 in #1330
- BUG: Fix restful client depending on a specific type by @ChengjieLi28 in #1331
Full Changelog: v0.10.2...v0.10.2.post1
v0.10.2
What's new in 0.10.2 (2024-04-19)
These are the changes in inference v0.10.2.
New features
- FEAT: [UI] Add replica configuration when launching `embedding` and `rerank` models by @yiboyasss in #1306
- FEAT: Multi-LoRA support by @hainaweiben in #1273
- FEAT: Support SeaLLM-7B and c4ai-command-r-v01 by @mujin2 in #1310
- FEAT: Support BAAI/bge-reranker-v2-* rerank model by @codingl2k1 in #1305
- FEAT: UI supports multi lora by @yiboyasss in #1320
- FEAT: Add c4ai-command model for ModelScope by @mujin2 in #1321
- FEAT: support m3e embedding models by @qinxuye in #1298
- FEAT: hotkey to activate search by @Minamiyama in #1287
- FEAT: support codeqwen1.5-chat by @qinxuye in #1322
Enhancements
- ENH: Support custom audio model by @amumu96 in #1279
- ENH: support int and str comparison for model size by @mikeshi80 in #1277
- BLD: Add `FlagEmbedding` in cpu docker by @ChengjieLi28 in #1318
- REF: support query for engine feature by @Ago327 in #1294
Full Changelog: v0.10.1...v0.10.2
v0.10.1
What's new in 0.10.1 (2024-04-12)
These are the changes in inference v0.10.1.
New features
- FEAT: add support for qwen1.5 32B chat model by @mikeshi80 in #1249
- FEAT: Support Qwen MoE model for huggingface and modelscope by @xiaodouzi666 in #1263
- FEAT: Enable streaming in tool calls for Qwen when using vllm by @zhanghx0905 in #1215
Enhancements
- ENH: make `create_embedding` able to receive extra args by @amumu96 in #1224
- ENH: support more GPTQ and AWQ format for some models by @xiaodouzi666 in #1243
- ENH: support multi gpus for qwen-vl and yi-vl by @qinxuye in #1236
- ENH: support llamacpp multiple gpu by @amumu96 in #1229
- ENH: UI: paper material for cards by @Minamiyama in #1261
- REF: Refactor launch model for Web UI by @yiboyasss in #1254
- REF: Remove ctransformers supports by @mujin2 in #1267
Bug fixes
- BUG: Fix docker cpu build by @ChengjieLi28 in #1213
- BUG: Fix cannot start xinference in docker due to `cv2` by @ChengjieLi28 in #1217
- BUG: Cannot start xinference in docker by @ChengjieLi28 in #1219
- BUG: Fix `opencv` issue in docker container by @ChengjieLi28 in #1227
- BUG: Fix the launch bug of OmniLMM 12B by @hainaweiben in #1241
- BUG: fix style spelling error by @Minamiyama in #1247
- BUG: Fix issue with supervisor not clearing information after worker exit by @hainaweiben in #1231
- BUG: custom models on the web ui by @yiboyasss in #1259
- BUG: fix system prompts for chatglm3 and internlm2 pytorch by @qinxuye in #1271
- BUG: Fix authority and jump issue by @yiboyasss in #1276
- BUG: fix custom vision model by @qinxuye in #1280
Tests
- TST: Fix tests due to `llama-cpp-python` v0.2.58 by @ChengjieLi28 in #1242
Documentation
- DOC: auto gen vllm doc & add chatglm3-{32k, 128k} support for vllm by @qinxuye in #1234
- DOC: update models doc by @qinxuye in #1246
- DOC: update readme by @qinxuye in #1268
New Contributors
- @amumu96 made their first contribution in #1224
- @xiaodouzi666 made their first contribution in #1243
- @yiboyasss made their first contribution in #1254
Full Changelog: v0.10.0...v0.10.1
v0.10.0
What's new in 0.10.0 (2024-03-29)
These are the changes in inference v0.10.0.
New features
- FEAT: launch UI of audio model. by @hainaweiben in #1102
- FEAT: Supports `OmniLMM` chat model by @hainaweiben in #1171
- FEAT: Added vllm support for deepseek models by @ivanzfb in #1200
- FEAT: force to specify worker ip and gpu idx when launching models by @ChengjieLi28 in #1195
- FEAT: OAuth system supports api-key by @Ago327 in #1168 (see the sketch after this list)
- FEAT: Support deepseek vl by @codingl2k1 in #1175
- FEAT: support some builtin new models by @mujin2 in #1204
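With the OAuth system issuing api-keys (#1168), protected endpoints can be called with a bearer token. A minimal sketch, assuming keys are accepted as a standard `Authorization: Bearer` header; the endpoint, port, and key are illustrative:

```python
import requests

# Hypothetical api-key; obtain a real one from your server's OAuth setup.
headers = {"Authorization": "Bearer sk-xxxxxxxx"}
resp = requests.get("http://localhost:9997/v1/models", headers=headers)
resp.raise_for_status()
print(resp.json())
```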
Enhancements
- BLD: add autoawq in setup by @utopia2077 in #1190
Bug fixes
- BUG: Fix an incorrect model interface address that caused a 307 redirect to HTTP, blocking the request and preventing the model list from displaying by @wertycn in #1182
- BUG: fix doc fail introduced by #1171 & update readme by @qinxuye in #1203
- BUG: Increase validator types for the 'input' parameter of embeddings to match the OpenAI API by @Minamiyama in #1201 (see the sketch below)
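For reference, #1201 widens validation of the embeddings `input` so that both OpenAI-accepted shapes pass. A minimal sketch with the standard `openai` client; the model name and port are illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

# Per the OpenAI API, 'input' may be a single string...
one = client.embeddings.create(model="bge-large-zh", input="hello")
# ...or a list of strings embedded in one call.
many = client.embeddings.create(model="bge-large-zh", input=["hello", "world"])
print(len(one.data), len(many.data))  # 1 2
```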
Documentation
- DOC: internal design by @1572161937 in #1178
- DOC: update readme and models doc by @qinxuye in #1176
- DOC: Doc for oauth system with api-key by @ChengjieLi28 in #1210
New Contributors
- @utopia2077 made their first contribution in #1190
- @ivanzfb made their first contribution in #1200
Full Changelog: v0.9.4...v0.10.0
v0.9.4
What's new in 0.9.4 (2024-03-21)
These are the changes in inference v0.9.4.
New features
- FEAT: Support CodeShell model by @hainaweiben in #1166
- FEAT: Supports `sglang` backend by @ChengjieLi28 in #1161
Enhancements
- ENH: vLLM latest models support by @1572161937 in #1155
Bug fixes
- BUG: remove `best_of` from benchmark by @qinxuye in #1150
- BUG: fix `_eval_qwen_chat_arguments` parsing problem by @channingxiao18 in #1098
- BUG: Fix OpenAI compatibility issue during chat by @mujin2 in #1159
Documentation
- DOC: Update doc by @codingl2k1 in #1156
New Contributors
- @channingxiao18 made their first contribution in #1098
- @1572161937 made their first contribution in #1155
Full Changelog: v0.9.3...v0.9.4