Releases: xorbitsai/inference
v0.11.2.post1
What's new in 0.11.2.post1 (2024-05-24)
These are the changes in inference v0.11.2.post1, a hotfix version of v0.11.2.
Bug fixes
Full Changelog: v0.11.2...v0.11.2.post1
v0.11.2
What's new in 0.11.2 (2024-05-24)
These are the changes in inference v0.11.2.
New features
- FEAT: Add command `cal-model-mem` by @frostyplanet in #1460 (usage sketch after this list)
- FEAT: add deepseek llm and coder base by @qinxuye in #1533
- FEAT: add codeqwen1.5 by @qinxuye in #1535
- FEAT: Auto detect rerank type for unknown rerank type by @codingl2k1 in #1538
- FEAT: Provide the ability to query information about the cached models hosted on a given node by @hainaweiben in #1522
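The new `cal-model-mem` command estimates how much memory a model will need before you launch it. Below is a minimal sketch of invoking it from Python; the flag names are assumptions modeled on `xinference launch`, so check `xinference cal-model-mem --help` for the authoritative list, and the model name is illustrative.

```python
# A minimal sketch, assuming the `xinference` CLI is installed on PATH.
# Flag names are assumptions mirroring `xinference launch`; verify them
# with `xinference cal-model-mem --help`.
import subprocess

subprocess.run(
    [
        "xinference", "cal-model-mem",
        "--model-name", "qwen1.5-chat",   # illustrative model
        "--size-in-billions", "7",
        "--model-format", "pytorch",
    ],
    check=True,
)
```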
Enhancements
- ENH: Compatible with `huggingface-hub` v0.23.0 by @ChengjieLi28 in #1514
- ENH: convert command-r to chat by @qinxuye in #1537
- ENH: Support Intern-VL-Chat model by @amumu96 in #1536
- BLD: adapt to langchain 0.2.x, which has breaking changes by @mikeshi80 in #1521
- BLD: Fix pre commit by @frostyplanet in #1527
- BLD: compatible with torch 2.3.0 by @qinxuye in #1534
Bug fixes
- BUG: Fix start worker failed due to None device name by @codingl2k1 in #1539
- BUG: Fix `gpu_idx` allocation error when replica > 1 by @amumu96 in #1528
Others
- CHORE: Basic benchmark/benchmark_rerank.py by @codingl2k1 in #1479
Full Changelog: v0.11.1...v0.11.2
v0.11.1
What's new in 0.11.1 (2024-05-17)
These are the changes in inference v0.11.1.
New features
- FEAT: support Yi-1.5 series by @qinxuye in #1489
- FEAT: [UI] embedding and rerank models support specifying GPU or CPU by @yiboyasss in #1491
Enhancements
- ENH: Refactor the LoRA adaptation method for LLM models by @hainaweiben in #1470
- ENH: Add `stream_options` support by @amumu96 in #1508 (see the sketch after this list)
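Since the server exposes an OpenAI-compatible endpoint, `stream_options` can be exercised with the standard `openai` client. A minimal sketch, assuming a server at localhost:9997 (the default port) and a launched chat model named `qwen1.5-chat`; both are illustrative:

```python
# A minimal sketch of OpenAI-style stream_options against an Xinference
# server; the base_url, port, and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="qwen1.5-chat",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    # include_usage asks for a final, choices-empty chunk carrying token usage
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    elif chunk.usage is not None:  # the trailing usage-only chunk
        print("\nusage:", chunk.usage)
```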
Bug fixes
- BUG: fix top_k for vllm backend by @sixsun10 in #1461
- BUG: Docker image issue due to `torchvision` by @ChengjieLi28 in #1485
- BUG: Docker image crash during startup due to `llama-cpp-python` by @ChengjieLi28 in #1507
- BUG: Fix prompt being required when the docker image builds by @ChengjieLi28 in #1512
- BUG: `llama.cpp` model failed when chatting due to `lora` by @ChengjieLi28 in #1513
Documentation
- DOC: update quick start ipynb by @qinxuye in #1482
- DOC: Update readme for being integrated by RAGFlow by @JinHai-CN in #1493
- DOC: Lora usage by @ChengjieLi28 in #1506
New Contributors
- @sixsun10 made their first contribution in #1461
- @JinHai-CN made their first contribution in #1493
Full Changelog: v0.11.0...v0.11.1
v0.11.0
What's new in 0.11.0 (2024-05-11)
These are the changes in inference v0.11.0.
Breaking Changes
v0.11.0 introduces a breaking change: when launching a model, `model_engine`
must now be specified. Refer to Model Engine for more information.
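A minimal sketch of a launch call under the new requirement, assuming a server at localhost:9997; the model name, engine, format, and size are illustrative, and the keyword names are assumed to follow the RESTful client as of v0.11.x:

```python
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="llama-3-instruct",   # illustrative
    model_engine="vllm",             # required since v0.11.0
    model_format="pytorch",
    model_size_in_billions=8,
)
print(model_uid)
```

Omitting `model_engine` on v0.11.0+ is expected to fail, whereas earlier versions could leave it unset.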
New features
- FEAT: Support Mixtral-8x22b-instruct-v0.1 by @qinxuye in #1340
- FEAT: add phi-3-mini series by @orangeclk in #1379
- FEAT: add Starling model by @boy-hack in #1384
- FEAT: support qwen1.5 110b by @qinxuye in #1388
- FEAT: Support query engine with cmdline by @Ago327 in #1380
- FEAT: Ascend support by @qinxuye in #1408
- FEAT: Audio support verbose_json and timestamp by @codingl2k1 in #1402
- FEAT: [UI] Add engine option when launching LLM by @yiboyasss in #1456
Enhancements
- ENH: add custom image model by @amumu96 in #1312
- ENH: Support more quantization with VLLM by @amumu96 in #1372
- ENH: Update chatglm3 6b model version by @codingl2k1 in #1401
- ENH: make qwen_vl support streaming output by @Minamiyama in #1425
- ENH: Remove the max tokens limitation and boost performance by avoiding unnecessary repeated CUDA device detection by @mikeshi80 in #1429
- ENH: Improve benchmark and add long context generate by @frostyplanet in #1423
- ENH: make yi_vl support streaming output by @Minamiyama in #1443
- ENH: Some minor changes by @frostyplanet in #1453
- ENH: make deepseek_vl support streaming output by @Minamiyama in #1444
- ENH: Rename `model_engine` for a clearer inference backend by @ChengjieLi28 in #1466
- BLD: Use self-hosted AWS machine to build docker image by @ChengjieLi28 in #1405
- CLN: Remove actor client by @ChengjieLi28 in #1436
- CLN: Remove all speculative-related codes by @ChengjieLi28 in #1435
- REF: Query for engine by @Ago327 in #1342
- REF: [UI] Refactor register model by @yiboyasss in #1368
- REF: Add the `model_engine` parameter for the launching process by @hainaweiben in #1367
Bug fixes
- BUG: Fix llama3-instruct 70B filename error by @ChengjieLi28 in #1370
- BUG: fix error when there is no role:user message or the content is empty by @liuzhenghua in #1378
- BUG: fix file template of andrewcanis/c4ai-command-r-v01-GGUF by @emulated24 in #1389
- BUG: Fix using extra GPUs due to match in `__init__` by @ChengjieLi28 in #1400
- BUG: Fix qwen tool call parameter empty issue by @codingl2k1 in #1381
- BUG: Fix tool calls return invalid usage by @codingl2k1 in #1420
- BUG: Fix tools ability by @mikeshi80 in #1447
- BUG: Install error on macOS due to `auto-gptq` by @ChengjieLi28 in #1457
- BUG: fix some issues in query engine interface by @Ago327 in #1442
Tests
- TST: Pin `huggingface-hub` to pass CI since it has some breaking changes by @ChengjieLi28 in #1427
Documentation
- DOC: update readme & fix Mac CI by @qinxuye in #1385
- DOC: worker address should be specified for `xinference-worker` by @amumu96 in #1397
- DOC: update docker doc on using xinference by @qinxuye in #1417
- DOC: add the missing backslash in shell command by @mikeshi80 in #1451
- DOC: Usage of `model_engine` by @ChengjieLi28 in #1468
New Contributors
- @liuzhenghua made their first contribution in #1378
- @emulated24 made their first contribution in #1389
- @orangeclk made their first contribution in #1379
- @boy-hack made their first contribution in #1384
- @frostyplanet made their first contribution in #1423
Full Changelog: v0.10.3...v0.11.0
v0.10.3
What's new in 0.10.3 (2024-04-24)
These are the changes in inference v0.10.3.
New features
- FEAT: support llama-3 family by @qinxuye in #1332
- FEAT: Add Belle-whisper-large-v3-zh by @codingl2k1 in #1351
Enhancements
- ENH: fix the max length of codeqwen-7B-chat by @mikeshi80 in #1354
- ENH: Clear cache for embedding and rerank by @codingl2k1 in #1360
Bug fixes
- BUG: Fix launching embedding or reranking models from the command line failing due to PEFT by @hainaweiben in #1343
- BUG: Fix extra parameters issue when auto-recovering models by @ChengjieLi28 in #1348
- BUG: Fix old rerank models use flag rerank issue by @codingl2k1 in #1350
Documentation
- DOC: Add new models to README by @qinxuye in #1346
- DOC: Update README, add FastGPT to integrations by @yangchuansheng in #1355
New Contributors
- @yangchuansheng made their first contribution in #1355
Full Changelog: v0.10.2.post1...v0.10.3
v0.10.2.post1
What's new in 0.10.2.post1 (2024-04-19)
These are the changes in inference v0.10.2.post1.
Bug fixes
- BUG: Fix `xinference-client` package depending on internal code by @ChengjieLi28 in #1330
- BUG: Fix restful client depending on a specific type by @ChengjieLi28 in #1331
Full Changelog: v0.10.2...v0.10.2.post1
v0.10.2
What's new in 0.10.2 (2024-04-19)
These are the changes in inference v0.10.2.
New features
- FEAT: [UI] Add replica configuration when launching `embedding` and `rerank` models by @yiboyasss in #1306
- FEAT: Multi-LoRA support by @hainaweiben in #1273
- FEAT: Support SeaLLM-7B and c4ai-command-r-v01 by @mujin2 in #1310
- FEAT: Support BAAI/bge-reranker-v2-* rerank model by @codingl2k1 in #1305
- FEAT: UI supports multi lora by @yiboyasss in #1320
- FEAT: Add c4ai-command model for ModelScope by @mujin2 in #1321
- FEAT: support m3e embedding models by @qinxuye in #1298
- FEAT: hotkey to activate search by @Minamiyama in #1287
- FEAT: support codeqwen1.5-chat by @qinxuye in #1322
Enhancements
- ENH: Support custom audio model by @amumu96 in #1279
- ENH: support int and str comparison for model size by @mikeshi80 in #1277
- BLD: Add `FlagEmbedding` in cpu docker by @ChengjieLi28 in #1318
- REF: support query for engine feature by @Ago327 in #1294
Full Changelog: v0.10.1...v0.10.2
v0.10.1
What's new in 0.10.1 (2024-04-12)
These are the changes in inference v0.10.1.
New features
- FEAT: add support for qwen1.5 32B chat model by @mikeshi80 in #1249
- FEAT: Support Qwen MoE model for huggingface and modelscope by @xiaodouzi666 in #1263
- FEAT: Enable streaming in tool calls for Qwen when using vllm by @zhanghx0905 in #1215
Enhancements
- ENH: make `create_embedding` able to receive extra args by @amumu96 in #1224
- ENH: support more GPTQ and AWQ format for some models by @xiaodouzi666 in #1243
- ENH: support multi gpus for qwen-vl and yi-vl by @qinxuye in #1236
- ENH: support llamacpp multiple gpu by @amumu96 in #1229
- ENH: UI: paper material for cards by @Minamiyama in #1261
- REF: Refactor launch model for Web UI by @yiboyasss in #1254
- REF: Remove ctransformers supports by @mujin2 in #1267
Bug fixes
- BUG: Fix docker cpu build by @ChengjieLi28 in #1213
- BUG: Fix cannot start xinference in docker due to `cv2` by @ChengjieLi28 in #1217
- BUG: Cannot start xinference in docker by @ChengjieLi28 in #1219
- BUG: Fix `opencv` issue in docker container by @ChengjieLi28 in #1227
- BUG: Fix the launch bug of OmniLMM 12B by @hainaweiben in #1241
- BUG: fix style spelling error by @Minamiyama in #1247
- BUG: Fix issue with supervisor not clearing information after worker exit by @hainaweiben in #1231
- BUG: custom models on the web ui by @yiboyasss in #1259
- BUG: fix system prompts for chatglm3 and internlm2 pytorch by @qinxuye in #1271
- BUG: Fix authority and jump issue by @yiboyasss in #1276
- BUG: fix custom vision model by @qinxuye in #1280
Tests
- TST: Fix tests due to `llama-cpp-python` v0.2.58 by @ChengjieLi28 in #1242
Documentation
- DOC: auto gen vllm doc & add chatglm3-{32k, 128k} support for vllm by @qinxuye in #1234
- DOC: update models doc by @qinxuye in #1246
- DOC: update readme by @qinxuye in #1268
New Contributors
- @amumu96 made their first contribution in #1224
- @xiaodouzi666 made their first contribution in #1243
- @yiboyasss made their first contribution in #1254
Full Changelog: v0.10.0...v0.10.1
v0.10.0
What's new in 0.10.0 (2024-03-29)
These are the changes in inference v0.10.0.
New features
- FEAT: launch UI of audio model. by @hainaweiben in #1102
- FEAT: Supports `OmniLMM` chat model by @hainaweiben in #1171
- FEAT: Added vllm support for deepseek models by @ivanzfb in #1200
- FEAT: force to specify worker ip and gpu idx when launching models by @ChengjieLi28 in #1195
- FEAT: OAuth system supports api-key by @Ago327 in #1168 (see the sketch after this list)
- FEAT: Support deepseek vl by @codingl2k1 in #1175
- FEAT: support some builtin new models by @mujin2 in #1204
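With the OAuth system issuing api-keys (#1168), protected endpoints can be called with a bearer token. A minimal sketch, assuming keys are accepted as a standard `Authorization: Bearer` header; the endpoint, port, and key are illustrative:

```python
import requests

# Hypothetical api-key; obtain a real one from your server's OAuth setup.
headers = {"Authorization": "Bearer sk-xxxxxxxx"}
resp = requests.get("http://localhost:9997/v1/models", headers=headers)
resp.raise_for_status()
print(resp.json())
```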
Enhancements
- BLD: add autoawq in setup by @utopia2077 in #1190
Bug fixes
- BUG: Fix an incorrect model interface address that caused a 307 redirect to HTTP, blocking the request and preventing the model list from displaying by @wertycn in #1182
- BUG: fix doc fail introduced by #1171 & update readme by @qinxuye in #1203
- BUG: Increase validator types for the 'input' parameter of embeddings to match the OpenAI API by @Minamiyama in #1201 (see the sketch below)
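For reference, #1201 widens validation of the embeddings `input` so that both OpenAI-accepted shapes pass. A minimal sketch with the standard `openai` client; the model name and port are illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

# Per the OpenAI API, 'input' may be a single string...
one = client.embeddings.create(model="bge-large-zh", input="hello")
# ...or a list of strings embedded in one call.
many = client.embeddings.create(model="bge-large-zh", input=["hello", "world"])
print(len(one.data), len(many.data))  # 1 2
```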
Documentation
- DOC: internal design by @1572161937 in #1178
- DOC: update readme and models doc by @qinxuye in #1176
- DOC: Doc for oauth system with api-key by @ChengjieLi28 in #1210
New Contributors
- @utopia2077 made their first contribution in #1190
- @ivanzfb made their first contribution in #1200
Full Changelog: v0.9.4...v0.10.0
v0.9.4
What's new in 0.9.4 (2024-03-21)
These are the changes in inference v0.9.4.
New features
- FEAT: Support CodeShell model by @hainaweiben in #1166
- FEAT: Supports `sglang` backend by @ChengjieLi28 in #1161
Enhancements
- ENH: vLLM latest models support by @1572161937 in #1155
Bug fixes
- BUG: remove `best_of` from benchmark by @qinxuye in #1150
- BUG: fix `_eval_qwen_chat_arguments` parsing problem by @channingxiao18 in #1098
- BUG: Fix OpenAI compatibility issue during chat by @mujin2 in #1159
Documentation
- DOC: Update doc by @codingl2k1 in #1156
New Contributors
- @channingxiao18 made their first contribution in #1098
- @1572161937 made their first contribution in #1155
Full Changelog: v0.9.3...v0.9.4