FEAT: Support mistral instruct v0.2 (#796)
aresnow1 authored Dec 22, 2023
1 parent: 825c6e4 · commit: 5da3fd2
Showing 7 changed files with 169 additions and 18 deletions.
README.md: 4 changes (2 additions, 2 deletions)
@@ -32,12 +32,12 @@ potential of cutting-edge AI models.
- Speculative decoding: [#509](https://github.com/xorbitsai/inference/pull/509)
- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
### New Models
- Built-in support for [mistral-instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2): [#796](https://github.com/xorbitsai/inference/pull/796)
- Built-in support for [deepseek-llm](https://huggingface.co/deepseek-ai) and [deepseek-coder](https://huggingface.co/deepseek-ai): [#786](https://github.com/xorbitsai/inference/pull/786)
- Built-in support for [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1): [#782](https://github.com/xorbitsai/inference/pull/782)
- Built-in support for [OpenHermes 2.5](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B): [#776](https://github.com/xorbitsai/inference/pull/776)
- Built-in support for [Yi](https://huggingface.co/01-ai): [#629](https://github.com/xorbitsai/inference/pull/629)
- Built-in support for [zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) and [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta): [#597](https://github.com/xorbitsai/inference/pull/597)
- Built-in support for [chatglm3](https://huggingface.co/THUDM/chatglm3-6b): [#587](https://github.com/xorbitsai/inference/pull/587)
- Built-in support for [mistral-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [mistral-instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1): [#510](https://github.com/xorbitsai/inference/pull/510)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
README_zh_CN.md: 4 changes (2 additions, 2 deletions)
@@ -30,12 +30,12 @@ Xorbits Inference (Xinference) is a powerful and full-featured distributed
- Speculative decoding: [#509](https://github.com/xorbitsai/inference/pull/509)
- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
### New Models
- Built-in support for [mistral-instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2): [#796](https://github.com/xorbitsai/inference/pull/796)
- Built-in support for [deepseek-llm](https://huggingface.co/deepseek-ai) and [deepseek-coder](https://huggingface.co/deepseek-ai): [#786](https://github.com/xorbitsai/inference/pull/786)
- Built-in support for [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1): [#782](https://github.com/xorbitsai/inference/pull/782)
- Built-in support for [OpenHermes 2.5](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B): [#776](https://github.com/xorbitsai/inference/pull/776)
- Built-in support for [Yi](https://huggingface.co/01-ai): [#629](https://github.com/xorbitsai/inference/pull/629)
- Built-in support for [zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) and [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta): [#597](https://github.com/xorbitsai/inference/pull/597)
- Built-in support for [chatglm3](https://huggingface.co/THUDM/chatglm3-6b): [#587](https://github.com/xorbitsai/inference/pull/587)
- Built-in support for [mistral-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [mistral-instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1): [#510](https://github.com/xorbitsai/inference/pull/510)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform covering the development, deployment, maintenance, and optimization of large language models.
- [Chatbox](https://chatboxai.app/): a desktop client for cutting-edge LLMs, available on Windows, Mac, and Linux.
doc/source/models/builtin/llm/index.rst: 4 changes (3 additions, 1 deletion)
@@ -46,7 +46,7 @@ The following is a list of built-in LLMs in Xinference:
glaive-coder

gorilla-openfunctions-v1

gpt-2

internlm-20b
@@ -63,6 +63,8 @@ The following is a list of built-in LLMs in Xinference:

mistral-instruct-v0.1

mistral-instruct-v0.2

mistral-v0.1

mixtral-instruct-v0.1
doc/source/models/builtin/llm/mistral-instruct-v0.2.rst: 43 changes (43 additions, 0 deletions)
@@ -0,0 +1,43 @@
.. _models_llm_mistral-instruct-v0.2:

========================================
mistral-instruct-v0.2
========================================

- **Context Length:** 8192
- **Model Name:** mistral-instruct-v0.2
- **Languages:** en
- **Abilities:** chat
- **Description:** The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** mistralai/Mistral-7B-Instruct-v0.2

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name mistral-instruct-v0.2 --size-in-billions 7 --model-format pytorch --quantization ${quantization}


Model Spec 2 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 7
- **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0
- **Model ID:** TheBloke/Mistral-7B-Instruct-v0.2-GGUF

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name mistral-instruct-v0.2 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}
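
Once launched, the model can also be driven from Python. A minimal sketch using Xinference's
Python client, assuming a local server at the default endpoint ``http://127.0.0.1:9997`` (the
prompt and the ``Q4_K_M`` choice are illustrative)::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")  # assumed default local endpoint

    # Launch the GGUF build; any quantization listed above works here.
    model_uid = client.launch_model(
        model_name="mistral-instruct-v0.2",
        model_format="ggufv2",
        model_size_in_billions=7,
        quantization="Q4_K_M",
    )

    model = client.get_model(model_uid)
    completion = model.chat("What is the capital of France?")
    print(completion["choices"][0]["message"]["content"])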

doc/source/models/builtin/llm/qwen-chat.rst: 26 changes (13 additions, 13 deletions)
@@ -14,40 +14,40 @@ Specifications
^^^^^^^^^^^^^^


- Model Spec 1 (ggmlv3, 7 Billion)
+ Model Spec 1 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- - **Model Format:** ggmlv3
+ - **Model Format:** ggufv2
  - **Model Size (in billions):** 7
- - **Quantizations:** q4_0
- - **Model ID:** Xorbits/qwen-chat-7B-ggml
+ - **Quantizations:** Q4_K_M
+ - **Model ID:** Xorbits/Qwen-7B-Chat-GGUF

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

- xinference launch --model-name qwen-chat --size-in-billions 7 --model-format ggmlv3 --quantization ${quantization}
+ xinference launch --model-name qwen-chat --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}


- Model Spec 2 (ggmlv3, 14 Billion)
+ Model Spec 2 (ggufv2, 14 Billion)
++++++++++++++++++++++++++++++++++++++++

- - **Model Format:** ggmlv3
+ - **Model Format:** ggufv2
  - **Model Size (in billions):** 14
- - **Quantizations:** q4_0
- - **Model ID:** Xorbits/qwen-chat-14B-ggml
+ - **Quantizations:** Q4_K_M
+ - **Model ID:** Xorbits/Qwen-14B-Chat-GGUF

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

- xinference launch --model-name qwen-chat --size-in-billions 14 --model-format ggmlv3 --quantization ${quantization}
+ xinference launch --model-name qwen-chat --size-in-billions 14 --model-format ggufv2 --quantization ${quantization}
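
Since the server also exposes an OpenAI-compatible REST API, a launched model can be queried
over plain HTTP as well. A rough sketch with ``requests``, assuming a local server on the
default port and using the model UID printed by ``xinference launch``::

    import requests

    XINFERENCE = "http://127.0.0.1:9997"  # assumed default endpoint
    MODEL_UID = "<uid printed by `xinference launch`>"  # placeholder

    resp = requests.post(
        f"{XINFERENCE}/v1/chat/completions",
        json={
            "model": MODEL_UID,
            "messages": [{"role": "user", "content": "Hello!"}],
            "max_tokens": 128,
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])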


Model Spec 3 (pytorch, 1_8 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 1_8
- - **Quantizations:** 4-bit, 8-bit, none
+ - **Quantizations:** none
- **Model ID:** Qwen/Qwen-1_8B-Chat

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
@@ -61,7 +61,7 @@ Model Spec 4 (pytorch, 7 Billion)

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- - **Quantizations:** 4-bit, 8-bit, none
+ - **Quantizations:** none
- **Model ID:** Qwen/Qwen-7B-Chat

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
@@ -89,7 +89,7 @@ Model Spec 6 (pytorch, 72 Billion)

- **Model Format:** pytorch
- **Model Size (in billions):** 72
- - **Quantizations:** 4-bit, 8-bit, none
+ - **Quantizations:** none
- **Model ID:** Qwen/Qwen-72B-Chat

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
xinference/model/llm/llm_family.json: 58 changes (58 additions, 0 deletions)
@@ -2062,6 +2062,64 @@
]
}
},
{
"version": 1,
"context_length": 8192,
"model_name": "mistral-instruct-v0.2",
"model_lang": [
"en"
],
"model_ability": [
"chat"
],
"model_description": "The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.",
"model_specs": [
{
"model_format": "pytorch",
"model_size_in_billions": 7,
"quantizations": [
"4-bit",
"8-bit",
"none"
],
"model_id": "mistralai/Mistral-7B-Instruct-v0.2",
"model_revision": "b70aa86578567ba3301b21c8a27bea4e8f6d6d61"
},
{
"model_format": "ggufv2",
"model_size_in_billions": 7,
"quantizations": [
"Q2_K",
"Q3_K_S",
"Q3_K_M",
"Q3_K_L",
"Q4_0",
"Q4_K_S",
"Q4_K_M",
"Q5_0",
"Q5_K_S",
"Q5_K_M",
"Q6_K",
"Q8_0"
],
"model_id": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
"model_file_name_template": "mistral-7b-instruct-v0.2.{quantization}.gguf"
}
],
"prompt_style": {
"style_name": "NO_COLON_TWO",
"system_prompt": "<s>[INST] <<SYS>>\nAn informative and inspiring conversation\n<</SYS>>\n\n",
"roles": [
"[INST]",
"[/INST]"
],
"intra_message_sep": " ",
"inter_message_sep": " </s><s>",
"stop_token_ids": [
2
]
}
},
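
The ``prompt_style`` above is what turns an OpenAI-style message list into the raw string fed
to the model, and the ggufv2 spec's ``model_file_name_template`` resolves to concrete file
names such as ``mistral-7b-instruct-v0.2.Q4_K_M.gguf``. Below is a rough, self-contained
sketch of how a ``NO_COLON_TWO`` style renders a conversation; it mirrors the FastChat-style
convention these fields follow, and ``render_no_colon_two`` is an illustrative helper, not the
library's API:

    def render_no_colon_two(system_prompt, roles, intra_sep, inter_sep, turns):
        """turns: list of (role_index, text); text=None marks the slot
        the model should complete."""
        seps = [intra_sep, inter_sep]
        out = system_prompt
        for i, (role_idx, text) in enumerate(turns):
            if text is not None:
                out += roles[role_idx] + text + seps[i % 2]
            else:
                out += roles[role_idx]  # open the assistant turn
        return out

    prompt = render_no_colon_two(
        system_prompt="<s>[INST] <<SYS>>\nAn informative and inspiring conversation\n<</SYS>>\n\n",
        roles=["[INST]", "[/INST]"],
        intra_sep=" ",
        inter_sep=" </s><s>",
        turns=[(0, "Hi there"), (1, "Hello!"), (0, "Tell me a joke"), (1, None)],
    )
    # The rendered string alternates the two separators and ends with an
    # open "[/INST]" so generation continues as the assistant; stop token
    # id 2 (</s>) ends decoding.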
{
"version": 1,
"context_length": 8192,
xinference/model/llm/llm_family_modelscope.json: 48 changes (48 additions, 0 deletions)
@@ -1209,6 +1209,54 @@
]
}
},
{
"version": 1,
"context_length": 8192,
"model_name": "mistral-instruct-v0.2",
"model_lang": [
"en"
],
"model_ability": [
"chat"
],
"model_description": "The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.",
"model_specs": [
{
"model_format": "pytorch",
"model_size_in_billions": 7,
"quantizations": [
"4-bit",
"8-bit",
"none"
],
"model_hub": "modelscope",
"model_id": "AI-ModelScope/Mistral-7B-Instruct-v0.2"
},
{
"model_format": "ggufv2",
"model_size_in_billions": 7,
"quantizations": [
"Q4_K_M"
],
"model_hub": "modelscope",
"model_id": "Xorbits/Mistral-7B-Instruct-v0.2-GGUF",
"model_file_name_template": "mistral-7b-instruct-v0.2.{quantization}.gguf"
}
],
"prompt_style": {
"style_name": "NO_COLON_TWO",
"system_prompt": "<s>[INST] <<SYS>>\nAn informative and inspiring conversation\n<</SYS>>\n\n",
"roles": [
"[INST]",
"[/INST]"
],
"intra_message_sep": " ",
"inter_message_sep": " </s><s>",
"stop_token_ids": [
2
]
}
},
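
This entry mirrors the Hugging Face definition but pulls weights from ModelScope. Which hub is
used is selected by the ``XINFERENCE_MODEL_SRC`` environment variable (per Xinference's docs;
the surrounding script is an illustrative sketch):

    import os
    import subprocess

    # Prefer ModelScope as the download hub; Hugging Face is the default.
    env = dict(os.environ, XINFERENCE_MODEL_SRC="modelscope")

    subprocess.run(
        [
            "xinference", "launch",
            "--model-name", "mistral-instruct-v0.2",
            "--size-in-billions", "7",
            "--model-format", "ggufv2",
            "--quantization", "Q4_K_M",  # the only GGUF quantization in this spec
        ],
        env=env,
        check=True,
    )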
{
"version": 1,
"context_length": 2048,
