FEAT: support some builtin new models (#1204)
Co-authored-by: ChengjieLi <chengjieli23@outlook.com>
mujin2 and ChengjieLi28 authored Mar 29, 2024
1 parent f9392f7 commit 2857ec4
Showing 17 changed files with 1,519 additions and 4 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -20,7 +20,7 @@ repos:
- id: isort
args: [--sp, setup.cfg]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.7.1
rev: v1.9.0
hooks:
- id: mypy
additional_dependencies: ["tokenize-rt==3.2.0", "types-requests", "types-tabulate"]
90 changes: 90 additions & 0 deletions doc/source/models/builtin/llm/aquila2-chat-16k.rst
@@ -0,0 +1,90 @@
.. _models_llm_aquila2-chat-16k:

========================================
aquila2-chat-16k
========================================

- **Context Length:** 16384
- **Model Name:** aquila2-chat-16k
- **Languages:** zh
- **Abilities:** chat
- **Description:** AquilaChat2-16K series models are long-text chat models

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-7B-16K
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-7B-16K>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 7 --model-format pytorch --quantization ${quantization}
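
Once launched, the model can also be driven programmatically. The following is a
minimal sketch using Xinference's Python client, assuming a local server at the
default endpoint ``http://127.0.0.1:9997``; the prompt and generation settings
are illustrative only::

    from xinference.client import Client

    # Connect to a locally running Xinference server (default endpoint assumed).
    client = Client("http://127.0.0.1:9997")

    # Launch the 7B pytorch spec; a uid is returned to address the model.
    model_uid = client.launch_model(
        model_name="aquila2-chat-16k",
        model_format="pytorch",
        size_in_billions=7,
        quantization="none",
    )
    model = client.get_model(model_uid)

    # Single-turn chat; the response follows the OpenAI-style schema.
    response = model.chat(
        prompt="Summarize the strengths of long-context chat models in one sentence.",
        generate_config={"max_tokens": 256},
    )
    print(response["choices"][0]["message"]["content"])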


Model Spec 2 (ggufv2, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 34
- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Model ID:** TheBloke/AquilaChat2-34B-16K-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-16K-GGUF>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 34 --model-format ggufv2 --quantization ${quantization}


Model Spec 3 (gptq, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 34
- **Quantizations:** Int4
- **Model ID:** TheBloke/AquilaChat2-34B-16K-GPTQ
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-16K-GPTQ>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 34 --model-format gptq --quantization ${quantization}


Model Spec 4 (awq, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 34
- **Quantizations:** Int4
- **Model ID:** TheBloke/AquilaChat2-34B-16K-AWQ
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-16K-AWQ>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 34 --model-format awq --quantization ${quantization}


Model Spec 5 (pytorch, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 34
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-34B-16K
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-34B-16K>`__, `ModelScope <https://modelscope.cn/models/BAAI/AquilaChat2-34B-16K>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 34 --model-format pytorch --quantization ${quantization}

105 changes: 105 additions & 0 deletions doc/source/models/builtin/llm/aquila2-chat.rst
@@ -0,0 +1,105 @@
.. _models_llm_aquila2-chat:

========================================
aquila2-chat
========================================

- **Context Length:** 2048
- **Model Name:** aquila2-chat
- **Languages:** zh
- **Abilities:** chat
- **Description:** Aquila2-chat series models are the chat variants of the Aquila2 base models

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-7B
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-7B>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization}


Model Spec 2 (ggufv2, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 34
- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Model ID:** TheBloke/AquilaChat2-34B-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-GGUF>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 34 --model-format ggufv2 --quantization ${quantization}
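
The launch can equally be performed from Python, picking one of the quantizations
listed above explicitly. A minimal sketch, assuming a local server at the default
endpoint ``http://127.0.0.1:9997``; ``Q4_K_M`` is just one of the listed options::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    # Launch the 34B GGUF build with an explicit quantization from the list above.
    model_uid = client.launch_model(
        model_name="aquila2-chat",
        model_format="ggufv2",
        size_in_billions=34,
        quantization="Q4_K_M",
    )
    print(f"Launched aquila2-chat with uid {model_uid}")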


Model Spec 3 (gptq, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 34
- **Quantizations:** Int4
- **Model ID:** TheBloke/AquilaChat2-34B-GPTQ
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-GPTQ>`__, `ModelScope <https://modelscope.cn/models/BAAI/AquilaChat2-34B-Int4-GPTQ>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 34 --model-format gptq --quantization ${quantization}


Model Spec 4 (awq, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 34
- **Quantizations:** Int4
- **Model ID:** TheBloke/AquilaChat2-34B-AWQ
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-AWQ>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 34 --model-format awq --quantization ${quantization}


Model Spec 5 (pytorch, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 34
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-34B
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-34B>`__, `ModelScope <https://modelscope.cn/models/BAAI/AquilaChat2-34B>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 34 --model-format pytorch --quantization ${quantization}


Model Spec 6 (pytorch, 70 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 70
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-70B-Expr
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-70B-Expr>`__, `ModelScope <https://modelscope.cn/models/BAAI/AquilaChat2-70B-Expr>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 70 --model-format pytorch --quantization ${quantization}

60 changes: 60 additions & 0 deletions doc/source/models/builtin/llm/aquila2.rst
@@ -0,0 +1,60 @@
.. _models_llm_aquila2:

========================================
aquila2
========================================

- **Context Length:** 2048
- **Model Name:** aquila2
- **Languages:** zh
- **Abilities:** generate
- **Description:** Aquila2 series models are base language models

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Model ID:** BAAI/Aquila2-7B
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/Aquila2-7B>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
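
Because this model's ability is ``generate`` rather than ``chat``, the client
exposes plain text completion. A minimal sketch, assuming a local server at the
default endpoint ``http://127.0.0.1:9997``; the prompt and settings are
illustrative::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    model_uid = client.launch_model(
        model_name="aquila2",
        model_format="pytorch",
        size_in_billions=7,
        quantization="none",
    )
    model = client.get_model(model_uid)

    # Base models complete raw text instead of holding a chat turn.
    completion = model.generate(
        prompt="Artificial intelligence is",
        generate_config={"max_tokens": 64},
    )
    print(completion["choices"][0]["text"])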


Model Spec 2 (pytorch, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 34
- **Quantizations:** none
- **Model ID:** BAAI/Aquila2-34B
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/Aquila2-34B>`__, `ModelScope <https://modelscope.cn/models/BAAI/Aquila2-34B>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2 --size-in-billions 34 --model-format pytorch --quantization ${quantization}


Model Spec 3 (pytorch, 70 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 70
- **Quantizations:** none
- **Model ID:** BAAI/Aquila2-70B-Expr
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/Aquila2-70B-Expr>`__, `ModelScope <https://modelscope.cn/models/BAAI/Aquila2-70B-Expr>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2 --size-in-billions 70 --model-format pytorch --quantization ${quantization}

30 changes: 30 additions & 0 deletions doc/source/models/builtin/llm/chatglm3-128k.rst
@@ -0,0 +1,30 @@
.. _models_llm_chatglm3-128k:

========================================
chatglm3-128k
========================================

- **Context Length:** 131072
- **Model Name:** chatglm3-128k
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 6 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 6
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** THUDM/chatglm3-6b-128k
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/chatglm3-6b-128k>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/chatglm3-6b-128k>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name chatglm3-128k --size-in-billions 6 --model-format pytorch --quantization ${quantization}
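
With a context window of 131,072 tokens, this model targets long-document use. A
minimal chat sketch, assuming a local server at the default endpoint
``http://127.0.0.1:9997``; ``long_document`` is a placeholder for your own text::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    model_uid = client.launch_model(
        model_name="chatglm3-128k",
        model_format="pytorch",
        size_in_billions=6,
        quantization="none",
    )
    model = client.get_model(model_uid)

    long_document = "..."  # placeholder: substitute the text to analyze

    # Long inputs fit within the 128k context window.
    response = model.chat(
        prompt=f"Please summarize the following document:\n{long_document}",
        generate_config={"max_tokens": 512},
    )
    print(response["choices"][0]["message"]["content"])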

45 changes: 45 additions & 0 deletions doc/source/models/builtin/llm/gorilla-openfunctions-v2.rst
@@ -0,0 +1,45 @@
.. _models_llm_gorilla-openfunctions-v2:

========================================
gorilla-openfunctions-v2
========================================

- **Context Length:** 4096
- **Model Name:** gorilla-openfunctions-v2
- **Languages:** en
- **Abilities:** chat
- **Description:** OpenFunctions is designed to extend the Large Language Model (LLM) chat-completion feature to formulate executable API calls from natural language instructions and API context.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Model ID:** gorilla-llm/gorilla-openfunctions-v2
- **Model Hubs**: `Hugging Face <https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name gorilla-openfunctions-v2 --size-in-billions 7 --model-format pytorch --quantization ${quantization}


Model Spec 2 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 7
- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K
- **Model ID:** gorilla-llm/gorilla-openfunctions-v2-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2-GGUF>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name gorilla-openfunctions-v2 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}
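
Since this model formulates executable API calls, a typical prompt pairs a
natural-language request with a function description. The sketch below assumes a
local server at the default endpoint ``http://127.0.0.1:9997``; the prompt layout
and the ``get_weather`` signature are illustrative only, so consult the model card
for the exact prompt format the model was trained on::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    model_uid = client.launch_model(
        model_name="gorilla-openfunctions-v2",
        model_format="pytorch",
        size_in_billions=7,
        quantization="none",
    )
    model = client.get_model(model_uid)

    # Pair the user request with the available function's description; the
    # model is trained to reply with an executable call expression.
    response = model.chat(
        prompt=(
            "Call an appropriate function for: What's the weather like in Boston?\n"
            "Available function: get_weather(city: str, unit: str = 'celsius')"
        ),
        generate_config={"max_tokens": 128},
    )
    print(response["choices"][0]["message"]["content"])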
