FEAT: support some builtin new models (#1204)
Co-authored-by: ChengjieLi <chengjieli23@outlook.com>
mujin2 and ChengjieLi28 authored Mar 29, 2024
1 parent f9392f7 commit 2857ec4
Showing 17 changed files with 1,519 additions and 4 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -20,7 +20,7 @@ repos:
- id: isort
args: [--sp, setup.cfg]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.7.1
rev: v1.9.0
hooks:
- id: mypy
additional_dependencies: ["tokenize-rt==3.2.0", "types-requests", "types-tabulate"]
90 changes: 90 additions & 0 deletions doc/source/models/builtin/llm/aquila2-chat-16k.rst
@@ -0,0 +1,90 @@
.. _models_llm_aquila2-chat-16k:

========================================
aquila2-chat-16k
========================================

- **Context Length:** 16384
- **Model Name:** aquila2-chat-16k
- **Languages:** zh
- **Abilities:** chat
- **Description:** AquilaChat2-16K series models are long-text chat models

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-7B-16K
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-7B-16K>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 7 --model-format pytorch --quantization ${quantization}
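
Once launched, the model can also be driven programmatically. The following is a
minimal sketch using Xinference's Python client, assuming a local server at the
default endpoint ``http://127.0.0.1:9997``; the prompt and generation settings
are illustrative only::

    from xinference.client import Client

    # Connect to a locally running Xinference server (default endpoint assumed).
    client = Client("http://127.0.0.1:9997")

    # Launch the 7B pytorch spec; a uid is returned to address the model.
    model_uid = client.launch_model(
        model_name="aquila2-chat-16k",
        model_format="pytorch",
        size_in_billions=7,
        quantization="none",
    )
    model = client.get_model(model_uid)

    # Single-turn chat; the response follows the OpenAI-style schema.
    response = model.chat(
        prompt="Summarize the strengths of long-context chat models in one sentence.",
        generate_config={"max_tokens": 256},
    )
    print(response["choices"][0]["message"]["content"])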


Model Spec 2 (ggufv2, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 34
- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Model ID:** TheBloke/AquilaChat2-34B-16K-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-16K-GGUF>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 34 --model-format ggufv2 --quantization ${quantization}


Model Spec 3 (gptq, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 34
- **Quantizations:** Int4
- **Model ID:** TheBloke/AquilaChat2-34B-16K-GPTQ
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-16K-GPTQ>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 34 --model-format gptq --quantization ${quantization}


Model Spec 4 (awq, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 34
- **Quantizations:** Int4
- **Model ID:** TheBloke/AquilaChat2-34B-16K-AWQ
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-16K-AWQ>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 34 --model-format awq --quantization ${quantization}


Model Spec 5 (pytorch, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 34
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-34B-16K
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-34B-16K>`__, `ModelScope <https://modelscope.cn/models/BAAI/AquilaChat2-34B-16K>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat-16k --size-in-billions 34 --model-format pytorch --quantization ${quantization}

105 changes: 105 additions & 0 deletions doc/source/models/builtin/llm/aquila2-chat.rst
@@ -0,0 +1,105 @@
.. _models_llm_aquila2-chat:

========================================
aquila2-chat
========================================

- **Context Length:** 2048
- **Model Name:** aquila2-chat
- **Languages:** zh
- **Abilities:** chat
- **Description:** Aquila2-chat series models are the chat variants of the Aquila2 base models

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-7B
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-7B>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization}


Model Spec 2 (ggufv2, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 34
- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_0, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Model ID:** TheBloke/AquilaChat2-34B-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-GGUF>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 34 --model-format ggufv2 --quantization ${quantization}
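
The launch can equally be performed from Python, picking one of the quantizations
listed above explicitly. A minimal sketch, assuming a local server at the default
endpoint ``http://127.0.0.1:9997``; ``Q4_K_M`` is just one of the listed options::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    # Launch the 34B GGUF build with an explicit quantization from the list above.
    model_uid = client.launch_model(
        model_name="aquila2-chat",
        model_format="ggufv2",
        size_in_billions=34,
        quantization="Q4_K_M",
    )
    print(f"Launched aquila2-chat with uid {model_uid}")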


Model Spec 3 (gptq, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
- **Model Size (in billions):** 34
- **Quantizations:** Int4
- **Model ID:** TheBloke/AquilaChat2-34B-GPTQ
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-GPTQ>`__, `ModelScope <https://modelscope.cn/models/BAAI/AquilaChat2-34B-Int4-GPTQ>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 34 --model-format gptq --quantization ${quantization}


Model Spec 4 (awq, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
- **Model Size (in billions):** 34
- **Quantizations:** Int4
- **Model ID:** TheBloke/AquilaChat2-34B-AWQ
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/AquilaChat2-34B-AWQ>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 34 --model-format awq --quantization ${quantization}


Model Spec 5 (pytorch, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 34
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-34B
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-34B>`__, `ModelScope <https://modelscope.cn/models/BAAI/AquilaChat2-34B>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 34 --model-format pytorch --quantization ${quantization}


Model Spec 6 (pytorch, 70 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 70
- **Quantizations:** none
- **Model ID:** BAAI/AquilaChat2-70B-Expr
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/AquilaChat2-70B-Expr>`__, `ModelScope <https://modelscope.cn/models/BAAI/AquilaChat2-70B-Expr>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2-chat --size-in-billions 70 --model-format pytorch --quantization ${quantization}

60 changes: 60 additions & 0 deletions doc/source/models/builtin/llm/aquila2.rst
@@ -0,0 +1,60 @@
.. _models_llm_aquila2:

========================================
aquila2
========================================

- **Context Length:** 2048
- **Model Name:** aquila2
- **Languages:** zh
- **Abilities:** generate
- **Description:** Aquila2 series models are base language models

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Model ID:** BAAI/Aquila2-7B
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/Aquila2-7B>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
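
Because this model's ability is ``generate`` rather than ``chat``, the client
exposes plain text completion. A minimal sketch, assuming a local server at the
default endpoint ``http://127.0.0.1:9997``; the prompt and settings are
illustrative::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    model_uid = client.launch_model(
        model_name="aquila2",
        model_format="pytorch",
        size_in_billions=7,
        quantization="none",
    )
    model = client.get_model(model_uid)

    # Base models complete raw text instead of holding a chat turn.
    completion = model.generate(
        prompt="Artificial intelligence is",
        generate_config={"max_tokens": 64},
    )
    print(completion["choices"][0]["text"])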


Model Spec 2 (pytorch, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 34
- **Quantizations:** none
- **Model ID:** BAAI/Aquila2-34B
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/Aquila2-34B>`__, `ModelScope <https://modelscope.cn/models/BAAI/Aquila2-34B>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2 --size-in-billions 34 --model-format pytorch --quantization ${quantization}


Model Spec 3 (pytorch, 70 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 70
- **Quantizations:** none
- **Model ID:** BAAI/Aquila2-70B-Expr
- **Model Hubs**: `Hugging Face <https://huggingface.co/BAAI/Aquila2-70B-Expr>`__, `ModelScope <https://modelscope.cn/models/BAAI/Aquila2-70B-Expr>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name aquila2 --size-in-billions 70 --model-format pytorch --quantization ${quantization}

30 changes: 30 additions & 0 deletions doc/source/models/builtin/llm/chatglm3-128k.rst
@@ -0,0 +1,30 @@
.. _models_llm_chatglm3-128k:

========================================
chatglm3-128k
========================================

- **Context Length:** 131072
- **Model Name:** chatglm3-128k
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 6 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 6
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** THUDM/chatglm3-6b-128k
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/chatglm3-6b-128k>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/chatglm3-6b-128k>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name chatglm3-128k --size-in-billions 6 --model-format pytorch --quantization ${quantization}
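
With a context window of 131,072 tokens, this model targets long-document use. A
minimal chat sketch, assuming a local server at the default endpoint
``http://127.0.0.1:9997``; ``long_document`` is a placeholder for your own text::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    model_uid = client.launch_model(
        model_name="chatglm3-128k",
        model_format="pytorch",
        size_in_billions=6,
        quantization="none",
    )
    model = client.get_model(model_uid)

    long_document = "..."  # placeholder: substitute the text to analyze

    # Long inputs fit within the 128k context window.
    response = model.chat(
        prompt=f"Please summarize the following document:\n{long_document}",
        generate_config={"max_tokens": 512},
    )
    print(response["choices"][0]["message"]["content"])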

45 changes: 45 additions & 0 deletions doc/source/models/builtin/llm/gorilla-openfunctions-v2.rst
@@ -0,0 +1,45 @@
.. _models_llm_gorilla-openfunctions-v2:

========================================
gorilla-openfunctions-v2
========================================

- **Context Length:** 4096
- **Model Name:** gorilla-openfunctions-v2
- **Languages:** en
- **Abilities:** chat
- **Description:** OpenFunctions is designed to extend the Large Language Model (LLM) chat-completion feature to formulate executable API calls from natural language instructions and API context.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Model ID:** gorilla-llm/gorilla-openfunctions-v2
- **Model Hubs**: `Hugging Face <https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name gorilla-openfunctions-v2 --size-in-billions 7 --model-format pytorch --quantization ${quantization}


Model Spec 2 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 7
- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S, Q6_K
- **Model ID:** gorilla-llm/gorilla-openfunctions-v2-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2-GGUF>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name gorilla-openfunctions-v2 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}
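
Since this model formulates executable API calls, a typical prompt pairs a
natural-language request with a function description. The sketch below assumes a
local server at the default endpoint ``http://127.0.0.1:9997``; the prompt layout
and the ``get_weather`` signature are illustrative only, so consult the model card
for the exact prompt format the model was trained on::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    model_uid = client.launch_model(
        model_name="gorilla-openfunctions-v2",
        model_format="pytorch",
        size_in_billions=7,
        quantization="none",
    )
    model = client.get_model(model_uid)

    # Pair the user request with the available function's description; the
    # model is trained to reply with an executable call expression.
    response = model.chat(
        prompt=(
            "Call an appropriate function for: What's the weather like in Boston?\n"
            "Available function: get_weather(city: str, unit: str = 'celsius')"
        ),
        generate_config={"max_tokens": 128},
    )
    print(response["choices"][0]["message"]["content"])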
