add qwen2.5 into llm chat notebooks (#2423)

eaidova authored Sep 30, 2024
1 parent 8cf2a38 commit 8e78b35
Showing 6 changed files with 94 additions and 21 deletions.
5 changes: 2 additions & 3 deletions notebooks/llm-chatbot/README.md
@@ -52,9 +52,8 @@ The available options are:
* **llama-3.1-8b-instruct** - The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. More details about the model can be found in the [Meta blog post](https://ai.meta.com/blog/meta-llama-3-1/), [model website](https://llama.meta.com) and [model card](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
>**Note**: To run the model with the demo, you will need to accept the license agreement.
>You must be a registered user in the 🤗 Hugging Face Hub. Please visit the [HuggingFace model card](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), carefully read the terms of usage, and click the accept button. You will need an access token for the code below to run (see the login sketch after this list). For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
* **qwen2-1.5b-instruct/qwen2-7b-instruct** - Qwen2 is the new series of Qwen large language models. Compared with state-of-the-art open source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.
For more details, please refer to the [model card](https://huggingface.co/Qwen/Qwen2-7B-Instruct), [blog](https://qwenlm.github.io/blog/qwen2/), [GitHub](https://github.com/QwenLM/Qwen2), and [Documentation](https://qwen.readthedocs.io/en/latest/).
* **qwen1.5-0.5b-chat/qwen1.5-1.8b-chat/qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, and a mixture of sliding window attention and full attention. You can find more details about the model in the [model repository](https://huggingface.co/Qwen).
* **qwen2.5-0.5b-instruct/qwen2.5-1.5b-instruct/qwen2.5-3b-instruct/qwen2.5-7b-instruct/qwen2.5-14b-instruct** - Qwen2.5 is the latest series of Qwen large language models. Compared with Qwen2, the Qwen2.5 series brings significant improvements in coding, mathematics, and general knowledge skills. Additionally, it brings long-context and multilingual support, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
For more details, please refer to the [model card](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
* **qwen-7b-chat** - Qwen-7B is the 7B-parameter version of the large language model series Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model pretrained on a large volume of data, including web texts, books, code, etc. For more details about Qwen, please refer to the [GitHub](https://github.com/QwenLM/Qwen) code repository.
* **mpt-7b-chat** - MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases ([ALiBi](https://arxiv.org/abs/2108.12409)). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT-7B-chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-7B on the ShareGPT-Vicuna, [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3), [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), and Evol-Instruct datasets. More details about the model can be found in [blog post](https://www.mosaicml.com/blog/mpt-7b), [repository](https://github.com/mosaicml/llm-foundry/) and [HuggingFace model card](https://huggingface.co/mosaicml/mpt-7b-chat).
* **chatglm3-6b** - ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features from the previous two generations, such as smooth dialogue and a low deployment threshold, ChatGLM3-6B employs a more diverse training dataset, more training steps, and a more reasonable training strategy. ChatGLM3-6B also adopts a newly designed [Prompt format](https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md) in addition to normal multi-turn dialogue. You can find more details about the model in the [model card](https://huggingface.co/THUDM/chatglm3-6b).
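Several of the checkpoints above are gated and require an authenticated 🤗 Hugging Face session before download. Below is a minimal sketch of the login step referenced in the note above, mirroring the `whoami`/`notebook_login` pattern the notebooks themselves use:

```python
# Minimal sketch: ensure a Hugging Face access token is available before
# downloading gated checkpoints such as meta-llama/Meta-Llama-3.1-8B-Instruct.
from huggingface_hub import notebook_login, whoami

try:
    whoami()  # succeeds if an access token is already cached locally
except Exception:
    notebook_login()  # otherwise open the interactive login widget in Jupyter
```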
5 changes: 2 additions & 3 deletions notebooks/llm-chatbot/llm-chatbot-generate-api.ipynb
@@ -312,9 +312,8 @@
" except OSError:\n",
" notebook_login()\n",
"```\n",
"* **qwen2-1.5b-instruct/qwen2-7b-instruct** - Qwen2 is the new series of Qwen large language models.Compared with the state-of-the-art open source language models, including the previous released Qwen1.5, Qwen2 has generally surpassed most open source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting for language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.\n",
"For more details, please refer to [model_card](https://huggingface.co/Qwen/Qwen2-7B-Instruct), [blog](https://qwenlm.github.io/blog/qwen2/), [GitHub](https://github.com/QwenLM/Qwen2), and [Documentation](https://qwen.readthedocs.io/en/latest/). \n",
"* **qwen1.5-0.5b-chat/qwen1.5-1.8b-chat/qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different model sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention. You can find more details about model in the [model repository](https://huggingface.co/Qwen).\n",
"* **qwen2.5-0.5b-instruct/qwen2.5-1.5b-instruct/qwen2.5-3b-instruct/qwen2.5-7b-instruct/qwen2.5-14b-instruct** - Qwen2.5 is the latest series of Qwen large language models. Comparing with Qwen2, Qwen2.5 series brings significant improvements in coding, mathematics and general knowledge skills. Additionally, it brings long-context and multiple languages support including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. \n",
"For more details, please refer to [model_card](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).\n",
"* **qwen-7b-chat** - Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. For more details about Qwen, please refer to the [GitHub](https://github.com/QwenLM/Qwen) code repository.\n",
"* **chatglm3-6b** - ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features such as smooth dialogue and low deployment threshold from the previous two generations, ChatGLM3-6B employs a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. ChatGLM3-6B adopts a newly designed [Prompt format](https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md), in addition to the normal multi-turn dialogue. You can find more details about model in the [model card](https://huggingface.co/THUDM/chatglm3-6b)\n",
"* **mistral-7b** - The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. You can find more details about model in the [model card](https://huggingface.co/mistralai/Mistral-7B-v0.1), [paper](https://arxiv.org/abs/2310.06825) and [release blog post](https://mistral.ai/news/announcing-mistral-7b/).\n",
13 changes: 10 additions & 3 deletions notebooks/llm-chatbot/llm-chatbot.ipynb
@@ -263,9 +263,8 @@
" notebook_login()\n",
"```\n",
"\n",
"* **qwen2-1.5b-instruct/qwen2-7b-instruct** - Qwen2 is the new series of Qwen large language models.Compared with the state-of-the-art open source language models, including the previous released Qwen1.5, Qwen2 has generally surpassed most open source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting for language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.\n",
"For more details, please refer to [model_card](https://huggingface.co/Qwen/Qwen2-7B-Instruct), [blog](https://qwenlm.github.io/blog/qwen2/), [GitHub](https://github.com/QwenLM/Qwen2), and [Documentation](https://qwen.readthedocs.io/en/latest/). \n",
"* **qwen1.5-0.5b-chat/qwen1.5-1.8b-chat/qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different model sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention. You can find more details about model in the [model repository](https://huggingface.co/Qwen).\n",
"* **qwen2.5-0.5b-instruct/qwen2.5-1.5b-instruct/qwen2.5-3b-instruct/qwen2.5-7b-instruct/qwen2.5-14b-instruct** - Qwen2.5 is the latest series of Qwen large language models. Comparing with Qwen2, Qwen2.5 series brings significant improvements in coding, mathematics and general knowledge skills. Additionally, it brings long-context and multiple languages support including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. \n",
"For more details, please refer to [model_card](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).\n",
"* **qwen-7b-chat** - Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. For more details about Qwen, please refer to the [GitHub](https://github.com/QwenLM/Qwen) code repository.\n",
"* **mpt-7b-chat** - MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases ([ALiBi](https://arxiv.org/abs/2108.12409)). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT-7B-chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-7B on the [ShareGPT-Vicuna](https://huggingface.co/datasets/jeffwan/sharegpt_vicuna), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3), [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets. More details about the model can be found in [blog post](https://www.mosaicml.com/blog/mpt-7b), [repository](https://github.com/mosaicml/llm-foundry/) and [HuggingFace model card](https://huggingface.co/mosaicml/mpt-7b-chat).\n",
"* **chatglm3-6b** - ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features such as smooth dialogue and low deployment threshold from the previous two generations, ChatGLM3-6B employs a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. ChatGLM3-6B adopts a newly designed [Prompt format](https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md), in addition to the normal multi-turn dialogue. You can find more details about model in the [model card](https://huggingface.co/THUDM/chatglm3-6b)\n",
@@ -686,6 +685,11 @@
" \"group_size\": 128,\n",
" \"ratio\": 0.5,\n",
" },\n",
" \"qwen2.5-7b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-3b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-14b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-1.5b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-0.5b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"default\": {\n",
" \"sym\": False,\n",
" \"group_size\": 128,\n",
@@ -928,6 +932,7 @@
"from transformers import AutoConfig, AutoTokenizer\n",
"from optimum.intel.openvino import OVModelForCausalLM\n",
"\n",
"import openvino as ov\n",
"import openvino.properties as props\n",
"import openvino.properties.hint as hints\n",
"import openvino.properties.streams as streams\n",
@@ -948,6 +953,8 @@
"\n",
"# On a GPU device a model is executed in FP16 precision. For red-pajama-3b-chat model there known accuracy\n",
"# issues caused by this, which we avoid by setting precision hint to \"f32\".\n",
"core = ov.Core()\n",
"\n",
"if model_id.value == \"red-pajama-3b-chat\" and \"GPU\" in core.available_devices and device.value in [\"GPU\", \"AUTO\"]:\n",
" ov_config[\"INFERENCE_PRECISION_HINT\"] = \"f32\"\n",
"\n",
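The two additions above (`import openvino as ov` and `core = ov.Core()`) give the precision-hint check access to the device list. A minimal sketch of how the surrounding `ov_config` might be assembled from the imported property modules and passed to the model load; the model directory and device are placeholders:

```python
# Sketch: build an ov_config from the openvino.properties modules imported in
# the notebook and pass it to optimum-intel's OpenVINO causal LM wrapper.
import openvino as ov
import openvino.properties as props
import openvino.properties.hint as hints
import openvino.properties.streams as streams
from optimum.intel.openvino import OVModelForCausalLM

core = ov.Core()
ov_config = {
    hints.performance_mode(): hints.PerformanceMode.LATENCY,  # optimize for latency
    streams.num(): "1",       # single inference stream for chat workloads
    props.cache_dir(): "",    # disable the compiled-model cache
}
if "GPU" in core.available_devices:
    ov_config["INFERENCE_PRECISION_HINT"] = "f32"  # red-pajama accuracy workaround

ov_model = OVModelForCausalLM.from_pretrained(
    "model_dir_placeholder",  # placeholder path to a converted OpenVINO model
    device="CPU",
    ov_config=ov_config,
)
```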
5 changes: 5 additions & 0 deletions notebooks/llm-rag-langchain/llm-rag-langchain.ipynb
@@ -601,6 +601,11 @@
" \"group_size\": 128,\n",
" \"ratio\": 0.5,\n",
" },\n",
" \"qwen2.5-7b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-3b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-14b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-1.5b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-0.5b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"default\": {\n",
" \"sym\": False,\n",
" \"group_size\": 128,\n",
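The same Qwen2.5 rows are added verbatim to this notebook and to llm-rag-llamaindex below. A minimal sketch of the lookup-with-fallback pattern the `default` key implies (dictionary abbreviated; the default ratio shown is illustrative, since the diff truncates it):

```python
# Sketch of the fallback lookup implied by the "default" key: known model ids
# get their tuned entries, unknown ids the generic asymmetric settings.
compression_configs = {
    "qwen2.5-7b-instruct": {"sym": True, "group_size": 128, "ratio": 1.0},
    "default": {"sym": False, "group_size": 128, "ratio": 0.8},  # ratio illustrative
}

def get_compression_config(model_id: str) -> dict:
    return compression_configs.get(model_id, compression_configs["default"])

print(get_compression_config("qwen2.5-7b-instruct"))  # tuned entry
print(get_compression_config("some-other-model"))     # falls back to default
```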
5 changes: 5 additions & 0 deletions notebooks/llm-rag-llamaindex/llm-rag-llamaindex.ipynb
@@ -602,6 +602,11 @@
" \"group_size\": 128,\n",
" \"ratio\": 0.5,\n",
" },\n",
" \"qwen2.5-7b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-3b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-14b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-1.5b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"qwen2.5-0.5b-instruct\": {\"sym\": True, \"group_size\": 128, \"ratio\": 1.0},\n",
" \"default\": {\n",
" \"sym\": False,\n",
" \"group_size\": 128,\n",