add phi3.5
eaidova committed Sep 9, 2024
1 parent 735927a commit db9a826
Showing 4 changed files with 22 additions and 2 deletions.
1 change: 1 addition & 0 deletions notebooks/llm-chatbot/README.md
@@ -25,6 +25,7 @@ The available options are:
>You must be a registered user in 🤗 Hugging Face Hub. Please visit the [HuggingFace model card](https://huggingface.co/google/gemma-2b-it), carefully read the terms of use, and click the accept button. You will need an access token for the code below to run. For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
* **red-pajama-3b-chat** - A 2.8B-parameter pre-trained language model based on the GPT-NeoX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on the OASST1 and Dolly2 datasets to enhance its chatting ability. More details about the model can be found in the [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).
* **phi3-mini-instruct** - Phi-3-Mini is a 3.8B-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality, reasoning-dense properties. More details about the model can be found in the [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).
+* **phi-3.5-mini-instruct** - Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available websites) with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 family and supports a 128K-token context length. It underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. More details about the model can be found in the [model card](https://huggingface.co/microsoft/Phi-3.5-mini-instruct), [Microsoft blog](https://aka.ms/phi3.5-techblog) and [technical report](https://arxiv.org/abs/2404.14219).
* **gemma-7b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is the instruction-tuned version of the 7B-parameter model. More details about the model can be found in the [model card](https://huggingface.co/google/gemma-7b-it).
>**Note**: to run the model with the demo, you will need to accept the license agreement.
>You must be a registered user in 🤗 Hugging Face Hub. Please visit the [HuggingFace model card](https://huggingface.co/google/gemma-7b-it), carefully read the terms of use, and click the accept button. You will need an access token for the code below to run. For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
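Each of these options is loaded the same way in the notebook. As a minimal, illustrative sketch (assuming `optimum[openvino]` is installed; the prompt and generation settings are placeholders, not part of this diff), the newly added Phi-3.5 checkpoint can be exported to OpenVINO IR and queried like this:

```python
# Illustrative sketch only: export Phi-3.5-mini to OpenVINO IR and run a
# quick generation with Optimum Intel.
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True, trust_remote_code=True)

messages = [{"role": "user", "content": "What is OpenVINO?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```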
3 changes: 2 additions & 1 deletion notebooks/llm-chatbot/llm-chatbot-generate-api.ipynb
@@ -174,7 +174,8 @@
" except OSError:\n",
" notebook_login()\n",
"```\n",
"* **phi3-mini-instruct** - The Phi-3-Mini is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).\n",
"* **phi-3-mini-instruct** - The Phi-3-Mini is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).\n",
"* **phi-3.5-mini-instruct** - Phi-3.5-mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data. The model belongs to the Phi-3 model family and supports 128K token context length. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3.5-mini-instruct), [Microsoft blog](https://aka.ms/phi3.5-techblog) and [technical report](https://arxiv.org/abs/2404.14219).\n",
"* **red-pajama-3b-chat** - A 2.8B parameter pre-trained language model based on GPT-NEOX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on OASST1 and Dolly2 datasets to enhance chatting ability. More details about model can be found in [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).\n",
"* **gemma-7b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is instruction-tuned version of 7B parameters model. More details about model can be found in [model card](https://huggingface.co/google/gemma-7b-it).\n",
">**Note**: run model with demo, you will need to accept license agreement. \n",
4 changes: 3 additions & 1 deletion notebooks/llm-chatbot/llm-chatbot.ipynb
@@ -158,7 +158,9 @@
" except OSError:\n",
" notebook_login()\n",
"```\n",
"* **phi3-mini-instruct** - The Phi-3-Mini is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).\n",
"* **phi-3-mini-instruct** - The Phi-3-Mini is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).\n",
"* **phi-3.5-mini-instruct** - Phi-3.5-mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data. The model belongs to the Phi-3 model family and supports 128K token context length. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3.5-mini-instruct), [Microsoft blog](https://aka.ms/phi3.5-techblog) and [technical report](https://arxiv.org/abs/2404.14219).\n",
"\n",
"* **red-pajama-3b-chat** - A 2.8B parameter pre-trained language model based on GPT-NEOX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on OASST1 and Dolly2 datasets to enhance chatting ability. More details about model can be found in [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).\n",
"* **gemma-7b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is instruction-tuned version of 7B parameters model. More details about model can be found in [model card](https://huggingface.co/google/gemma-7b-it).\n",
">**Note**: run model with demo, you will need to accept license agreement. \n",
16 changes: 16 additions & 0 deletions utils/llm_config.py
@@ -271,6 +271,22 @@ def qwen_completion_to_prompt(completion):
<|assistant|>""",
            "completion_to_prompt": phi_completion_to_prompt,
        },
+        "phi-3.5-mini-instruct": {
+            "model_id": "microsoft/Phi-3.5-mini-instruct",
+            "remote_code": True,
+            "start_message": "<|system|>\n{DEFAULT_SYSTEM_PROMPT}<|end|>\n",
+            "history_template": "<|user|>\n{user}<|end|> \n<|assistant|>\n{assistant}<|end|>\n",
+            "current_message_template": "<|user|>\n{user}<|end|> \n<|assistant|>\n{assistant}",
+            "stop_tokens": ["<|end|>"],
+            "rag_prompt_template": f"""<|system|> {DEFAULT_RAG_PROMPT}<|end|>"""
+            + """
+<|user|>
+Question: {input}
+Context: {context}
+Answer: <|end|>
+<|assistant|>""",
+            "completion_to_prompt": phi_completion_to_prompt,
+        }
    },
    "Chinese": {
        "qwen2-0.5b-instruct": {
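For reference, the templates in the new entry compose into a Phi-3.5 chat prompt in the usual way. A self-contained sketch of that composition (the `build_prompt` helper and the sample turns are illustrative, not part of `llm_config.py`):

```python
# Illustrative only: how the start/history/current-message templates from the
# new "phi-3.5-mini-instruct" entry combine into a single prompt string.
config = {
    "start_message": "<|system|>\n{DEFAULT_SYSTEM_PROMPT}<|end|>\n",
    "history_template": "<|user|>\n{user}<|end|> \n<|assistant|>\n{assistant}<|end|>\n",
    "current_message_template": "<|user|>\n{user}<|end|> \n<|assistant|>\n{assistant}",
}

def build_prompt(system_prompt, history, message):
    """Compose the system turn, finished turns, and the open current turn."""
    prompt = config["start_message"].format(DEFAULT_SYSTEM_PROMPT=system_prompt)
    for user, assistant in history:
        prompt += config["history_template"].format(user=user, assistant=assistant)
    # Leave the assistant slot empty so generation continues after "<|assistant|>\n"
    return prompt + config["current_message_template"].format(user=message, assistant="")

print(build_prompt("You are a helpful assistant.", [("Hi", "Hello!")], "What is OpenVINO?"))
```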
