add phi3.5
eaidova committed Sep 9, 2024
1 parent 735927a commit db9a826
Showing 4 changed files with 22 additions and 2 deletions.
1 change: 1 addition & 0 deletions notebooks/llm-chatbot/README.md
@@ -25,6 +25,7 @@ The available options are:
>You must be a registered user in 🤗 Hugging Face Hub. Please visit the [HuggingFace model card](https://huggingface.co/google/gemma-2b-it), carefully read the terms of use, and click the accept button. You will need an access token for the code below to run. For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
* **red-pajama-3b-chat** - A 2.8B-parameter pre-trained language model based on the GPT-NeoX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on the OASST1 and Dolly2 datasets to enhance its chatting ability. More details about the model can be found in the [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).
* **phi3-mini-instruct** - Phi-3-Mini is a 3.8B-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality, reasoning-dense properties. More details about the model can be found in the [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).
+* **phi-3.5-mini-instruct** - Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available websites) with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 family and supports a 128K-token context length. It underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. More details about the model can be found in the [model card](https://huggingface.co/microsoft/Phi-3.5-mini-instruct), [Microsoft blog](https://aka.ms/phi3.5-techblog) and [technical report](https://arxiv.org/abs/2404.14219).
* **gemma-7b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is the instruction-tuned version of the 7B-parameter model. More details about the model can be found in the [model card](https://huggingface.co/google/gemma-7b-it).
>**Note**: to run the model with the demo, you will need to accept the license agreement.
>You must be a registered user in 🤗 Hugging Face Hub. Please visit the [HuggingFace model card](https://huggingface.co/google/gemma-7b-it), carefully read the terms of use, and click the accept button. You will need an access token for the code below to run. For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
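Each of these options is loaded the same way in the notebook. As a minimal, illustrative sketch (assuming `optimum[openvino]` is installed; the prompt and generation settings are placeholders, not part of this diff), the newly added Phi-3.5 checkpoint can be exported to OpenVINO IR and queried like this:

```python
# Illustrative sketch only: export Phi-3.5-mini to OpenVINO IR and run a
# quick generation with Optimum Intel.
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True, trust_remote_code=True)

messages = [{"role": "user", "content": "What is OpenVINO?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```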
3 changes: 2 additions & 1 deletion notebooks/llm-chatbot/llm-chatbot-generate-api.ipynb
@@ -174,7 +174,8 @@
" except OSError:\n",
" notebook_login()\n",
"```\n",
"* **phi3-mini-instruct** - The Phi-3-Mini is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).\n",
"* **phi-3-mini-instruct** - The Phi-3-Mini is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).\n",
"* **phi-3.5-mini-instruct** - Phi-3.5-mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data. The model belongs to the Phi-3 model family and supports 128K token context length. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3.5-mini-instruct), [Microsoft blog](https://aka.ms/phi3.5-techblog) and [technical report](https://arxiv.org/abs/2404.14219).\n",
"* **red-pajama-3b-chat** - A 2.8B parameter pre-trained language model based on GPT-NEOX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on OASST1 and Dolly2 datasets to enhance chatting ability. More details about model can be found in [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).\n",
"* **gemma-7b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is instruction-tuned version of 7B parameters model. More details about model can be found in [model card](https://huggingface.co/google/gemma-7b-it).\n",
">**Note**: run model with demo, you will need to accept license agreement. \n",
4 changes: 3 additions & 1 deletion notebooks/llm-chatbot/llm-chatbot.ipynb
@@ -158,7 +158,9 @@
" except OSError:\n",
" notebook_login()\n",
"```\n",
"* **phi3-mini-instruct** - The Phi-3-Mini is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).\n",
"* **phi-3-mini-instruct** - The Phi-3-Mini is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [Microsoft blog](https://aka.ms/phi3blog-april) and [technical report](https://aka.ms/phi3-tech-report).\n",
"* **phi-3.5-mini-instruct** - Phi-3.5-mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data. The model belongs to the Phi-3 model family and supports 128K token context length. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. More details about model can be found in [model card](https://huggingface.co/microsoft/Phi-3.5-mini-instruct), [Microsoft blog](https://aka.ms/phi3.5-techblog) and [technical report](https://arxiv.org/abs/2404.14219).\n",
"\n",
"* **red-pajama-3b-chat** - A 2.8B parameter pre-trained language model based on GPT-NEOX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on OASST1 and Dolly2 datasets to enhance chatting ability. More details about model can be found in [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).\n",
"* **gemma-7b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is instruction-tuned version of 7B parameters model. More details about model can be found in [model card](https://huggingface.co/google/gemma-7b-it).\n",
">**Note**: run model with demo, you will need to accept license agreement. \n",
16 changes: 16 additions & 0 deletions utils/llm_config.py
@@ -271,6 +271,22 @@ def qwen_completion_to_prompt(completion):
<|assistant|>""",
            "completion_to_prompt": phi_completion_to_prompt,
        },
+        "phi-3.5-mini-instruct": {
+            "model_id": "microsoft/Phi-3.5-mini-instruct",
+            "remote_code": True,
+            "start_message": "<|system|>\n{DEFAULT_SYSTEM_PROMPT}<|end|>\n",
+            "history_template": "<|user|>\n{user}<|end|> \n<|assistant|>\n{assistant}<|end|>\n",
+            "current_message_template": "<|user|>\n{user}<|end|> \n<|assistant|>\n{assistant}",
+            "stop_tokens": ["<|end|>"],
+            "rag_prompt_template": f"""<|system|> {DEFAULT_RAG_PROMPT}<|end|>"""
+            + """
+<|user|>
+Question: {input}
+Context: {context}
+Answer: <|end|>
+<|assistant|>""",
+            "completion_to_prompt": phi_completion_to_prompt,
+        }
    },
    "Chinese": {
        "qwen2-0.5b-instruct": {
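For reference, the templates in the new entry compose into a Phi-3.5 chat prompt in the usual way. A self-contained sketch of that composition (the `build_prompt` helper and the sample turns are illustrative, not part of `llm_config.py`):

```python
# Illustrative only: how the start/history/current-message templates from the
# new "phi-3.5-mini-instruct" entry combine into a single prompt string.
config = {
    "start_message": "<|system|>\n{DEFAULT_SYSTEM_PROMPT}<|end|>\n",
    "history_template": "<|user|>\n{user}<|end|> \n<|assistant|>\n{assistant}<|end|>\n",
    "current_message_template": "<|user|>\n{user}<|end|> \n<|assistant|>\n{assistant}",
}

def build_prompt(system_prompt, history, message):
    """Compose the system turn, finished turns, and the open current turn."""
    prompt = config["start_message"].format(DEFAULT_SYSTEM_PROMPT=system_prompt)
    for user, assistant in history:
        prompt += config["history_template"].format(user=user, assistant=assistant)
    # Leave the assistant slot empty so generation continues after "<|assistant|>\n"
    return prompt + config["current_message_template"].format(user=message, assistant="")

print(build_prompt("You are a helpful assistant.", [("Hi", "Hello!")], "What is OpenVINO?"))
```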
