[Agent]update llm config (#1980)
Since `DYNAMIC_QUANTIZATION` currently only works on CPU, replace it with a
general configuration in case a GPU is selected.
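
For illustration, a minimal sketch (not part of this commit) of how the two configurations could be combined per device; the `device` variable is a hypothetical stand-in for the notebook's device-selection widget:

    # General configuration, valid for any device (CPU or GPU)
    ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

    if device == "CPU":  # hypothetical selector, assumed from the device widget
        # Dynamic quantization and u8 KV-cache precision are currently CPU-only
        ov_config.update({
            "KV_CACHE_PRECISION": "u8",
            "DYNAMIC_QUANTIZATION_GROUP_SIZE": "32",
        })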
openvino-dev-samples authored Apr 30, 2024
1 parent 25533f5 commit be9bd87
Showing 1 changed file with 27 additions and 19 deletions.
46 changes: 27 additions & 19 deletions notebooks/llm-agent-langchain/llm-agent-langchain.ipynb
@@ -291,25 +291,7 @@
"id": "77244c52",
"metadata": {},
"source": [
"OpenVINO models can be run locally through the `HuggingFacePipeline` class in LangChain. To deploy a model with OpenVINO, you can specify the `backend=\"openvino\"` parameter to trigger OpenVINO as backend inference framework. For [more information](https://python.langchain.com/docs/integrations/llms/openvino/).\n",
"\n",
"You can get additional inference speed improvement with [Dynamic Quantization of activations and KV-cache quantization](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html#enabling-openvino-runtime-optimizations). These options can be enabled with `ov_config` as follows:\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "60653b85-2304-447e-a6fd-3a2ce9c69d75",
"metadata": {},
"outputs": [],
"source": [
"ov_config = {\n",
" \"KV_CACHE_PRECISION\": \"u8\",\n",
" \"DYNAMIC_QUANTIZATION_GROUP_SIZE\": \"32\",\n",
" \"PERFORMANCE_HINT\": \"LATENCY\",\n",
" \"NUM_STREAMS\": \"1\",\n",
" \"CACHE_DIR\": \"\",\n",
"}"
"OpenVINO models can be run locally through the `HuggingFacePipeline` class in LangChain. To deploy a model with OpenVINO, you can specify the `backend=\"openvino\"` parameter to trigger OpenVINO as backend inference framework. For [more information](https://python.langchain.com/docs/integrations/llms/openvino/)."
]
},
{
@@ -321,6 +303,8 @@
"source": [
"from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline\n",
"\n",
"ov_config = {\"PERFORMANCE_HINT\": \"LATENCY\", \"NUM_STREAMS\": \"1\", \"CACHE_DIR\": \"\"}\n",
"\n",
"ov_llm = HuggingFacePipeline.from_model_id(\n",
" model_id=model_path,\n",
" task=\"text-generation\",\n",
@@ -330,6 +314,30 @@
")"
]
},
{
"cell_type": "markdown",
"id": "d70905e2",
"metadata": {},
"source": [
"You can get additional inference speed improvement with [Dynamic Quantization of activations and KV-cache quantization] on CPU(https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html#enabling-openvino-runtime-optimizations). These options can be enabled with `ov_config` as follows:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "60653b85-2304-447e-a6fd-3a2ce9c69d75",
"metadata": {},
"outputs": [],
"source": [
"ov_config = {\n",
" \"KV_CACHE_PRECISION\": \"u8\",\n",
" \"DYNAMIC_QUANTIZATION_GROUP_SIZE\": \"32\",\n",
" \"PERFORMANCE_HINT\": \"LATENCY\",\n",
" \"NUM_STREAMS\": \"1\",\n",
" \"CACHE_DIR\": \"\",\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "52a9a190",