[Agent]update llm config (#1980)
Since `DYNAMIC_QUANTIZATION` currently only works on CPU, replace it with a
general configuration in case a GPU is selected.
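
For illustration, a minimal sketch (not part of this commit) of how the two configurations could be combined per device; the `device` variable is a hypothetical stand-in for the notebook's device-selection widget:

    # General configuration, valid for any device (CPU or GPU)
    ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

    if device == "CPU":  # hypothetical selector, assumed from the device widget
        # Dynamic quantization and u8 KV-cache precision are currently CPU-only
        ov_config.update({
            "KV_CACHE_PRECISION": "u8",
            "DYNAMIC_QUANTIZATION_GROUP_SIZE": "32",
        })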
openvino-dev-samples authored Apr 30, 2024
1 parent 25533f5 commit be9bd87
Showing 1 changed file with 27 additions and 19 deletions.
46 changes: 27 additions & 19 deletions notebooks/llm-agent-langchain/llm-agent-langchain.ipynb
@@ -291,25 +291,7 @@
"id": "77244c52",
"metadata": {},
"source": [
"OpenVINO models can be run locally through the `HuggingFacePipeline` class in LangChain. To deploy a model with OpenVINO, you can specify the `backend=\"openvino\"` parameter to trigger OpenVINO as backend inference framework. For [more information](https://python.langchain.com/docs/integrations/llms/openvino/).\n",
"\n",
"You can get additional inference speed improvement with [Dynamic Quantization of activations and KV-cache quantization](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html#enabling-openvino-runtime-optimizations). These options can be enabled with `ov_config` as follows:\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "60653b85-2304-447e-a6fd-3a2ce9c69d75",
"metadata": {},
"outputs": [],
"source": [
"ov_config = {\n",
" \"KV_CACHE_PRECISION\": \"u8\",\n",
" \"DYNAMIC_QUANTIZATION_GROUP_SIZE\": \"32\",\n",
" \"PERFORMANCE_HINT\": \"LATENCY\",\n",
" \"NUM_STREAMS\": \"1\",\n",
" \"CACHE_DIR\": \"\",\n",
"}"
"OpenVINO models can be run locally through the `HuggingFacePipeline` class in LangChain. To deploy a model with OpenVINO, you can specify the `backend=\"openvino\"` parameter to trigger OpenVINO as backend inference framework. For [more information](https://python.langchain.com/docs/integrations/llms/openvino/)."
]
},
{
@@ -321,6 +303,8 @@
"source": [
"from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline\n",
"\n",
"ov_config = {\"PERFORMANCE_HINT\": \"LATENCY\", \"NUM_STREAMS\": \"1\", \"CACHE_DIR\": \"\"}\n",
"\n",
"ov_llm = HuggingFacePipeline.from_model_id(\n",
" model_id=model_path,\n",
" task=\"text-generation\",\n",
@@ -330,6 +314,30 @@
")"
]
},
{
"cell_type": "markdown",
"id": "d70905e2",
"metadata": {},
"source": [
"You can get additional inference speed improvement with [Dynamic Quantization of activations and KV-cache quantization] on CPU(https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html#enabling-openvino-runtime-optimizations). These options can be enabled with `ov_config` as follows:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "60653b85-2304-447e-a6fd-3a2ce9c69d75",
"metadata": {},
"outputs": [],
"source": [
"ov_config = {\n",
" \"KV_CACHE_PRECISION\": \"u8\",\n",
" \"DYNAMIC_QUANTIZATION_GROUP_SIZE\": \"32\",\n",
" \"PERFORMANCE_HINT\": \"LATENCY\",\n",
" \"NUM_STREAMS\": \"1\",\n",
" \"CACHE_DIR\": \"\",\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "52a9a190",