diff --git a/notebooks/llm-agent-langchain/llm-agent-langchain.ipynb b/notebooks/llm-agent-langchain/llm-agent-langchain.ipynb
index dcbb5a22a8c..235730d09b3 100644
--- a/notebooks/llm-agent-langchain/llm-agent-langchain.ipynb
+++ b/notebooks/llm-agent-langchain/llm-agent-langchain.ipynb
@@ -291,25 +291,7 @@
    "id": "77244c52",
    "metadata": {},
    "source": [
-    "OpenVINO models can be run locally through the `HuggingFacePipeline` class in LangChain. To deploy a model with OpenVINO, you can specify the `backend=\"openvino\"` parameter to trigger OpenVINO as backend inference framework. For [more information](https://python.langchain.com/docs/integrations/llms/openvino/).\n",
-    "\n",
-    "You can get additional inference speed improvement with [Dynamic Quantization of activations and KV-cache quantization](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html#enabling-openvino-runtime-optimizations). These options can be enabled with `ov_config` as follows:\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "id": "60653b85-2304-447e-a6fd-3a2ce9c69d75",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "ov_config = {\n",
-    "    \"KV_CACHE_PRECISION\": \"u8\",\n",
-    "    \"DYNAMIC_QUANTIZATION_GROUP_SIZE\": \"32\",\n",
-    "    \"PERFORMANCE_HINT\": \"LATENCY\",\n",
-    "    \"NUM_STREAMS\": \"1\",\n",
-    "    \"CACHE_DIR\": \"\",\n",
-    "}"
+    "OpenVINO models can be run locally through the `HuggingFacePipeline` class in LangChain. To deploy a model with OpenVINO, you can specify the `backend=\"openvino\"` parameter to select OpenVINO as the backend inference framework. For more information, see the [LangChain OpenVINO integration](https://python.langchain.com/docs/integrations/llms/openvino/)."
   ]
  },
  {
@@ -321,6 +303,8 @@
   "source": [
    "from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline\n",
    "\n",
+    "ov_config = {\"PERFORMANCE_HINT\": \"LATENCY\", \"NUM_STREAMS\": \"1\", \"CACHE_DIR\": \"\"}\n",
+    "\n",
    "ov_llm = HuggingFacePipeline.from_model_id(\n",
    "    model_id=model_path,\n",
    "    task=\"text-generation\",\n",
@@ -330,6 +314,30 @@
    ")"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "d70905e2",
+  "metadata": {},
+  "source": [
+   "You can get an additional inference speed improvement with [Dynamic Quantization of activations and KV-cache quantization](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html#enabling-openvino-runtime-optimizations) on CPU. These options can be enabled with `ov_config` as follows:"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 7,
+  "id": "60653b85-2304-447e-a6fd-3a2ce9c69d75",
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "ov_config = {\n",
+   "    \"KV_CACHE_PRECISION\": \"u8\",\n",
+   "    \"DYNAMIC_QUANTIZATION_GROUP_SIZE\": \"32\",\n",
+   "    \"PERFORMANCE_HINT\": \"LATENCY\",\n",
+   "    \"NUM_STREAMS\": \"1\",\n",
+   "    \"CACHE_DIR\": \"\",\n",
+   "}"
+  ]
+ },
 {
  "cell_type": "markdown",
  "id": "52a9a190",
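Taken together, the hunks above reorder the notebook so a baseline `ov_config` is defined before the pipeline is created, and the CPU-only quantization options move into a later cell. The following is a minimal sketch of how the pieces combine at runtime: `model_path` and the `device` selector are assumed to be defined in earlier notebook cells, and the `backend="openvino"` / `model_kwargs` wiring follows the LangChain OpenVINO integration where the diff elides those lines, so treat them as assumptions rather than part of this change.

```python
# Sketch only: mirrors the cell order the diff establishes.
# `model_path` and `device` are assumed to come from earlier notebook cells.
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

# Baseline config, defined before the pipeline is built (second hunk).
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

if device.value == "CPU":
    # CPU-only runtime optimizations from the later cell (third hunk):
    # 8-bit KV-cache precision plus dynamic quantization of activations.
    ov_config.update(
        {
            "KV_CACHE_PRECISION": "u8",
            "DYNAMIC_QUANTIZATION_GROUP_SIZE": "32",
        }
    )

# The diff only shows model_id and task; the remaining arguments follow the
# LangChain OpenVINO integration and are assumptions here.
ov_llm = HuggingFacePipeline.from_model_id(
    model_id=model_path,
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": device.value, "ov_config": ov_config},
)
```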