From 74562ee42d79b6e14357b92675aba598abc7b57b Mon Sep 17 00:00:00 2001
From: Alexander
Date: Fri, 6 Sep 2024 11:21:16 +0400
Subject: [PATCH 1/5] Updated LLM compression related information

---
 .../llm_inference_guide/llm-inference-hf.rst  |  4 ++--
 .../weight-compression.rst                    | 24 +++++++++++++------
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
index f8023165b8f74c..1dd554ff101dfb 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
@@ -165,8 +165,8 @@ parameters.
    such as ``meta-llama/Llama-2-7b`` or ``Qwen/Qwen-7B-Chat``. These parameters are used by
    default only when ``bits=4`` is specified in the config.
 
-   For more details on compression options, refer to the
-   :doc:`weight compression guide <../../openvino-workflow/model-optimization-guide/weight-compression>`.
+   For more details on compression options, refer to the correspoding `Optimum documentation <https://huggingface.co/docs/optimum/en/intel/openvino/optimization#4-bit>`__.
+   For native NNCF weight quantization options, refer to :doc:`weight compression guide <../../openvino-workflow/model-optimization-guide/weight-compression>`.
 
 OpenVINO also supports 4-bit models from Hugging Face `Transformers <https://github.com/huggingface/transformers>`__
 library optimized with `GPTQ <https://github.com/PanQiWei/AutoGPTQ>`__. In this case, there
diff --git a/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst b/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
index 67cd51a9554439..40c4e49ba262cb 100644
--- a/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
+++ b/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
@@ -182,9 +182,18 @@ trade-offs after optimization:
          ratio=0.9,
       )
 
+* ``scale_estimation`` - boolean parameter that enables the more accurate estimation of
+  quantization scales. Especially helpful when the weights of all the layers are quantized to
+  4 bits. Requires dataset.
+
+* ``awq`` - boolean parameter that enables the AWQ method for more accurate INT4 weight
+  quantization. Especially helpful when the weights of all the layers are quantized to
+  4 bits. The method can sometimes result in reduced accuracy when used with
+  Dynamic Quantization of activations. Requires dataset.
+
 * ``dataset`` - calibration dataset for data-aware weight compression. It is required
-  for some compression options, for example, some types ``sensitivity_metric`` can use
-  data for precision selection.
+  for some compression options, for example, ``scale_estimation`` or ``awq``. Some types
+  of ``sensitivity_metric`` can use data for precision selection.
 
 * ``sensitivity_metric`` - controls the metric to estimate the sensitivity of compressing layers
   in the bit-width selection algorithm. Some of the metrics require dataset to be
@@ -212,14 +221,15 @@ trade-offs after optimization:
 * ``all_layers`` - boolean parameter that enables INT4 weight quantization of all
   Fully-Connected and Embedding layers, including the first and last layers in the model.
 
-* ``awq`` - boolean parameter that enables the AWQ method for more accurate INT4 weight
-  quantization. Especially helpful when the weights of all the layers are quantized to
-  4 bits. The method can sometimes result in reduced accuracy when used with
-  Dynamic Quantization of activations. Requires dataset.
-
 For data-aware weight compression refer to the following
 `example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama>`__.
 
+.. note::
+
+   Some of the methods can be stacked one on top of another to achieve a better
+   accuracy-performance trade-off after weight quantization. For example, Scale Estimation
+   method can be applied along with AWQ and mixed-precision quantization (``ratio`` parameter).
+
 The example below shows data-free 4-bit weight quantization
 applied on top of OpenVINO IR. Before trying the example, make sure Optimum Intel
 is installed in your environment by running the following command:
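
Taken together, the options this patch documents stack in a single ``nncf.compress_weights()`` call.
Below is a minimal sketch of data-aware INT4 compression with AWQ and Scale Estimation enabled
on top of mixed-precision quantization. The IR path, prompts, tokenizer, and config values are
illustrative, and it assumes the model accepts the tokenizer's outputs directly (stateful LLM IRs
may require extra inputs); none of this is taken from the patched docs.

.. code-block:: python

   import nncf
   import openvino as ov
   from transformers import AutoTokenizer

   core = ov.Core()
   model = core.read_model("model.xml")  # placeholder path to an OpenVINO IR

   # Placeholder calibration prompts; a real calibration set would be larger.
   calibration_prompts = ["What is OpenVINO?", "Explain weight quantization."]

   tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

   def transform_fn(prompt):
       # Turn a raw prompt into the dict of inputs NNCF feeds to the model.
       return dict(tokenizer(prompt, return_tensors="np"))

   dataset = nncf.Dataset(calibration_prompts, transform_fn)

   compressed = nncf.compress_weights(
       model,
       mode=nncf.CompressWeightsMode.INT4_SYM,
       ratio=0.9,               # mixed precision: ~90% of layers in INT4, the rest in INT8
       group_size=128,
       dataset=dataset,         # required by the data-aware methods below
       awq=True,                # AWQ for more accurate INT4 weights
       scale_estimation=True,   # refine quantization scales on the calibration data
   )
   ov.save_model(compressed, "model_int4.xml")
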
From 8dfb998132122af7120ab1ff9d21349ea7801eb2 Mon Sep 17 00:00:00 2001
From: Alexander Kozlov
Date: Tue, 10 Sep 2024 11:30:16 +0400
Subject: [PATCH 2/5] Update
 docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst

Co-authored-by: Tatiana Savina
---
 .../learn-openvino/llm_inference_guide/llm-inference-hf.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
index 1dd554ff101dfb..23166da5c1cf83 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
@@ -165,7 +165,7 @@ parameters.
    such as ``meta-llama/Llama-2-7b`` or ``Qwen/Qwen-7B-Chat``. These parameters are used by
    default only when ``bits=4`` is specified in the config.
 
-   For more details on compression options, refer to the correspoding `Optimum documentation <https://huggingface.co/docs/optimum/en/intel/openvino/optimization#4-bit>`__.
+   For more details on compression options, refer to the corresponding `Optimum documentation <https://huggingface.co/docs/optimum/en/intel/openvino/optimization#4-bit>`__.
    For native NNCF weight quantization options, refer to :doc:`weight compression guide <../../openvino-workflow/model-optimization-guide/weight-compression>`.
 
 OpenVINO also supports 4-bit models from Hugging Face `Transformers <https://github.com/huggingface/transformers>`__
From 367748958b08bfb17cb10e926b6f335f892fcb7e Mon Sep 17 00:00:00 2001
From: Alexander Kozlov
Date: Tue, 10 Sep 2024 11:30:24 +0400
Subject: [PATCH 3/5] Update
 docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst

Co-authored-by: Tatiana Savina
---
 .../learn-openvino/llm_inference_guide/llm-inference-hf.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
index 23166da5c1cf83..77cd0aca62021d 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst
@@ -166,7 +166,7 @@ parameters.
    default only when ``bits=4`` is specified in the config.
 
    For more details on compression options, refer to the corresponding `Optimum documentation <https://huggingface.co/docs/optimum/en/intel/openvino/optimization#4-bit>`__.
-   For native NNCF weight quantization options, refer to :doc:`weight compression guide <../../openvino-workflow/model-optimization-guide/weight-compression>`.
+   For native NNCF weight quantization options, refer to the :doc:`weight compression guide <../../openvino-workflow/model-optimization-guide/weight-compression>`.
 
 OpenVINO also supports 4-bit models from Hugging Face `Transformers <https://github.com/huggingface/transformers>`__
 library optimized with `GPTQ <https://github.com/PanQiWei/AutoGPTQ>`__. In this case, there
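
For the Optimum Intel route these two patches link to, 4-bit weight compression is configured
through ``OVWeightQuantizationConfig``. A sketch under assumptions: the model id and the config
values below are illustrative, not defaults mandated by the guide.

.. code-block:: python

   from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

   # 4-bit weight-only quantization config; values are illustrative.
   quantization_config = OVWeightQuantizationConfig(
       bits=4,
       sym=False,      # asymmetric quantization
       ratio=0.8,      # share of layers compressed to 4 bits, the rest stay 8-bit
       group_size=128,
   )

   model = OVModelForCausalLM.from_pretrained(
       "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative model id
       export=True,                           # convert from Transformers to OpenVINO IR
       quantization_config=quantization_config,
   )
   model.save_pretrained("tinyllama-1.1b-int4-ov")
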
From 1ff1040c1f6c5021463cd19d62f55bfe0757a89f Mon Sep 17 00:00:00 2001
From: Alexander Kozlov
Date: Tue, 10 Sep 2024 11:30:45 +0400
Subject: [PATCH 4/5] Update
 docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst

Co-authored-by: Tatiana Savina
---
 .../model-optimization-guide/weight-compression.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst b/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
index 40c4e49ba262cb..fb9d196f6f25fe 100644
--- a/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
+++ b/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
@@ -182,8 +182,8 @@ trade-offs after optimization:
          ratio=0.9,
       )
 
-* ``scale_estimation`` - boolean parameter that enables the more accurate estimation of
-  quantization scales. Especially helpful when the weights of all the layers are quantized to
+* ``scale_estimation`` - boolean parameter that enables more accurate estimation of
+  quantization scales. Especially helpful when the weights of all layers are quantized to
   4 bits. Requires dataset.
 
 * ``awq`` - boolean parameter that enables the AWQ method for more accurate INT4 weight
From e01cc5add67e40cadb7ec80be30d40da1977b228 Mon Sep 17 00:00:00 2001
From: Alexander Kozlov
Date: Tue, 10 Sep 2024 11:30:58 +0400
Subject: [PATCH 5/5] Update
 docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst

Co-authored-by: Tatiana Savina
---
 .../model-optimization-guide/weight-compression.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst b/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
index fb9d196f6f25fe..62350d04ace4ec 100644
--- a/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
+++ b/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
@@ -226,9 +226,9 @@ For data-aware weight compression refer to the following
 
 .. note::
 
-   Some of the methods can be stacked one on top of another to achieve a better
-   accuracy-performance trade-off after weight quantization. For example, Scale Estimation
-   method can be applied along with AWQ and mixed-precision quantization (``ratio`` parameter).
+   Some methods can be stacked on top of one another to achieve a better
+   accuracy-performance trade-off after weight quantization. For example, the Scale Estimation
+   method can be applied along with AWQ and mixed-precision quantization (the ``ratio`` parameter).
 
 The example below shows data-free 4-bit weight quantization
 applied on top of OpenVINO IR. Before trying the example, make sure Optimum Intel