[NPU] Add documentation for batching on NPU plugin #26865

Merged 17 commits on Oct 2, 2024

Commits
96fa77c
Add doc for batching on NPU plugin
pereanub Sep 30, 2024
a894528
Update docs/articles_en/openvino-workflow/running-inference/inference…
kblaszczak-intel Oct 1, 2024
57c8d69
Update docs/articles_en/openvino-workflow/running-inference/inference…
pereanub Oct 1, 2024
1d6fd1c
Update docs/articles_en/openvino-workflow/running-inference/inference…
pereanub Oct 1, 2024
28ed755
Update docs/articles_en/openvino-workflow/running-inference/inference…
pereanub Oct 1, 2024
c48bda5
Update docs/articles_en/openvino-workflow/running-inference/inference…
pereanub Oct 1, 2024
2149cea
Update docs/articles_en/openvino-workflow/running-inference/inference…
pereanub Oct 1, 2024
343dadf
Update docs/articles_en/openvino-workflow/running-inference/inference…
pereanub Oct 1, 2024
ac8ec87
Update docs/articles_en/openvino-workflow/running-inference/inference…
pereanub Oct 1, 2024
6601349
Update code according to review
pereanub Oct 1, 2024
63e826d
Update docs/articles_en/openvino-workflow/running-inference/inference…
kblaszczak-intel Oct 2, 2024
a32546d
Update description
pereanub Oct 2, 2024
38cfd47
Update text
pereanub Oct 2, 2024
a18cd79
Update docs/articles_en/openvino-workflow/running-inference/inference…
kblaszczak-intel Oct 2, 2024
88c79ad
Update docs/articles_en/openvino-workflow/running-inference/inference…
kblaszczak-intel Oct 2, 2024
3c17fbc
Update docs/articles_en/openvino-workflow/running-inference/inference…
kblaszczak-intel Oct 2, 2024
df261f9
Update docs/articles_en/openvino-workflow/running-inference/inference…
kblaszczak-intel Oct 2, 2024
@@ -11,6 +11,7 @@ NPU Device
   :hidden:

   npu-device/remote-tensor-api-npu-plugin
   npu-device/batching-on-npu-plugin


The Neural Processing Unit is a low-power hardware solution, introduced with the
@@ -0,0 +1,37 @@
NPU Plugin Batching
===============================


.. meta::
   :description: OpenVINO™ NPU plugin supports batching
                 either by executing concurrent inferences or by
                 relying on native compiler support for batching.

The OpenVINO™ NPU plugin supports batching either by executing concurrent inferences or by relying on native compiler support for batching.

First, the NPU plugin checks if the following conditions are met:

* The batch size is on the first axis.
* All inputs and outputs have the same batch size.
* The model does not contain states.

**If the conditions are met**, the NPU plugin attempts to compile and execute the original model with batch_size forced to 1. This approach is due to current compiler limitations and ongoing work to improve performance for batch sizes greater than one.
If the compilation succeeds, the plugin detects the difference in batch size between the original model layout (with the batch size set to N)
and the transformed/compiled layout (with the batch size set to 1), and then executes the following steps:

1. Internally constructs multiple command lists, one for each input.
2. Executes each command list for the proper offsets of input/output buffers.
3. Notifies the user of the completion of the inference request after all command lists have been executed.

This concurrency-based batching mode is transparent to the application. A single inference request handles all inputs from the batch.
While performance may be lower than with regular batching (based on native compiler support), this mode provides basic batching functionality either for older drivers
or for models that cannot yet be compiled with a batch size larger than one.
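
For context, a minimal Python sketch of what this transparent mode looks like from the application side is shown below. The model file name, input shape, and batch size of 4 are illustrative assumptions, not values prescribed by the plugin.

.. code-block:: python

   import numpy as np
   import openvino as ov

   core = ov.Core()
   # A model whose first (batch) dimension is N, assumed here to be 4.
   model = core.read_model("model_batch4.xml")
   compiled_model = core.compile_model(model, "NPU")

   # A single inference request carries the whole batch. Whether the plugin runs
   # it as N concurrent batch-1 executions or as a natively batched compilation
   # is decided internally and is transparent to this application code.
   batched_input = np.random.rand(4, 3, 224, 224).astype(np.float32)
   infer_request = compiled_model.create_infer_request()
   infer_request.infer({0: batched_input})
   results = infer_request.get_output_tensor().data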

**If the conditions are not met**, the NPU plugin tries to compile and execute the original model with the given
batch size of N, just like any other regular model.
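
As a sketch of this path (the model path and target shape are illustrative assumptions), an application can set the batch dimension on the first axis explicitly before compiling for the NPU device:

.. code-block:: python

   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")  # illustrative model path

   # Reshape the single input so that the first (batch) axis is N = 4; the plugin
   # then compiles and executes the model with this batch size like any other
   # regular model.
   model.reshape([4, 3, 224, 224])
   compiled_model = core.compile_model(model, "NPU")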

.. note::

   With future performance improvements and support for compiling multiple models with a batch size larger
   than one, the default order will change: the NPU plugin will first try to compile and execute the original model with the
   given batch size, and will fall back to concurrent batching if the compilation fails.