[NPU] Add documentation for batching on NPU plugin #26865

Merged 17 commits on Oct 2, 2024
96fa77c  Add doc for batching on NPU plugin  (pereanub, Sep 30, 2024)
a894528  Update docs/articles_en/openvino-workflow/running-inference/inference…  (kblaszczak-intel, Oct 1, 2024)
57c8d69  Update docs/articles_en/openvino-workflow/running-inference/inference…  (pereanub, Oct 1, 2024)
1d6fd1c  Update docs/articles_en/openvino-workflow/running-inference/inference…  (pereanub, Oct 1, 2024)
28ed755  Update docs/articles_en/openvino-workflow/running-inference/inference…  (pereanub, Oct 1, 2024)
c48bda5  Update docs/articles_en/openvino-workflow/running-inference/inference…  (pereanub, Oct 1, 2024)
2149cea  Update docs/articles_en/openvino-workflow/running-inference/inference…  (pereanub, Oct 1, 2024)
343dadf  Update docs/articles_en/openvino-workflow/running-inference/inference…  (pereanub, Oct 1, 2024)
ac8ec87  Update docs/articles_en/openvino-workflow/running-inference/inference…  (pereanub, Oct 1, 2024)
6601349  Update code according to review  (pereanub, Oct 1, 2024)
63e826d  Update docs/articles_en/openvino-workflow/running-inference/inference…  (kblaszczak-intel, Oct 2, 2024)
a32546d  Update description  (pereanub, Oct 2, 2024)
38cfd47  Update text  (pereanub, Oct 2, 2024)
a18cd79  Update docs/articles_en/openvino-workflow/running-inference/inference…  (kblaszczak-intel, Oct 2, 2024)
88c79ad  Update docs/articles_en/openvino-workflow/running-inference/inference…  (kblaszczak-intel, Oct 2, 2024)
3c17fbc  Update docs/articles_en/openvino-workflow/running-inference/inference…  (kblaszczak-intel, Oct 2, 2024)
df261f9  Update docs/articles_en/openvino-workflow/running-inference/inference…  (kblaszczak-intel, Oct 2, 2024)
@@ -0,0 +1,33 @@
NPU Plugin batching
===============================


.. meta::
   :description: Batching is handled on the NPU plugin in OpenVINO™ in two
                 different modes: either as concurrency-based inferences or
                 directly by the compiler.


The NPU plugin will first check if the following conditions are met:

* Batch size is on the first axis.
* All inputs and outputs have the same batch size.
* Model does not contain states.
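
To relate these conditions to application code, the following is a minimal sketch of preparing and
compiling a batched model for the NPU device using the OpenVINO Python API; the model path and the
batch size of 4 are illustrative assumptions, not part of the original text.

.. code-block:: py

   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")  # hypothetical model path

   # Set the batch size on the first axis; this updates all inputs and
   # outputs consistently, which matches the conditions listed above.
   ov.set_batch(model, 4)

   # Compile for the NPU device. The plugin decides internally how the
   # batch is handled (concurrency-based or compiler-handled batching).
   compiled_model = core.compile_model(model, "NPU")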

If these conditions are met, then, due to current compiler limitations and ongoing work on performance improvements for batch sizes larger than one,
the NPU plugin will first try to compile and execute the original model with the batch size forced to 1.
If this compilation succeeds, the plugin will detect the difference in batch size between the original model layout
and the transformed/compiled layout, and will:
- internally construct multiple command lists, one for each input,
- execute each command list with the proper offsets into the input/output buffers,
- notify the user of the completion of the inference request once all command lists have been executed.

This concurrency-based batching mode is transparent to the application: a single inference request handles all inputs from the batch, as shown in the sketch below.
Performance might be lower than with regular batching; this mode is intended to provide basic batching functionality on older drivers,
or for models that cannot yet be compiled with a batch size larger than one.
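
As a sketch of the application side, under the assumptions of the previous snippet (a ``compiled_model``
with batch size 4 and an illustrative input shape of ``[4, 3, 224, 224]``), one request carries the whole batch:

.. code-block:: py

   import numpy as np

   # One batched input tensor; the shape is purely illustrative.
   batched_input = np.zeros((4, 3, 224, 224), dtype=np.float32)

   # A single inference request handles all inputs from the batch,
   # regardless of how the NPU plugin executes it internally.
   infer_request = compiled_model.create_infer_request()
   infer_request.infer({0: batched_input})

   # The result keeps the batch dimension on the first axis.
   result = infer_request.get_output_tensor(0).data
   print(result.shape)  # e.g. (4, 1000) for a classification model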

If these conditions are not met, the NPU plugin will try to compile and execute the original model with the given
batch size of N, just like any other regular model.

.. note::

   Once performance improves and more models can be compiled with a batch size larger than one,
   the default order will be changed: the NPU plugin will first try to compile and execute the original
   model with the given batch size, and will fall back to concurrency-based batching if compilation fails.