OVEP - PR 1.19 (#21443)
### Description
Add OVEP features for 1.19.

This PR includes:
- Added support for EpCtx with ORT session options for optimized performance (a usage sketch follows below).
- Bug fixes.
- Support for OpenVINO 2024.3.

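For context, a minimal sketch of how the new EpCtx path might be driven from user code. This is not part of this commit; the `ep.context_*` keys are ORT's standard EPContext session-option keys, and the model/file names are placeholders (Linux-style paths assumed):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "ovep-epctx"};
  Ort::SessionOptions so;

  // Ask ORT to export an EPContext model carrying the compiled OV blob.
  so.AddConfigEntry("ep.context_enable", "1");
  // With this PR, this path is forwarded to OVEP (as cache_dir) and used
  // as the name of the dumped EPContext ONNX model.
  so.AddConfigEntry("ep.context_file_path", "model_ctx.onnx");
  // 0 = write the blob to a separate .blob file instead of embedding it.
  so.AddConfigEntry("ep.context_embed_mode", "0");

  OrtOpenVINOProviderOptions ov_options{};
  ov_options.device_type = "NPU";  // placeholder device
  so.AppendExecutionProvider_OpenVINO(ov_options);

  // First run compiles and exports; later sessions can be created from
  // "model_ctx.onnx" directly and skip OV compilation.
  Ort::Session session{env, "model.onnx", so};
  return 0;
}
```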
---------

Co-authored-by: ubuntu <ubuntu@ubuntu-mtlp-118727.iind.intel.com>
Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: Maheshkar <ankit.maheshkar@intel.com>
6 people authored Jul 25, 2024
1 parent ae3ec2e commit ca47f0f
Showing 21 changed files with 271 additions and 121 deletions.
4 changes: 2 additions & 2 deletions cmake/onnxruntime_providers_openvino.cmake
@@ -17,8 +17,8 @@
 
 # Header paths
 find_package(OpenVINO REQUIRED COMPONENTS Runtime ONNX)
-if(OpenVINO_VERSION VERSION_LESS 2023.0)
-  message(FATAL_ERROR "OpenVINO 2023.0 and newer are supported. Please, latest OpenVINO release")
+if(OpenVINO_VERSION VERSION_LESS 2024.0)
+  message(FATAL_ERROR "OpenVINO 2024.0 and newer are supported. Please, use latest OpenVINO release")
 endif()
 
 if (WIN32)
8 changes: 5 additions & 3 deletions docs/python/ReadMeOV.rst
@@ -7,6 +7,7 @@ OpenVINO™ Execution Provider for ONNX Runtime accelerates inference across man
 - Intel® CPUs
 - Intel® integrated GPUs
 - Intel® discrete GPUs
+- Intel® integrated NPUs (Windows only)
 
 Installation
 ------------
@@ -15,26 +16,27 @@ Requirements
 ^^^^^^^^^^^^
 
 - Ubuntu 18.04, 20.04, RHEL(CPU only) or Windows 10 - 64 bit
-- Python 3.8 or 3.9 or 3.10 for Linux and only Python3.10 for Windows
+- Python 3.9 or 3.10 or 3.11 for Linux and Python 3.10, 3.11 for Windows
 
 This package supports:
 - Intel® CPUs
 - Intel® integrated GPUs
 - Intel® discrete GPUs
+- Intel® integrated NPUs (Windows only)
 
 ``pip3 install onnxruntime-openvino``
 
 Please install OpenVINO™ PyPi Package separately for Windows.
 For installation instructions on Windows please refer to `OpenVINO™ Execution Provider for ONNX Runtime for Windows <https://github.com/intel/onnxruntime/releases/>`_.
 
-**OpenVINO™ Execution Provider for ONNX Runtime** Linux Wheels comes with pre-built libraries of OpenVINO™ version 2023.0.0 eliminating the need to install OpenVINO™ separately. The OpenVINO™ libraries are prebuilt with CXX11_ABI flag set to 0.
+**OpenVINO™ Execution Provider for ONNX Runtime** Linux Wheels comes with pre-built libraries of OpenVINO™ version 2024.1.0 eliminating the need to install OpenVINO™ separately.
 
 For more details on build and installation please refer to `Build <https://onnxruntime.ai/docs/build/eps.html#openvino>`_.
 
 Usage
 ^^^^^
 
-By default, Intel® CPU is used to run inference. However, you can change the default option to either Intel® integrated or discrete GPU.
+By default, Intel® CPU is used to run inference. However, you can change the default option to either Intel® integrated GPU, discrete GPU, integrated NPU (Windows only).
 Invoke `the provider config device type argument <https://onnxruntime.ai/docs/execution-providers/OpenVINO-ExecutionProvider.html#summary-of-options>`_ to change the hardware on which inferencing is done.
 
 For more API calls and environment variables, see `Usage <https://onnxruntime.ai/docs/execution-providers/OpenVINO-ExecutionProvider.html#configuration-options>`_.
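To make the Usage paragraph above concrete, a hedged C++ sketch of overriding the default CPU target through the provider's device_type option; the model path and device strings are placeholders, not part of this diff:

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "ovep-device"};
  Ort::SessionOptions so;

  // CPU is the default; pick the hardware OVEP should run on instead.
  OrtOpenVINOProviderOptions ov_options{};
  ov_options.device_type = "GPU";  // e.g. "CPU", "GPU", or "NPU" (Windows only)
  so.AppendExecutionProvider_OpenVINO(ov_options);

  Ort::Session session{env, "model.onnx", so};
  return 0;
}
```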
45 changes: 30 additions & 15 deletions onnxruntime/core/providers/openvino/backend_manager.cc
@@ -28,9 +28,8 @@ BackendManager::BackendManager(const GlobalContext& global_context,
                                const onnxruntime::Node& fused_node,
                                const onnxruntime::GraphViewer& subgraph,
                                const logging::Logger& logger,
-                               EPCtxHandler& ctx_handle) {
+                               EPCtxHandler& ep_ctx_handle_) {
   global_context_ = global_context;
-  ep_ctx_handle_ = ctx_handle;
 
   openvino_sdk_version_ = std::to_string(global_context_.OpenVINO_Version.at(0)) + "." +
                           std::to_string(global_context_.OpenVINO_Version.at(1));
@@ -147,13 +146,20 @@ Status BackendManager::ExportCompiledBlobAsEPCtxNode(const onnxruntime::GraphVie
 
   std::string model_blob_str;
   auto compiled_model = concrete_backend_->GetOVCompiledModel();
-  auto graph_name = global_context_.onnx_model_path_name;
-  // Remove extension so we can append suffix to form the complete name of output graph
-  graph_name = [&]() {
-    size_t dot = graph_name.find_last_of(".");
-    if (dot == std::string::npos) return graph_name;
-    return graph_name.substr(0, dot);
-  }();
+  std::string graph_name = "";
+  // Epctx file path from SO is mapped to cache_dir variable for OVEP for readability
+  if (global_context_.cache_dir != "") {
+    graph_name = global_context_.cache_dir;
+  } else {
+    graph_name = global_context_.onnx_model_path_name;
+    // Remove extension so we can append suffix to form the complete name of output graph
+    graph_name = [&]() {
+      size_t dot = graph_name.find_last_of(".");
+      if (dot == std::string::npos) return graph_name;
+      return graph_name.substr(0, dot);
+    }();
+    graph_name = graph_name + "-ov_" + GetGlobalContext().device_type + "_blob.onnx";
+  }
   // If embed_mode, then pass on the serialized blob
   // If not embed_mode, dump the blob here and only pass on the path to the blob
   if (global_context_.ep_context_embed_mode) {
@@ -162,18 +168,27 @@
     model_blob_str = model_blob_stream.str();
     ORT_ENFORCE(model_blob_str.size() != 0);
   } else {
-    std::ofstream f(graph_name + ".blob", std::ios::out | std::ios::trunc | std::ios::binary);
-    compiled_model.export_model(f);
-    model_blob_str = graph_name + ".blob";
+    // Remove extension so we can append suffix to form the complete name of output graph
+    auto blob_name = [&]() {
+      size_t dot = graph_name.find_last_of(".");
+      if (dot == std::string::npos) return graph_name;
+      return graph_name.substr(0, dot);
+    }();
+    std::ofstream blob_file(blob_name + ".blob",
+                            std::ios::out | std::ios::trunc | std::ios::binary);
+    if (!blob_file) {
+      ORT_THROW("Unable to open file for epctx model dump.");
+    }
+    compiled_model.export_model(blob_file);
+    model_blob_str = blob_name + ".blob";
   }
 
   ORT_RETURN_IF_ERROR(ep_ctx_handle_.ExportEPCtxModel(graph_body_viewer,
                                                       graph_name,
                                                       logger,
                                                       global_context_.ep_context_embed_mode,
                                                       model_blob_str,
-                                                      openvino_sdk_version_,
-                                                      GetGlobalContext().device_type));
+                                                      openvino_sdk_version_));
 
   return Status::OK();
 }
@@ -248,7 +263,7 @@ static void DumpOpenVINOEPModel(std::string onnx_model_path_name,
                                 ONNX_NAMESPACE::ModelProto* model_proto,
                                 const onnxruntime::Node& fused_node) {
   if (openvino_ep::backend_utils::IsDebugEnabled()) {
-    auto model_name = onnx_model_path_name.empty() ? "unknown.onnx" : onnx_model_path_name;
+    auto model_name = onnx_model_path_name.empty() ? "unknown.onnx" : std::move(onnx_model_path_name);
 #ifdef _WIN32
     size_t slash = model_name.find_last_of("\\");
 #else
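The change above gives the EPContext output name a precedence rule: an explicit path from the session options (mapped to cache_dir) wins, otherwise the model's stem plus a device suffix is used. A hypothetical stand-alone helper, not code from this commit, that mirrors that rule:

```cpp
#include <iostream>
#include <string>

// Hypothetical rendition of the naming rule in ExportCompiledBlobAsEPCtxNode:
// ep.context_file_path (cache_dir) wins; otherwise the ONNX model's stem
// plus "-ov_<DEVICE>_blob.onnx" is used.
std::string EpCtxOutputName(const std::string& cache_dir,
                            const std::string& onnx_model_path,
                            const std::string& device_type) {
  if (!cache_dir.empty()) return cache_dir;
  size_t dot = onnx_model_path.find_last_of('.');
  std::string stem = (dot == std::string::npos) ? onnx_model_path
                                                : onnx_model_path.substr(0, dot);
  return stem + "-ov_" + device_type + "_blob.onnx";
}

int main() {
  std::cout << EpCtxOutputName("", "resnet.onnx", "NPU") << "\n";         // resnet-ov_NPU_blob.onnx
  std::cout << EpCtxOutputName("ctx.onnx", "resnet.onnx", "NPU") << "\n"; // ctx.onnx
  return 0;
}
```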
28 changes: 18 additions & 10 deletions onnxruntime/core/providers/openvino/backends/basic_backend.cc
@@ -37,7 +37,7 @@ BasicBackend::BasicBackend(const ONNX_NAMESPACE::ModelProto& model_proto,
   PopulateConfigValue(device_config);
 
   // Enable caching
-  EnableCaching();
+  EnableCaching(device_config);
 
   // Setting OpenCL queue throttling for GPU
   EnableGPUThrottling(device_config);
@@ -82,26 +82,28 @@ BasicBackend::BasicBackend(const ONNX_NAMESPACE::ModelProto& model_proto,
         ie_cnn_network_, hw_target, device_config, subgraph_context_.subgraph_name);
   }
 #else // !IO_BUFFER_ENABLED
-  std::string prec_str = (global_context_.precision_str != "ACCURACY") ? global_context_.precision_str : global_context_.model_precision;
   if (is_ep_ctx_graph_) {
     // If the blob is held in an EPContext node, then skip FE+Compile
     // and directly move on to creating a backend with the executable blob
     exe_network_ = global_context_.ie_core.ImportModel(ep_ctx_handle.GetModelBlobStream(),
                                                        hw_target,
                                                        device_config,
                                                        global_context_.ep_context_embed_mode,
                                                        subgraph_context_.subgraph_name);
     ie_cnn_network_ = exe_network_.Get().get_runtime_model();
-  } else if (!subgraph_context_.has_dynamic_input_shape) {
+  } else if ((!subgraph_context_.has_dynamic_input_shape) &&
+             ((hw_target.find("AUTO") == std::string::npos) ||
+              (global_context_.OpenVINO_Version.at(0) >= 2024 && global_context_.OpenVINO_Version.at(1) > 2))) {
+    // Optimized OV compile_model API is supported with AUTO from version 2024.3 and above
     // Inputs with static dimenstions
+    std::string prec_str = (global_context_.precision_str != "ACCURACY") ? global_context_.precision_str : global_context_.model_precision;
     const std::string model = model_proto.SerializeAsString();
     exe_network_ = global_context_.ie_core.CompileModel(model,
                                                         hw_target,
                                                         prec_str,
                                                         global_context_.cache_dir,
                                                         device_config,
                                                         subgraph_context_.subgraph_name);
     ie_cnn_network_ = exe_network_.Get().get_runtime_model();
-  } else { // Inputs with dynamic dimensions
+  } else { // For all other types use ov::Model Type
     ie_cnn_network_ = CreateOVModel(model_proto, global_context_, const_outputs_map_);
     exe_network_ = global_context_.ie_core.CompileModel(
         ie_cnn_network_, hw_target, device_config, subgraph_context_.subgraph_name);
@@ -173,13 +175,19 @@ void BasicBackend::PopulateConfigValue(ov::AnyMap& device_config) {
   }
 }
 
-void BasicBackend::EnableCaching() {
+void BasicBackend::EnableCaching(ov::AnyMap& device_config) {
   // cache_dir argument has no effect when working with an embed-mode EPContext Graph
   if (is_ep_ctx_graph_) return;
 
-  if (!global_context_.cache_dir.empty()) {
+  if (!global_context_.cache_dir.empty() && !global_context_.export_ep_ctx_blob) {
     LOGS_DEFAULT(INFO) << log_tag << "Enables Caching";
-    global_context_.ie_core.SetCache(global_context_.cache_dir, global_context_.device_type);
+    if (global_context_.device_type.find("AUTO:GPU") != std::string::npos) {
+      std::pair<std::string, ov::Any> device_property;
+      device_property = std::make_pair("CACHE_DIR", global_context_.cache_dir);
+      device_config.emplace(ov::device::properties("GPU", device_property));
+    } else {
+      global_context_.ie_core.SetCache(global_context_.cache_dir);
+    }
   }
 }
 
@@ -274,7 +282,7 @@ void BasicBackend::StartAsyncInference(Ort::KernelContext& context, OVInferReque
   }
 
   try {
-    infer_request->SetTensor(input_name, tensor_ptr);
+    infer_request->SetTensor(std::move(input_name), tensor_ptr);
   } catch (const char* msg) {
     ORT_THROW(msg);
   }
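The EnableCaching change above scopes CACHE_DIR to the GPU plugin when the target is AUTO:GPU, since a core-level cache setting would apply to every device AUTO may choose. A hedged stand-alone sketch of the same OpenVINO pattern (cache path, model path, and device list are placeholders):

```cpp
#include <openvino/openvino.hpp>

#include <string>
#include <utility>

int main() {
  ov::Core core;
  ov::AnyMap config;

  // Scope CACHE_DIR to the GPU plugin only, so AUTO enables model caching
  // for its GPU candidate without forcing it on the other devices.
  std::pair<std::string, ov::Any> cache_property{"CACHE_DIR", "/tmp/ov_cache"};
  config.emplace(ov::device::properties("GPU", cache_property));

  auto compiled = core.compile_model("model.onnx", "AUTO:GPU,CPU", config);
  return 0;
}
```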
2 changes: 1 addition & 1 deletion onnxruntime/core/providers/openvino/backends/basic_backend.h
@@ -37,7 +37,7 @@ class BasicBackend : public IBackend {
   void PopulateCompiledDirectory(std::string, std::string&, std::string&, bool&);
   bool ValidateSubgraph(std::map<std::string, std::shared_ptr<ov::Node>>& const_outputs_map);
   void PopulateConfigValue(ov::AnyMap& device_config);
-  void EnableCaching();
+  void EnableCaching(ov::AnyMap& device_config);
   void EnableGPUThrottling(ov::AnyMap& device_config);
   void EnableStreams();
   void SetNumThreads(ov::AnyMap& device_config);
14 changes: 7 additions & 7 deletions onnxruntime/core/providers/openvino/onnx_ctx_model_helper.cc
@@ -19,8 +19,7 @@ Status EPCtxHandler::ExportEPCtxModel(const GraphViewer& graph_viewer,
                                       const logging::Logger& logger,
                                       const bool& ep_context_embed_mode,
                                       const std::string& model_blob_str,
-                                      const std::string& openvino_sdk_version,
-                                      const std::string& device_type) const {
+                                      const std::string& openvino_sdk_version) const {
   auto model_build = graph_viewer.CreateModel(logger);
   auto& graph_build = model_build->MainGraph();
 
@@ -77,9 +76,12 @@ Status EPCtxHandler::ExportEPCtxModel(const GraphViewer& graph_viewer,
   model_proto->set_ir_version(ONNX_NAMESPACE::Version::IR_VERSION);
 
   // Finally, dump the model
-  std::ofstream dump(graph_name + "-ov_" + device_type + "_blob.onnx",
-                     std::ios::out | std::ios::trunc | std::ios::binary);
-  model_proto->SerializeToOstream(dump);
+  std::ofstream epctx_onnx_model(graph_name,
+                                 std::ios::out | std::ios::trunc | std::ios::binary);
+  if (!epctx_onnx_model) {
+    ORT_THROW("Unable to create epctx onnx model file ");
+  }
+  model_proto->SerializeToOstream(epctx_onnx_model);
 
   LOGS_DEFAULT(VERBOSE) << "[OpenVINO EP] Export blob as EPContext Node";
 
@@ -90,9 +92,7 @@ Status EPCtxHandler::ImportBlobFromEPCtxModel(const GraphViewer& graph_viewer) {
   auto node = graph_viewer.GetNode(0);
   auto& attrs = node->GetAttributes();
   ORT_ENFORCE(attrs.count(EP_CACHE_CONTEXT) > 0);
-
   model_stream_ = std::make_shared<std::istringstream>(attrs.at(EP_CACHE_CONTEXT).s());
-
   LOGS_DEFAULT(VERBOSE) << "[OpenVINO EP] Read blob from EPContext Node";
 
   is_valid_ep_ctx_graph_ = true;
3 changes: 1 addition & 2 deletions onnxruntime/core/providers/openvino/onnx_ctx_model_helper.h
@@ -29,8 +29,7 @@ class EPCtxHandler {
                          const logging::Logger& logger,
                          const bool& ep_context_embed_mode,
                          const std::string& model_blob_str,
-                         const std::string& openvino_sdk_version,
-                         const std::string& device_type) const;
+                         const std::string& openvino_sdk_version) const;
   Status ImportBlobFromEPCtxModel(const GraphViewer& graph_viewer);
   bool CheckForOVEPCtxNode(const GraphViewer& graph_viewer, std::string openvino_sdk_version) const;
   bool IsValidOVEPCtxGraph() const { return is_valid_ep_ctx_graph_; }
onnxruntime/core/providers/openvino/openvino_execution_provider.cc
@@ -34,6 +34,7 @@ OpenVINOExecutionProvider::OpenVINOExecutionProvider(const OpenVINOExecutionProv
   global_context_->export_ep_ctx_blob = info.export_ep_ctx_blob_;
   global_context_->enable_qdq_optimizer = info.enable_qdq_optimizer_;
   global_context_->disable_cpu_fallback = info.disable_cpu_fallback_;
+  global_context_->ep_context_embed_mode = info.so_epctx_embed_mode_;
 
   // to check if target device is available
   // using ie_core capability GetAvailableDevices to fetch list of devices plugged in
@@ -47,7 +48,7 @@
       info.device_type_.find("AUTO") != std::string::npos) {
     device_found = true;
   } else {
-    for (std::string device : available_devices) {
+    for (const std::string& device : available_devices) {
       if (device.rfind(info.device_type_, 0) == 0) {
         if (info.device_type_.find("GPU") != std::string::npos && (info.precision_ == "FP32" ||
                                                                    info.precision_ == "FP16" ||