YOLO-NAS is a state-of-the-art object detector by Deci AI. This project implements the YOLO-NAS object detector in C++ with an OpenVINO backend to accelerate inference.
- Supports both image and video inference.
- Faster inference than the native PyTorch implementation (see benchmarks below).
The following instructions demonstrate how to build this project on Windows and on Linux systems supported by OpenVINO.
- CMake v3.8+ - found at https://cmake.org/
- MSVC 2017 or newer (Windows build) - MinGW will not work for the Windows build as the OpenVINO libraries are not compatible with MinGW.
- GCC (Linux build) - tested on v11.4.0.
- OpenVINO Toolkit - tested on 2022.1. Download here.
- OpenCV v4.0+ - tested on v4.7. Download here.
- Set the `OpenCV_DIR` environment variable to point to your `../../opencv/build` directory.
- Set the `OpenVINO_DIR` environment variable to point to your `../../openvino/runtime/cmake` directory.
- Run the following build commands:
  a. [Windows] VS Developer Command Prompt:

  ```
  cd /d <yolo-nas-openvino-cpp-directory>
  cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
  cd build
  MSBuild yolo-nas-openvino-cpp.sln /property:Configuration=Release
  ```

  b. [Linux] Bash:

  ```bash
  cd <yolo-nas-openvino-cpp-directory>
  cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
  cd build
  make
  ```
- The compiled `.exe` will be inside the `Release` folder for the Windows build, while the executable will be in the root folder for the Linux build.
- Export the ONNX file:

  ```python
  from super_gradients.training import models

  model = models.get("yolo_nas_s", pretrained_weights="coco")
  model.eval()
  model.prep_model_for_conversion(input_size=(1, 3, 640, 640))
  models.convert_to_onnx(model=model, prep_model_for_conversion_kwargs={"input_size": (1, 3, 640, 640)}, out_path="yolo_nas_s.onnx")
  ```
- Convert the ONNX model to OpenVINO IR:

  ```bash
  mo --input_model yolo_nas_s.onnx -s 255 --reverse_input_channels
  ```
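  If you want to sanity-check the converted IR before wiring it into the detector, a minimal C++ snippet along these lines can read the model and print its input and output shapes. This is only an illustration of the OpenVINO 2.0 C++ API, not part of this project, and the `yolo_nas_s.xml` path is an assumption:

  ```cpp
  #include <iostream>
  #include <openvino/openvino.hpp>

  int main() {
      ov::Core core;
      // Read the IR produced by the Model Optimizer (the .xml plus its .bin pair).
      std::shared_ptr<ov::Model> model = core.read_model("yolo_nas_s.xml");

      // List every input and output tensor with its name and shape.
      for (const auto &input : model->inputs())
          std::cout << "Input:  " << input.get_any_name() << " " << input.get_partial_shape() << std::endl;
      for (const auto &output : model->outputs())
          std::cout << "Output: " << output.get_any_name() << " " << output.get_partial_shape() << std::endl;
      return 0;
  }
  ```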
- To run inference, execute the following command:

  ```bash
  yolo-nas-openvino-cpp --model <OPENVINO_IR_XML_PATH> [-i <IMAGE_PATH> | -v <VIDEO_PATH>] [--imgsz IMAGE_SIZE] [--gpu] [--iou-thresh IOU_THRESHOLD] [--score-thresh CONFIDENCE_THRESHOLD]
  ```
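For reference, the core OpenVINO inference flow behind a detector like this typically looks like the sketch below. It is a simplified, hypothetical illustration rather than this project's actual source: the model path, the `image.jpg` input, and the 640x640 input size are assumptions, and box decoding plus NMS are omitted.

```cpp
#include <iostream>
#include <opencv2/opencv.hpp>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Compile the IR for the target device ("CPU" here; "GPU" when --gpu is requested).
    ov::CompiledModel compiled = core.compile_model("yolo_nas_s.xml", "CPU");
    ov::InferRequest request = compiled.create_infer_request();

    // Resize the image to the network input size. Scaling by 1/255 and BGR->RGB
    // conversion are already folded into the IR by the `mo` flags used above,
    // so the blob is left as raw BGR pixel values.
    cv::Mat image = cv::imread("image.jpg");
    cv::Mat blob = cv::dnn::blobFromImage(image, 1.0, cv::Size(640, 640), cv::Scalar(), false, false);

    // Wrap the blob in an ov::Tensor and run synchronous inference.
    ov::Tensor input(ov::element::f32, {1, 3, 640, 640}, blob.ptr<float>());
    request.set_input_tensor(input);
    request.infer();

    // Raw network output; decoding boxes/scores and running NMS would follow here.
    ov::Tensor output = request.get_output_tensor(0);
    std::cout << "Output shape: " << output.get_shape() << std::endl;
    return 0;
}
```

The project itself additionally handles video input and the IoU/score thresholds exposed by the CLI flags above.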
The following benchmarks were run on Google Colab using an Intel® Xeon® Processor E5-2699 v4 @ 2.20GHz with 2 vCPUs.
| Backend | Latency | FPS | Implementation |
|---|---|---|---|
| PyTorch | 867.02 ms | 1.15 | Native (`model.predict()` in `super_gradients`) |
| ONNX C++ (via OpenCV DNN) | 962.27 ms | 1.04 | Hyuto |
| ONNX Python | 626.37 ms | 1.59 | Hyuto |
| OpenVINO C++ | 628.04 ms | 1.59 | Y-T-G |
- Mohammed Yasin - @Y-T-G
Thanks to @Hyuto for his work on the ONNX implementation of YOLO-NAS in C++, which was utilized in this project.
This project is licensed under the MIT License - see the LICENSE file for details.