Deploying YOLOv8 to ARTPEC-8 cameras #144
-
Can ARTPEC-8 cameras run YOLOv8? Could the Axis team please advise whether the ARTPEC-8 DLPU supports running an int8, per-tensor-quantized YOLOv8m? For reference, our team is seeing model-loading errors when loading this model into the inference server via the object_detector_python script. The errors indicate that the model contains more than the maximum of 16 graph partitions and that certain operations are not supported on ARTPEC-8 cameras (see the docs for a full overview of the YOLOv8 architecture/layers). We were curious to learn why this would break. Any pointers would be much appreciated, thank you.
-
Hello @philippe-heitzmann, can you show the command you used to run the export.py script and produce a per-tensor-quantized tflite?
-
Hi @Corallo, yes, definitely. Please see below for the code used to export this YOLOv8m model, and a link to the model weights it produced:
The error message indicates there are ~68 graph partitions with unsupported operations in the model graph. Could the Axis team advise which of these layers may be problematic in this case? We ask especially in the context of previous reports, such as #112, of YOLOv5s being able to run (albeit slowly) on the ARTPEC-8 DLPU, given that the v5 and v8 models use mostly identical types of convolutions and operations. Any pointers would be much appreciated, thank you.
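For readers following along, a hedged sketch of a typical YOLOv8 → ONNX → per-tensor int8 TFLite export pipeline (illustrative only; the model name, opset, and image size below are assumptions, not the exact settings used in this thread):

```
# Illustrative: Ultralytics CLI export of YOLOv8m weights to ONNX
yolo export model=yolov8m.pt format=onnx opset=12 imgsz=640

# Illustrative: onnx2tf conversion to per-tensor int8 TFLite
onnx2tf -i yolov8m.onnx -oiqt -qt per-tensor
```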
-
Hello @philippe-heitzmann. This seems to be a problem with onnx2tf; I am not sure whether it has a flag to quantize not only the filters, but everything else as well.
-
@philippe-heitzmann
The model with the correct quantization is:

Let us know how that works; we would be happy to hear that you succeeded.
-
Hi @Corallo, thank you very much for your efforts. I then converted it with your command:

onnx2tf -i yolov7.onnx -oiqt -qt per-tensor -ioqd uint8

From the journalctl of larod, the log is more concise:

EDIT: the same failure happens with yolov7-tiny
-
I'll move this to a discussion as it is not an issue with the examples.
-
Hi again,
Hello @philippe-heitzmann
Running
journalctl -u larod
after trying to load the model (and failing) gives you more info about the problem with the model. Specifically, running it after loading your model, you'll see:
Apr 13 11:17:43 axis-b8a44f277efe sh[1156]: ERROR: hybrid data type is not supported in conv2d.
This means that the conv2d ops are quantized only in their kernel parameters, but the convolution expects inputs and produces outputs as float. This is not supported.
Besides, looking at your model with Netron, you can see that other layers, like the Add and Mul nodes, are not quantized; this will make execution fall back to the CPU after each convolution, and that's why you see that error s…
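To guard against exactly this hybrid-quantization failure mode, one can force full-integer conversion so the converter errors out at export time instead of emitting float-activation ops that larod later rejects. A minimal sketch, assuming TensorFlow is installed (the tiny Keras model here is a stand-in for the real detector, not the thread's actual export path):

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model; the real pipeline would load the detector graph instead.
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(4)(x)
model = tf.keras.Model(inputs, outputs)

def representative_data():
    # Calibration samples the converter uses to pick quantization ranges
    for _ in range(8):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Restrict to int8 builtins so conversion FAILS instead of silently
# emitting hybrid (float-activation) ops like the conv2d larod rejects.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_bytes = converter.convert()

# Sanity check: the converted model's I/O tensors are integer, not float.
interp = tf.lite.Interpreter(model_content=tflite_bytes)
interp.allocate_tensors()
print(interp.get_input_details()[0]["dtype"], interp.get_output_details()[0]["dtype"])
```

With TFLITE_BUILTINS_INT8 as the only allowed op set, any op that cannot be fully quantized aborts the conversion, which is much easier to debug than a model that loads into larod and then fails with "hybrid data type is not supported in conv2d".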