Significant Inference Time Difference: TensorFlow Serving vs TensorFlow Lite Runtime on Axis Camera with ARTPEC-8 Chip #192
Sathishmahi asked this question in Q&A (unanswered)
Hi, I'm working on a video analytics project with an Axis camera that has an ARTPEC-8 chip and aarch64 architecture. For my project, I initially used the Computer Vision SDK, and many examples I found involved performing inference using TensorFlow Serving. However, I found TensorFlow Serving a bit inconvenient for my setup, so I decided to switch to TensorFlow Lite runtime for model inference.
After implementing this, I noticed a significant difference in inference times. With TensorFlow Serving, a single prediction took around 150 ms; with the TensorFlow Lite runtime, the same model takes almost 2 seconds per inference. I'm using an INT8-quantized YOLOv5n model.
My guess is that TensorFlow Lite is only utilizing the CPU, whereas TensorFlow Serving may be taking advantage of a dedicated accelerator on the chip. I'm not entirely sure about this, though.
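If the accelerator were exposed to TensorFlow Lite as a delegate, I would expect to be able to attach it along the lines of the sketch below. This is only an illustration: the delegate library name is hypothetical, and I could not find any such delegate for ARTPEC-8.

```python
from tflite_runtime.interpreter import Interpreter, load_delegate

# Hypothetical: platforms whose NPU ships a TFLite external delegate let you
# attach the accelerator like this. The library name below is made up; without
# a delegate, the plain Interpreter falls back to the CPU.
delegate = load_delegate('libhypothetical_dlpu_delegate.so')
interpreter = Interpreter(model_path=model_path,
                          experimental_delegates=[delegate])
```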
My question is: why is there such a large difference in inference times between the two methods, and how can I resolve this issue?
Here’s a sample of my code:
```python
from tflite_runtime.interpreter import Interpreter

# Load the INT8-quantized YOLOv5n model; no delegate is passed,
# so inference runs on the CPU with 4 threads
interpreter = Interpreter(model_path=model_path, num_threads=4)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# process_imarr is the preprocessed input image array
interpreter.set_tensor(input_details['index'], process_imarr)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details['index'])
```
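A minimal sketch of how a single-inference timing like the one quoted above can be measured (assuming the model is already warmed up with a few prior invocations):

```python
import time

# Time one invoke() call; the first call after loading is typically slower,
# so warm-up runs should be excluded from the measurement
start = time.perf_counter()
interpreter.invoke()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"inference: {elapsed_ms:.1f} ms")
```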
Replies: 1 comment

@Sathishmahi The inference server provided by the SDK gives you access to the DLPU (deep learning processing unit) of the device, which speeds up inference. You don't have access to it when using your own TensorFlow Lite runtime; there is no way around it.
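For reference, talking to the SDK's inference server typically goes through the TensorFlow Serving gRPC predict API. Below is a minimal sketch, assuming the `tensorflow-serving-api` protos are installed; the channel address, model name, and input tensor name are placeholders that would need to match your actual setup.

```python
import grpc
import numpy as np
from tensorflow.core.framework import tensor_pb2, tensor_shape_pb2, types_pb2
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Placeholder address -- use whatever socket or port your inference server exposes
channel = grpc.insecure_channel('unix:///tmp/inference-server.sock')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Dummy preprocessed frame standing in for a real camera image
image = np.zeros((1, 640, 640, 3), dtype=np.uint8)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'yolov5n'  # placeholder model name
request.inputs['input'].CopyFrom(tensor_pb2.TensorProto(  # 'input' is a placeholder tensor name
    dtype=types_pb2.DT_UINT8,
    tensor_shape=tensor_shape_pb2.TensorShapeProto(dim=[
        tensor_shape_pb2.TensorShapeProto.Dim(size=s) for s in image.shape]),
    tensor_content=image.tobytes()))

# The server runs the model on the accelerator and returns the output tensors
response = stub.Predict(request, timeout=5.0)
```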