Significant Inference Time Difference: TensorFlow Serving vs TensorFlow Lite Runtime on Axis Camera with ARTPEC-8 Chip #192
Sathishmahi asked this question in Q&A (unanswered)
Hi, I'm working on a video analytics project with an Axis camera that has an ARTPEC-8 chip and aarch64 architecture. For my project, I initially used the Computer Vision SDK, and many examples I found involved performing inference using TensorFlow Serving. However, I found TensorFlow Serving a bit inconvenient for my setup, so I decided to switch to TensorFlow Lite runtime for model inference.
After implementing this, I noticed a significant difference in inference times. With TensorFlow Serving, a single prediction took around 150 ms; with the TensorFlow Lite runtime, the same model takes almost 2 seconds per inference. I'm using an INT8-quantized YOLOv5n model.
My guess is that TensorFlow Lite is only utilizing the CPU, whereas TensorFlow Serving may be taking advantage of a dedicated accelerator on the chip. I'm not entirely sure about this, though.
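If the accelerator were exposed to TensorFlow Lite as a delegate, I would expect to be able to attach it along the lines of the sketch below. This is only an illustration: the delegate library name is hypothetical, and I could not find any such delegate for ARTPEC-8.

```python
from tflite_runtime.interpreter import Interpreter, load_delegate

# Hypothetical: platforms whose NPU ships a TFLite external delegate let you
# attach the accelerator like this. The library name below is made up; without
# a delegate, the plain Interpreter falls back to the CPU.
delegate = load_delegate('libhypothetical_dlpu_delegate.so')
interpreter = Interpreter(model_path=model_path,
                          experimental_delegates=[delegate])
```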
My question is: why is there such a large difference in inference times between the two methods, and how can I resolve this issue?
Here’s a sample of my code:
```python
from tflite_runtime.interpreter import Interpreter

# Load the INT8-quantized YOLOv5n model; no delegate is passed,
# so inference runs on the CPU with 4 threads
interpreter = Interpreter(model_path=model_path, num_threads=4)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# process_imarr is the preprocessed input image array
interpreter.set_tensor(input_details['index'], process_imarr)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details['index'])
```
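A minimal sketch of how a single-inference timing like the one quoted above can be measured (assuming the model is already warmed up with a few prior invocations):

```python
import time

# Time one invoke() call; the first call after loading is typically slower,
# so warm-up runs should be excluded from the measurement
start = time.perf_counter()
interpreter.invoke()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"inference: {elapsed_ms:.1f} ms")
```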
Replies: 1 comment

@Sathishmahi The inference server provided by the SDK gives you access to the DLPU (deep learning processing unit) of the device, which speeds up inference. You don't have access to it when using your own TensorFlow Lite runtime; there is no way around it.
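For reference, talking to the SDK's inference server typically goes through the TensorFlow Serving gRPC predict API. Below is a minimal sketch, assuming the `tensorflow-serving-api` protos are installed; the channel address, model name, and input tensor name are placeholders that would need to match your actual setup.

```python
import grpc
import numpy as np
from tensorflow.core.framework import tensor_pb2, tensor_shape_pb2, types_pb2
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Placeholder address -- use whatever socket or port your inference server exposes
channel = grpc.insecure_channel('unix:///tmp/inference-server.sock')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Dummy preprocessed frame standing in for a real camera image
image = np.zeros((1, 640, 640, 3), dtype=np.uint8)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'yolov5n'  # placeholder model name
request.inputs['input'].CopyFrom(tensor_pb2.TensorProto(  # 'input' is a placeholder tensor name
    dtype=types_pb2.DT_UINT8,
    tensor_shape=tensor_shape_pb2.TensorShapeProto(dim=[
        tensor_shape_pb2.TensorShapeProto.Dim(size=s) for s in image.shape]),
    tensor_content=image.tobytes()))

# The server runs the model on the accelerator and returns the output tensors
response = stub.Predict(request, timeout=5.0)
```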