GPU Benchmark (libtorch-cpp)

Configuration

Data set:

A long-audio test set (not open source) containing 103 audio files, with durations ranging from 2 to 30 minutes.

./funasr-onnx-offline-rtf \
    --model-dir ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
    --vad-dir ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
    --punc-dir ./damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
    --gpu \
    --thread-num 20 \
    --bladedisc true \
    --batch-size 20 \
    --wav-path ./long_test.scp

Note: Run in Docker; refer to (docs).

Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz, 16 cores (32 processors), with avx512_vnni; GPU: A10

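| concurrent-tasks | batch | RTF    | Speedup Rate |
|------------------|-------|--------|--------------|
| 1                | 1     | 0.0076 | 130          |
| 1                | 20    | 0.0048 | 208          |
| 5                | 20    | 0.0011 | 850          |
| 10               | 20    | 0.0008 | 1200+        |
| 20               | 20    | 0.0008 | 1200+        |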
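
The Speedup Rate column appears to be roughly the reciprocal of RTF (processing time divided by audio duration). The following is a minimal sketch of that relationship, not part of the benchmark tooling. The helper name `speedup_from_rtf` is hypothetical. Because the RTF values in the table are rounded to four decimal places, the recomputed speedups only approximate the reported ones (for example, 0.0011 gives about 909 rather than 850).

```python
def speedup_from_rtf(rtf: float) -> float:
    """Return how many times faster than real time a run is, given its RTF.

    Assumes RTF = processing_time / audio_duration, so speedup = 1 / RTF.
    """
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


# (concurrent tasks, batch size, reported RTF), copied from the table above.
rows = [
    (1, 1, 0.0076),
    (1, 20, 0.0048),
    (5, 20, 0.0011),
    (10, 20, 0.0008),
    (20, 20, 0.0008),
]

for tasks, batch, rtf in rows:
    speedup = speedup_from_rtf(rtf)
    print(f"tasks={tasks:2d} batch={batch:2d} RTF={rtf:.4f} speedup~{speedup:.0f}x")
```

Read this way, an RTF of 0.0008 means one hour of audio is transcribed in roughly three seconds.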

Note: On CPU, the single-thread RTF is 0.066, and the speedup with 32 threads is 330+.