This repo contains device-agnostic code for a framework-wise benchmark (adapted from u93kun) as well as layer-wise and model-wise benchmarks (adapted from avik-pal). A few additional results were added based on my own tests.
The layer-wise and model-wise scripts are PyTorch based; the framework-wise benchmark covers PyTorch, Caffe2, and TensorFlow. CPU and GPU performance is compared, including the effect of adjusting floating-point precision (the Volta architecture allows a performance boost through half/mixed-precision computation).
By default, the scripts run on the GPU. They fall back to the CPU either when no GPU is detected, or when you manually remove `'cuda:0' if torch.cuda.is_available() else` from the following line:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
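A minimal illustration of the two options (this snippet is for explanation only, not the repo's exact code):

```python
import torch

# Default: prefer the first GPU, fall back to the CPU automatically.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(f'Selected device: {device}')

# Removing the conditional is equivalent to forcing the CPU:
# device = torch.device('cpu')
```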
python3 layer_benchmark.py
Tested layers:
- Conv3x3, stride 1, padding 1
- Conv5x5, stride 1, padding 2
- Conv3x3, stride 2, padding 1
- Conv5x5, stride 2, padding 2
- Maxpool, stride 2, padding 1
- Meanpool, stride 2, padding 1
- Batchnorm
- Dense
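For a sense of what the layer benchmark measures, here is a minimal timing sketch for one of the layers above (the shapes, batch size, and iteration counts are illustrative assumptions, not the script's actual settings):

```python
# Sketch of timing a single Conv3x3 layer; not the repo's actual script.
import time

import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

layer = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1).to(device)
x = torch.randn(16, 64, 224, 224, device=device)  # illustrative dummy batch

with torch.no_grad():
    # Warm-up so one-time CUDA kernel setup doesn't skew the numbers.
    for _ in range(10):
        layer(x)
    if device.type == 'cuda':
        torch.cuda.synchronize()

    start = time.time()
    for _ in range(100):
        layer(x)
    if device.type == 'cuda':
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    elapsed = time.time() - start

print(f'Conv3x3 forward: {elapsed / 100 * 1e3:.3f} ms/iter')
```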
python3 model_benchmark.py
The following models were tested with TensorFlow:
- vgg16
- resnet50 *NEWLY ADDED!*
- resnet152
The following models are available to test with Pytorch:
- vgg16
- vgg16_bn
- vgg19
- vgg19_bn
- resnet18
- resnet34
- resnet50
- resnet101
- resnet152
- densenet161
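As with the layer benchmark, a minimal sketch of what the model benchmark does, including the half-precision case mentioned above (model choice, batch size, and iteration counts are illustrative assumptions):

```python
# Sketch of timing a torchvision model forward pass; not the repo's script.
import time

import torch
import torchvision.models as models

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = models.resnet50().to(device).eval()
x = torch.randn(16, 3, 224, 224, device=device)  # illustrative batch size

if device.type == 'cuda':  # half precision only pays off on the GPU
    model, x = model.half(), x.half()

with torch.no_grad():
    for _ in range(10):  # warm-up
        model(x)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(50):
        model(x)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    elapsed = time.time() - start

print(f'resnet50 forward: {elapsed / 50 * 1e3:.3f} ms/iter')
```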
python3 framework_benchmark.py -f <framework_name>
Available frameworks to test:
- pytorch
- tensorflow (GPU only)
- caffe2 (GPU only)
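For example: `python3 framework_benchmark.py -f pytorch`. A hypothetical sketch of how such a flag can be parsed (the actual script's argument handling may differ):

```python
import argparse

# Hypothetical argument parsing; framework_benchmark.py may differ in detail.
parser = argparse.ArgumentParser(description='Framework-wise benchmark')
parser.add_argument('-f', '--framework', required=True,
                    choices=['pytorch', 'tensorflow', 'caffe2'],
                    help='framework to benchmark')
args = parser.parse_args()
print(f'Benchmarking with {args.framework}')
```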
P.S.:
- For some reason, Caffe2 hits an "out of memory" error in the fp16 benchmark on a Zotac 1080 Ti; this was not the case with NVIDIA's GV100 and P5000. UPDATE: it turns out the problem was with pFP16Initializer, which has since been renamed PseudoFP16Initializer.
- The Caffe2 container is not officially supported for the TITAN RTX, and since Caffe2 is designed primarily for industrial mobile applications, it makes little sense to benchmark it on GPGPUs. From now on, only PyTorch and TensorFlow will be compared.
The following command creates a results subdirectory (if it doesn't exist) and runs all specified benchmarks by default.
./run_all_benchmark_docker.sh <device_name>
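For example (the device name below is purely illustrative; pass whatever label you use for your hardware):

./run_all_benchmark_docker.sh titan_rtx  # device name is illustrative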
The results are now visualized here.