Intel® Neural Compressor provides validated examples covering multiple compression techniques. Links to typical examples can be found in the example tables, and the performance and accuracy results are listed below.
- Validated Quantization Examples
  1.1. TensorFlow Models with Intel TensorFlow 2.11.0
  1.2. PyTorch Models with Torch 1.13.0+cpu in PTQ Mode
  1.3. PyTorch Models with Torch 1.13.0+cpu in QAT Mode
  1.4. PyTorch Models with Torch and Intel® Extension for PyTorch* 1.13.0+cpu
- Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime
Performance results were measured on 01/04/2023 on an Intel Xeon Platinum 8380 Scalable processor, using 1 socket, 4 cores per instance, 8 instances, and batch size 1.
Performance varies by use, configuration, and other factors. See platform configuration for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

1.1. TensorFlow Models with Intel TensorFlow 2.11.0

Model | Example | Accuracy INT8 | Accuracy FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput INT8 (samples/sec) | Throughput FP32 (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
BERT base MRPC | CKPT | 86.52% | 86.52% | 0.00% | 170.44 | 93.69 | 1.82x |
BERT large SQuAD | pb | 92.40 | 92.99 | -0.63% | 18.39 | 9.92 | 1.85x |
BERT large SQuAD (ONNX Model Zoo) | pb | 92.41 | 92.98 | -0.61% | 20.41 | 11.16 | 1.83x |
Densenet 121 | pb | 73.61% | 72.89% | 0.99% | 274.61 | 148.72 | 1.85x |
Densenet 161 | pb | 76.30% | 76.29% | 0.01% | 132.35 | 95.24 | 1.39x |
Densenet 169 | pb | 74.38% | 74.65% | -0.36% | 191.31 | 118.99 | 1.61x |
Faster R-CNN Inception ResNet V2 | pb | 37.44% | 38.31% | -2.27% | 3.31 | 1.81 | 1.83x |
Faster R-CNN Inception ResNet V2 | SavedModel | 37.55% | 38.31% | -1.98% | 3.32 | 1.81 | 1.84x |
Faster R-CNN ResNet101 | pb | 30.33% | 30.39% | -0.20% | 42.57 | 13.25 | 3.21x |
Faster R-CNN ResNet101 | SavedModel | 30.33% | 30.39% | -0.20% | 43.41 | 11.73 | 3.70x |
Faster R-CNN ResNet50 | pb | 26.64% | 26.59% | 0.19% | 51.70 | 16.45 | 3.14x |
Inception ResNet V2 | pb | 80.34% | 80.40% | -0.07% | 139.29 | 76.65 | 1.82x |
Inception ResNet V2 | keras | 80.35% | 80.40% | -0.05% | 99.42 | 54.50 | 1.82x |
Inception V1 | pb | 70.44% | 69.74% | 1.00% | 955.20 | 328.15 | 2.91x |
Inception V2 | pb | 74.34% | 73.97% | 0.50% | 709.92 | 282.40 | 2.51x |
Inception V3 | pb | 76.71% | 76.75% | -0.05% | 337.09 | 160.07 | 2.11x |
Inception V3 | keras | 77.73% | 77.83% | -0.13% | 438.52 | 204.76 | 2.14x |
Inception V4 | pb | 80.18% | 80.27% | -0.11% | 223.02 | 105.44 | 2.12x |
Mask R-CNN Inception V2 | pb | 28.50% | 28.73% | -0.80% | 69.42 | 33.00 | 2.10x |
Mask R-CNN Inception V2 | CKPT | 28.50% | 28.73% | -0.80% | 69.47 | 32.88 | 2.11x |
MobileNet V1 | pb | 71.85% | 70.96% | 1.25% | 1347.65 | 439.05 | 3.07x |
MobileNet V2 | pb | 72.56% | 71.76% | 1.11% | 1192.01 | 492.92 | 2.42x |
MobileNet V2 | keras | 71.10% | 71.76% | -0.91% | 412.75 | 376.34 | 1.10x |
MobileNet V3 | pb | 74.00% | 75.31% | -1.74% | 662.07 | 397.69 | 1.66x |
ResNet101 | pb | 77.50% | 76.45% | 1.37% | 299.23 | 154.67 | 1.93x |
ResNet101 | keras | 61.38% | 61.47% | -0.16% | 476.39 | 227.24 | 2.10x |
ResNet50 fashion | keras | 78.04% | 78.12% | -0.10% | 2734.43 | 1299.73 | 2.10x |
ResNet50 v1.0 | pb | 74.12% | 74.27% | -0.20% | 498.76 | 178.72 | 2.79x |
ResNet50 v1.5 | pb | 76.23% | 76.46% | -0.30% | 427.46 | 173.25 | 2.47x |
ResNetV2 101 | pb | 72.65% | 71.87% | 1.09% | 194.11 | 146.42 | 1.33x |
ResNetV2 101 | keras | 71.48% | 71.57% | -0.12% | 237.09 | 187.24 | 1.27x |
ResNetV2 152 | pb | 73.07% | 72.37% | 0.97% | 155.04 | 112.01 | 1.38x |
ResNetV2 50 | pb | 70.44% | 69.64% | 1.15% | 302.55 | 215.50 | 1.40x |
ResNet v2 50 | keras | 69.20% | 69.03% | 0.25% | 346.99 | 312.15 | 1.11x |
SSD MobileNet V1 | pb | 23.12% | 23.13% | -0.04% | 277.10 | 173.61 | 1.60x |
SSD MobileNet v1 | CKPT | 23.10% | 23.13% | -0.13% | 273.51 | 118.46 | 2.31x |
SSD ResNet34 | pb | 21.70% | 22.09% | -1.77% | 33.95 | 8.81 | 3.85x |
SSD ResNet50 V1 | pb | 37.75% | 38.00% | -0.66% | 34.11 | 15.67 | 2.18x |
SSD ResNet50 v1 | CKPT | 37.82% | 38.00% | -0.47% | 34.57 | 13.68 | 2.53x |
Transformer lt MLPerf | pb | 27.12 | 27.17 | -0.18% | 3.26 | 2.63 | 1.24x |
VGG16 | pb | 72.64% | 70.89% | 2.47% | 219.11 | 91.30 | 2.40x |
VGG19 | pb | 72.69% | 71.01% | 2.37% | 193.61 | 78.47 | 2.47x |
Wide Deep large DS | pb | 77.75% | 77.67% | 0.10% | 11506.91 | 9665.07 | 1.19x |
Xception | keras | 78.43% | 78.94% | -0.65% | 262.83 | 137.35 | 1.91x |
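
The two ratio columns follow directly from the formulas given in the table header. As a minimal sketch (the function names are illustrative and are not part of Intel® Neural Compressor), the values of the "BERT large SQuAD | pb" row above can be reproduced like this:

```python
def accuracy_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change, (INT8 - FP32) / FP32, as a fraction."""
    return (int8_acc - fp32_acc) / fp32_acc


def performance_ratio(int8_throughput: float, fp32_throughput: float) -> float:
    """Throughput speedup of INT8 over FP32, INT8 / FP32."""
    return int8_throughput / fp32_throughput


# Values copied from the "BERT large SQuAD | pb" row of the table.
acc = accuracy_ratio(92.40, 92.99)
perf = performance_ratio(18.39, 9.92)

print(f"Accuracy ratio: {acc:.2%}")       # -0.63%
print(f"Performance ratio: {perf:.2f}x")  # 1.85x
```

A negative accuracy ratio means the INT8 model scored below the FP32 baseline; a performance ratio above 1 means the INT8 model processed more samples per second.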

1.2. PyTorch Models with Torch 1.13.0+cpu in PTQ Mode

Model | Example | Accuracy INT8 | Accuracy FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput INT8 (samples/sec) | Throughput FP32 (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ALBERT base MRPC | EAGER | 88.85% | 88.50% | 0.40% | 25.68 | 21.58 | 1.19x |
Barthez MRPC | EAGER | 83.92% | 83.81% | 0.14% | 143.37 | 70.96 | 2.02x |
BERT base COLA | FX | 58.80% | 58.84% | -0.07% | 223.51 | 101.39 | 2.20x |
BERT base MRPC | FX | 89.90% | 90.69% | -0.88% | 209.80 | 100.96 | 2.08x |
BERT base RTE | FX | 69.31% | 69.68% | -0.52% | 221.92 | 101.36 | 2.19x |
BERT base SST-2 | FX | 91.06% | 91.86% | -0.87% | 224.19 | 101.23 | 2.21x |
BERT base STSB | FX | 89.10% | 89.75% | -0.72% | 218.04 | 101.15 | 2.16x |
BERT large COLA | FX | 64.12% | 62.57% | 2.48% | 75.42 | 29.32 | 2.57x |
BERT large MRPC | FX | 89.50% | 90.38% | -0.97% | 75.10 | 29.41 | 2.55x |
BERT large QNLI | FX | 90.90% | 91.82% | -1.00% | 74.80 | 29.17 | 2.56x |
BERT large RTE | FX | 73.29% | 74.01% | -0.97% | 40.38 | 29.28 | 1.38x |
BERT large SQuAD | FX | 92.61 | 93.16 | -0.58% | 18.53 | 9.82 | 1.89x |
BlendCNN | EAGER | 68.40% | 68.40% | 0.00% | 4885.60 | 3715.36 | 1.31x |
CamemBERT base MRPC | EAGER | 86.70% | 86.82% | -0.14% | 206.00 | 98.50 | 2.09x |
Ctrl MRPC | EAGER | 81.87% | 82.00% | -0.15% | 19.39 | 7.19 | 2.70x |
Deberta MRPC | EAGER | 90.88% | 90.91% | -0.04% | 125.42 | 67.67 | 1.85x |
DistilBERT base MRPC | EAGER | 88.23% | 89.16% | -1.05% | 366.27 | 197.76 | 1.85x |
DistilBERT base MRPC | FX | 88.54% | 89.16% | -0.69% | 399.63 | 197.47 | 2.02x |
FlauBERT MRPC | EAGER | 79.87% | 80.19% | -0.40% | 592.53 | 385.01 | 1.54x |
GPT J WikiText | FX | 3.36 | 2.34 | 43.84% | 0.52 | 0.20 | 2.60x |
HuBERT | EAGER | 97.63% | 97.84% | -0.21% | 10.00 | 7.26 | 1.38x |
Inception V3 | EAGER | 69.43% | 69.52% | -0.13% | 446.65 | 181.41 | 2.46x |
Layoutlm MRPC | EAGER | 81.22% | 78.01% | 4.12% | 204.22 | 96.26 | 2.12x |
Longformer MRPC | EAGER | 91.01% | 91.46% | -0.49% | 18.68 | 14.25 | 1.31x |
Mask R-CNN | FX | 37.60% | 37.80% | -0.53% | 7.20 | 4.77 | 1.51x |
Mbart wnli | EAGER | 56.34% | 56.34% | 0.00% | 56.32 | 24.77 | 2.27x |
MobileNet V2 | EAGER | 70.54% | 71.84% | -1.81% | 625.38 | 451.25 | 1.39x |
lvwerra/pegasus-samsum | EAGER | 42.10 | 42.67 | -1.35% | 3.58 | 1.06 | 3.38x |
Peleenet | EAGER | 71.64% | 72.10% | -0.64% | 402.33 | 312.37 | 1.29x |
Pokemon Diffusers | FX | 275.80 | 334.48 | -17.54% | 0.03 | 0.02 | 1.48x |
Reformer Crime and Punishment | EAGER | 1.88 | 1.87 | 0.43% | 162.34 | 153.65 | 1.06x |
ResNet18 | EAGER | 69.57% | 69.76% | -0.27% | 657.72 | 327.69 | 2.01x |
ResNet18 | FX | 69.62% | 69.76% | -0.20% | 812.99 | 344.99 | 2.36x |
ResNet50 | EAGER | 75.98% | 76.15% | -0.21% | 360.16 | 161.44 | 2.23x |
Resnext101 32x8d | EAGER | 79.08% | 79.31% | -0.29% | 182.84 | 60.55 | 3.02x |
Roberta base MRPC | EAGER | 88.25% | 88.18% | 0.08% | 207.41 | 98.71 | 2.10x |
SqueezeBERT MRPC | EAGER | 86.87% | 87.65% | -0.89% | 195.00 | 150.09 | 1.30x |
SSD ResNet34 | FX | 19.47 | 19.63 | -0.83% | 18.56 | 6.75 | 2.75x |
Transfo-xl MRPC | EAGER | 81.97% | 81.20% | 0.94% | 9.73 | 6.92 | 1.41x |
Wav2Vec2 | FX | 95.71% | 96.60% | -0.92% | 23.78 | 19.45 | 1.22x |
Xlm Roberta MRPC | EAGER | 88.24% | 88.24% | 0.00% | 102.19 | 102.58 | 1.00x |
Xlm Roberta-base MRPC | EAGER | 88.03% | 88.62% | -0.67% | 115.16 | 98.75 | 1.17x |
YOLO V3 | EAGER | 24.60% | 24.54% | 0.21% | 76.15 | 31.80 | 2.39x |

1.3. PyTorch Models with Torch 1.13.0+cpu in QAT Mode

Model | Example | Accuracy INT8 | Accuracy FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput INT8 (samples/sec) | Throughput FP32 (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
BERT base MRPC | FX | 89.20% | 89.50% | -0.34% | 232.16 | 101.89 | 2.28x |
ResNet 18 | EAGER | 69.68% | 69.76% | -0.12% | 664.99 | 329.15 | 2.02x |
ResNet 18 | FX | 69.84% | 69.76% | 0.12% | 832.32 | 338.48 | 2.46x |
ResNet 50 | EAGER | 76.03% | 76.15% | -0.15% | 433.83 | 164.98 | 2.63x |

1.4. PyTorch Models with Torch and Intel® Extension for PyTorch* 1.13.0+cpu

Model | Example | Accuracy INT8 | Accuracy FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput INT8 (samples/sec) | Throughput FP32 (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet50 | IPEX | 76.01% | 76.15% | -0.17% | 836.38 | 207.89 | 4.02x |
ResNet18 | IPEX | 69.65% | 69.76% | -0.15% | 1396.52 | 463.95 | 3.01x |
SSD ResNet34 | IPEX | 19.93% | 20.00% | -0.36% | 30.08 | 7.66 | 3.93x |
BERT large | IPEX | 92.81 | 93.16 | -0.37% | 46.44 | 6.73 | 6.90x |
Distilbert base | IPEX | 85.97 | 86.84 | -0.99% | 159.90 | 68.95 | 2.32x |

Model | Example | Accuracy INT8 | Accuracy FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput INT8 (samples/sec) | Throughput FP32 (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
AlexNet (ONNX Model Zoo) | QLinear | 54.73% | 54.79% | -0.11% | 968.22 | 473.31 | 2.05x |
AlexNet (ONNX Model Zoo) | QDQ | 54.71% | 54.79% | -0.15% | 958.75 | 477.77 | 2.01x |
ArcFace (ONNX Model Zoo) | QLinear | 99.80% | 99.80% | 0.00% | 225.10 | 126.56 | 1.78x |
BERT base MRPC DYNAMIC | QLinear | 85.29% | 86.03% | -0.86% | 298.33 | 124.67 | 2.39x |
BERT base MRPC STATIC | QLinear | 85.54% | 86.03% | -0.57% | 624.43 | 254.64 | 2.45x |
BERT SQuAD model zoo DYNAMIC (ONNX Model Zoo) | QLinear | 80.44 | 80.67 | -0.29% | 97.81 | 52.75 | 1.85x |
Caffenet (ONNX Model Zoo) | QLinear | 56.21% | 56.30% | -0.16% | 1432.98 | 540.28 | 2.65x |
Caffenet (ONNX Model Zoo) | QDQ | 56.25% | 56.30% | -0.09% | 1460.21 | 540.81 | 2.70x |
Densenet (ONNX Model Zoo) | QLinear | 60.53% | 60.96% | -0.71% | 357.41 | 265.22 | 1.35x |
Distilbert base MRPC | QLinear | 85.54% | 84.56% | 1.16% | 1365.72 | 477.62 | 2.86x |
Distilbert base MRPC | QDQ | 84.56% | 84.56% | 0.00% | 524.96 | 476.39 | 1.10x |
DUC (ONNX Model Zoo) | QLinear | 81.62% | 81.92% | -0.37% | 5.66 | 2.82 | 2.01x |
EfficientNet (ONNX Model Zoo) | QLinear | 77.57% | 77.70% | -0.17% | 1211.10 | 758.41 | 1.60x |
EfficientNet (ONNX Model Zoo) | QDQ | 77.61% | 77.70% | -0.12% | 856.64 | 762.48 | 1.12x |
Emotion Ferplus (ONNX Model Zoo) | QLinear | 8.00% | 8.00% | 0.00% | 925.43 | 694.99 | 1.33x |
Faster R-CNN (ONNX Model Zoo) | QLinear | 34.09% | 34.37% | -0.81% | 13.82 | 5.89 | 2.35x |
Faster R-CNN (ONNX Model Zoo) | QDQ | 33.90% | 34.37% | -1.37% | 9.59 | 6.09 | 1.57x |
FCN (ONNX Model Zoo) | QLinear | 64.54% | 64.98% | -0.68% | 40.49 | 11.92 | 3.40x |
FCN (ONNX Model Zoo) | QDQ | 64.40% | 64.98% | -0.89% | 26.87 | 11.92 | 2.25x |
GoogleNet-12 (ONNX Model Zoo) | QLinear | 67.71% | 67.79% | -0.12% | 771.39 | 571.35 | 1.35x |
GoogleNet-12 (ONNX Model Zoo) | QDQ | 67.73% | 67.79% | -0.09% | 763.79 | 579.95 | 1.32x |
HF ALBERT-base-V2 DYNAMIC | QLinear | 91.40% | 92.32% | -1.00% | 156.96 | 105.89 | 1.48x |
HF BERT-base-multilingual-cased DYNAMIC | QLinear | 88.70 | 89.13 | -0.48% | 47.68 | 23.95 | 1.99x |
HF BERT-base-uncased DYNAMIC | QLinear | 89.58% | 90.42% | -0.93% | 199.37 | 104.85 | 1.90x |
HF CamemBERT-base DYNAMIC | QLinear | 88.47% | 89.28% | -0.91% | 182.60 | 105.45 | 1.73x |
HF Distilbert-base-uncased DYNAMIC | QLinear | 90.37% | 91.06% | -0.76% | 449.71 | 164.21 | 2.74x |
HF minilm-l12-h384-uncased DYNAMIC | QLinear | 91.07% | 90.97% | 0.11% | 466.59 | 247.71 | 1.88x |
HF minilm-l6-h384-uncased DYNAMIC | QLinear | 89.91% | 90.14% | -0.26% | 523.59 | 354.05 | 1.48x |
HF Roberta-base DYNAMIC | QLinear | 90.85% | 91.38% | -0.58% | 183.59 | 107.70 | 1.70x |
HF Spanbert DYNAMIC | QLinear | 91.40 | 91.98 | -0.63% | 48.36 | 24.03 | 2.01x |
HF Xlm Roberta-base DYNAMIC | QLinear | 89.45% | 90.10% | -0.72% | 208.16 | 64.60 | 3.22x |
Inception V1 (ONNX Model Zoo) | QLinear | 67.21% | 67.24% | -0.04% | 795.38 | 600.03 | 1.33x |
Inception v1 (ONNX Model Zoo) | QDQ | 67.21% | 67.24% | -0.04% | 780.70 | 591.81 | 1.32x |
Mask R-CNN (ONNX Model Zoo) | QLinear | 33.13% | 33.72% | -1.75% | 11.61 | 5.58 | 2.08x |
Mask R-CNN (ONNX Model Zoo) | QDQ | 33.28% | 33.72% | -1.30% | 8.64 | 5.53 | 1.56x |
MobileBERT MRPC | QLinear | 86.27% | 86.27% | 0.00% | 591.94 | 515.49 | 1.15x |
MobileBERT SQuAD MLPerf DYNAMIC | QLinear | 89.82 | 90.03 | -0.23% | 85.66 | 74.12 | 1.16x |
MobileNet V2 | QLinear | 65.59% | 66.89% | -1.94% | 2370.93 | 1526.33 | 1.55x |
MobileNet V2 | QDQ | 65.82% | 66.89% | -1.60% | 2216.02 | 1506.85 | 1.47x |
MobileNet V3 MLPerf | QLinear | 75.58% | 75.74% | -0.21% | 2078.85 | 1028.31 | 2.02x |
MobileNet V3 MLPerf | QDQ | 75.57% | 75.74% | -0.22% | 1762.62 | 999.31 | 1.76x |
MobileNetV2-12 (ONNX Model Zoo) | QLinear | 68.38% | 69.48% | -1.58% | 2615.52 | 1645.08 | 1.59x |
MobileNetV2-12 (ONNX Model Zoo) | QDQ | 68.51% | 69.48% | -1.40% | 2461.25 | 1674.36 | 1.47x |
ResNet v1.5 MLPerf | QLinear | 76.15% | 76.46% | -0.41% | 766.33 | 431.92 | 1.77x |
ResNet v1.5 MLPerf | QDQ | 76.14% | 76.46% | -0.42% | 575.34 | 430.83 | 1.34x |
ResNet50 v1.5 | QLinear | 72.26% | 72.29% | -0.04% | 747.31 | 431.09 | 1.73x |
ResNet50 v1.5 | QDQ | 72.20% | 72.29% | -0.12% | 564.21 | 431.50 | 1.31x |
ResNet50-v1-12 (ONNX Model Zoo) | QLinear | 74.81% | 74.99% | -0.24% | 594.29 | 449.21 | 1.32x |
ResNet50-v1-12 (ONNX Model Zoo) | QDQ | 74.76% | 74.99% | -0.31% | 590.51 | 449.93 | 1.31x |
Roberta base MRPC | QLinear | 90.69% | 89.95% | 0.82% | 643.03 | 253.04 | 2.54x |
ShuffleNet V2-12 (ONNX Model Zoo) | QLinear | 66.13% | 66.36% | -0.35% | 2354.51 | 1461.47 | 1.61x |
ShuffleNet V2-12 (ONNX Model Zoo) | QDQ | 66.12% | 66.36% | -0.36% | 1850.09 | 1368.35 | 1.35x |
SqueezeNet (ONNX Model Zoo) | QLinear | 56.54% | 56.87% | -0.58% | 2484.36 | 1912.37 | 1.30x |
SqueezeNet (ONNX Model Zoo) | QDQ | 56.39% | 56.87% | -0.83% | 2526.02 | 1911.32 | 1.32x |
SSD MobileNet V1 | QLinear | 22.44% | 23.10% | -2.86% | 710.17 | 549.55 | 1.29x |
SSD MobileNet V1 | QDQ | 22.44% | 23.10% | -2.86% | 622.58 | 497.42 | 1.25x |
SSD MobileNet V1 (ONNX Model Zoo) | QLinear | 22.96% | 23.02% | -0.26% | 652.14 | 507.77 | 1.28x |
SSD MobileNet V1 (ONNX Model Zoo) | QDQ | 22.96% | 23.02% | -0.26% | 573.30 | 470.42 | 1.22x |
SSD MobileNet V2 | QLinear | 24.03% | 24.67% | -2.59% | 527.67 | 396.27 | 1.33x |
SSD-12 (ONNX Model Zoo) | QLinear | 18.92% | 18.98% | -0.32% | 31.24 | 8.77 | 3.56x |
SSD-12 (ONNX Model Zoo) | QDQ | 18.63% | 18.98% | -1.84% | 23.72 | 8.87 | 2.68x |
Tiny YOLO V3 (ONNX Model Zoo) | QLinear | 11.82% | 12.42% | -4.83% | 647.17 | 514.42 | 1.26x |
Ultraface (ONNX Model Zoo) | QLinear | 83.34% | 83.65% | -0.37% | 314.50 | 125.56 | 2.50x |
VGG16 | QLinear | 66.67% | 66.69% | -0.03% | 221.62 | 98.20 | 2.26x |
VGG16 | QDQ | 66.69% | 66.69% | 0.00% | 304.09 | 98.33 | 3.09x |
VGG16 (ONNX Model Zoo) | QLinear | 72.32% | 72.40% | -0.11% | 316.54 | 98.49 | 3.21x |
VGG16 (ONNX Model Zoo) | QDQ | 72.31% | 72.40% | -0.12% | 315.61 | 98.46 | 3.21x |
YOLO V3 (ONNX Model Zoo) | QLinear | 26.92% | 28.73% | -6.30% | 119.63 | 53.37 | 2.24x |
YOLO V4 (ONNX Model Zoo) | QLinear | 32.33% | 33.71% | -4.09% | 49.30 | 32.88 | 1.50x |
ZFNet (ONNX Model Zoo) | QLinear | 55.84% | 55.96% | -0.21% | 462.28 | 268.32 | 1.72x |
ZFNet (ONNX Model Zoo) | QDQ | 55.86% | 55.96% | -0.18% | 465.44 | 265.58 | 1.75x |

Model | Accuracy INT8 | Accuracy FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput INT8 (samples/sec) | Throughput FP32 (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|
Inception V3 | 77.77% | 77.65% | 0.16% | 94.24 | 58.05 | 1.62x |
MobileNet 1.0 | 71.61% | 72.23% | -0.86% | 436.46 | 314.81 | 1.39x |
MobileNet V2 1.0 | 70.75% | 70.87% | -0.16% | 270.78 | 229.21 | 1.18x |
ResNet 152 V1 | 78.30% | 78.54% | -0.30% | 66.62 | 36.55 | 1.82x |
ResNet 18 V1 | 70.01% | 70.14% | -0.19% | 429.86 | 224.10 | 1.92x |
ResNet 50 V1 | 75.94% | 76.33% | -0.50% | 182.56 | 94.15 | 1.94x |
SqueezeNet 1.0 | 56.82% | 56.97% | -0.26% | 331.72 | 242.76 | 1.37x |
SSD MobileNet 1.0 | 74.94% | 75.54% | -0.79% | 53.66 | 27.16 | 1.98x |
SSD ResNet50 V1 | 80.19% | 80.23% | -0.05% | 37.63 | 16.80 | 2.24x |

Model | Task | Dataset | Dense Accuracy | Sparse Accuracy | Relative Drop | Sparsity Ratio | Sparsity Pattern | Comments | Balanced or Unbalanced Ratio |
---|---|---|---|---|---|---|---|---|---|
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=77.62 | +0.98% | 50% | structured 2:4 | snip momentum | balanced |
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=86.15 | -0.86% | 80% | structured 4x1 | snip momentum | unbalanced |
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=87.50 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=87.78 | -0.92% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=89.40 | +0.91% | 50% | structured 2:4 | snip momentum | balanced |
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=90.91 | -0.35% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=91.67 | +0.48% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.22 | -0.34% | 90% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.33 | -0.22% | 90% | structured 4x1 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.89 | -0.72% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.8 | -0.83% | 60% | structured per channel | snip momentum | unbalanced |
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=89.85 | -0.46% | 90% | structured 4x1 | snip momentum | unbalanced |
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=90.88 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 90% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=87.73 | +0.14% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 50% | structured per channel | snip momentum | unbalanced |
ResNet50 | image recognition | ImageNet | top1 acc = 78.95 | top1 acc = 80.10 | -1.43% | 75% | structured 2x1 | snip momentum | unbalanced |
YOLO-v5s6 | object detection | COCO | AP0.50:0.95/AP0.50=0.404/0.6 | AP0.50:0.95/AP0.50=0.393/0.584 | -2.72% | 80% | unstructured | snip momentum | unbalanced |
Bert-Large | question answering | SQuAD-v1.1 | f1=91.34 | f1=90.7 | -0.07% | 80% | structured 2x1 | group lasso | unbalanced |
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.51 | -0.88% | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 92.20 | -0.13% | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.97 | -0.38% | 20% | unstructured | gradient sensitivity | balanced |
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.39 | -1.26% | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.87 | -0.73% | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% | structured 1:2 | Prune once for all | balanced |
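
The sparsity patterns listed above constrain where zeros may fall. In an N:M pattern such as 2:4 or 1:2, a fixed number of weights in every group of M consecutive weights is zero; for both patterns in the table that is half of each group, which matches the 50% sparsity ratio of those rows. The sketch below illustrates the idea on a plain Python list. It ranks weights by magnitude, which is an assumption made for illustration and is not the snip momentum or Prune once for all criterion used for the results above; `nm_prune` and `sparsity` are hypothetical helpers, not library functions.

```python
def nm_prune(weights, n_zero, m):
    """Zero the n_zero smallest-magnitude values in each group of m weights."""
    if len(weights) % m != 0:
        raise ValueError("weight count must be a multiple of the group size m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n_zero entries with the smallest absolute value.
        drop = set(sorted(range(m), key=lambda i: abs(group[i]))[:n_zero])
        pruned.extend(0.0 if i in drop else w for i, w in enumerate(group))
    return pruned


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


row = [0.9, -0.1, 0.05, -0.7, 0.3, 0.2, -0.8, 0.01]
pruned = nm_prune(row, n_zero=2, m=4)

print(pruned)            # [0.9, 0.0, 0.0, -0.7, 0.3, 0.0, -0.8, 0.0]
print(sparsity(pruned))  # 0.5
```

Because every group carries the same number of zeros, the zeros are spread evenly over the tensor, unlike unstructured pruning, where they may cluster anywhere.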

Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement) |
---|---|---|---|---|---|
MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC) |
CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC) |
VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | WIP |
ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | WIP |
BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | WIP |
BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | WIP |
DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | WIP |
TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | WIP |
BERT-3 example | QQP | BERT-3 (0.8626/0.8213 EM/F1) | BERT-Base (0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1) | WIP |
DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | WIP |

Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Model (ONNX QDQ) | AWS c6i.2xlarge (Intel) CPU Execution Provider | AWS c6a.2xlarge (AMD) CPU Execution Provider | AWS c6g.2xlarge (ARM) CPU Execution Provider | NVIDIA A100 CUDA Execution Provider |
---|---|---|---|---|
ResNet50 | 74.76% | 68.95% | 74.76% | 74.75% |
BERT-base | 85.54% | 84.56% | 85.54% | 84.31% |
ResNet50 V1.5 | 72.20% | 67.70% | 72.20% | 72.29% |
MobileNet V2 | 65.82% | 58.56% | 65.83% | 65.63% |
SSD MobileNet V1 | 22.45% | 16.53% | 22.45% | 22.35% |
DistilBERT base MRPC | 84.56% | 83.82% | 84.56% | 84.56% |
SqueezeNet | 56.54% | 53.52% | 56.54% | 56.55% |
SSD | 18.63% | 18.54% | 18.63% | 18.61% |
AlexNet | 54.71% | 47.06% | 54.71% | 54.79% |
CaffeNet | 56.25% | 52.35% | 56.27% | 56.24% |
GoogleNet | 67.73% | 63.56% | 67.72% | 67.76% |
ZFNet | 55.86% | 45.09% | 55.86% | 55.89% |
Inception V1 | 67.21% | 63.03% | 67.20% | 67.21% |
SSD MobileNet V1 (ONNX Model Zoo) | 22.86% | 16.94% | 22.80% | 22.87% |
MobileBERT MRPC | 85.54% | 84.56% | 85.54% | 85.54% |
Roberta base MRPC | 89.46% | 90.44% | 89.71% | 89.71% |
ResNet50 V1.5 MLPerf | 76.14% | 72.80% | 76.14% | 76.17% |
VGG16 | 66.69% | 64.25% | 66.69% | 66.64% |
VGG16 (ONNX Model Zoo) | 72.31% | 69.35% | 72.32% | 72.34% |
MobileNet V3 MLPerf | 75.57% | 70.78% | 75.56% | 75.52% |
EfficientNet | 77.61% | 76.52% | 77.56% | 77.60% |
MobileNet V2 (ONNX Model Zoo) | 68.51% | 62.48% | 68.58% | 68.48% |
ShuffleNet V2 | 66.12% | 58.41% | 66.11% | 66.11% |
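
To see how much the INT8 accuracy of a QDQ model moves across execution providers, the rows above can be compared directly. The following is a minimal sketch that uses three rows copied from the table; `accuracy_spread` is a hypothetical helper introduced only for this illustration.

```python
# Accuracy (%) per platform, in the column order of the table:
# Intel CPU, AMD CPU, ARM CPU, NVIDIA A100 (CUDA).
results = {
    "ResNet50": (74.76, 68.95, 74.76, 74.75),
    "BERT-base": (85.54, 84.56, 85.54, 84.31),
    "ZFNet": (55.86, 45.09, 55.86, 55.89),
}


def accuracy_spread(accuracies):
    """Difference between the best and worst accuracy, in percentage points."""
    return max(accuracies) - min(accuracies)


for model, accs in results.items():
    print(f"{model}: spread {accuracy_spread(accs):.2f} points")
```

A large spread for a model indicates that the same QDQ model does not reach the same accuracy on every platform and may warrant per-platform validation.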