Validated Models

Intel® Neural Compressor validated examples with multiple compression techniques. Links to the typical examples can be found in the example tables, and the performance/accuracy results are available here.

  1. Validated Quantization Examples

    1.1. TensorFlow Models with Intel TensorFlow 2.11.0

    1.2. PyTorch Models with Torch 1.13.0+cpu in PTQ Mode

    1.3. PyTorch Models with Torch 1.13.0+cpu in QAT Mode

    1.4. PyTorch Models with Torch and Intel® Extension for PyTorch* 1.13.0+cpu

    1.5. ONNX Models with ONNX Runtime 1.13.1

    1.6. MXNet Models with MXNet 1.9.1

  2. Validated Pruning Examples

  3. Validated Knowledge Distillation Examples

  4. Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Validated Quantization Examples

Performance results were tested on 01/04/2023 with an Intel Xeon Platinum 8380 Scalable processor, using 1 socket, 4 cores per instance, 8 instances, and batch size 1.

Performance varies by use, configuration, and other factors. See platform configuration for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
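
The two ratio columns in every table below follow the formulas shown in the table headers. As a quick sanity check, the snippet below recomputes them for the first TensorFlow row (BERT base MRPC, CKPT), using only values that appear in that row.

```python
# Recompute the ratio columns for the BERT base MRPC (CKPT) row of the
# TensorFlow table: accuracy ratio = (INT8 - FP32) / FP32,
# performance ratio = INT8 throughput / FP32 throughput.
int8_acc, fp32_acc = 86.52, 86.52       # accuracy, percent
int8_tput, fp32_tput = 170.44, 93.69    # throughput, samples/sec

accuracy_ratio = (int8_acc - fp32_acc) / fp32_acc
performance_ratio = int8_tput / fp32_tput

print(f"{accuracy_ratio:+.2%}")     # +0.00%
print(f"{performance_ratio:.2f}x")  # 1.82x
```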

TensorFlow Models with Intel TensorFlow 2.11.0

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BERT base MRPC | CKPT | 86.52% | 86.52% | 0.00% | 170.44 | 93.69 | 1.82x |
| BERT large SQuAD | pb | 92.40 | 92.99 | -0.63% | 18.39 | 9.92 | 1.85x |
| BERT large SQuAD (ONNX Model Zoo) | pb | 92.41 | 92.98 | -0.61% | 20.41 | 11.16 | 1.83x |
| Densenet 121 | pb | 73.61% | 72.89% | 0.99% | 274.61 | 148.72 | 1.85x |
| Densenet 161 | pb | 76.30% | 76.29% | 0.01% | 132.35 | 95.24 | 1.39x |
| Densenet 169 | pb | 74.38% | 74.65% | -0.36% | 191.31 | 118.99 | 1.61x |
| Faster R-CNN Inception ResNet V2 | pb | 37.44% | 38.31% | -2.27% | 3.31 | 1.81 | 1.83x |
| Faster R-CNN Inception ResNet V2 | SavedModel | 37.55% | 38.31% | -1.98% | 3.32 | 1.81 | 1.84x |
| Faster R-CNN ResNet101 | pb | 30.33% | 30.39% | -0.20% | 42.57 | 13.25 | 3.21x |
| Faster R-CNN ResNet101 | SavedModel | 30.33% | 30.39% | -0.20% | 43.41 | 11.73 | 3.70x |
| Faster R-CNN ResNet50 | pb | 26.64% | 26.59% | 0.19% | 51.70 | 16.45 | 3.14x |
| Inception ResNet V2 | pb | 80.34% | 80.40% | -0.07% | 139.29 | 76.65 | 1.82x |
| Inception ResNet V2 | keras | 80.35% | 80.40% | -0.05% | 99.42 | 54.50 | 1.82x |
| Inception V1 | pb | 70.44% | 69.74% | 1.00% | 955.20 | 328.15 | 2.91x |
| Inception V2 | pb | 74.34% | 73.97% | 0.50% | 709.92 | 282.40 | 2.51x |
| Inception V3 | pb | 76.71% | 76.75% | -0.05% | 337.09 | 160.07 | 2.11x |
| Inception V3 | keras | 77.73% | 77.83% | -0.13% | 438.52 | 204.76 | 2.14x |
| Inception V4 | pb | 80.18% | 80.27% | -0.11% | 223.02 | 105.44 | 2.12x |
| Mask R-CNN Inception V2 | pb | 28.50% | 28.73% | -0.80% | 69.42 | 33.00 | 2.10x |
| Mask R-CNN Inception V2 | CKPT | 28.50% | 28.73% | -0.80% | 69.47 | 32.88 | 2.11x |
| MobileNet V1 | pb | 71.85% | 70.96% | 1.25% | 1347.65 | 439.05 | 3.07x |
| MobileNet V2 | pb | 72.56% | 71.76% | 1.11% | 1192.01 | 492.92 | 2.42x |
| MobileNet V2 | keras | 71.10% | 71.76% | -0.91% | 412.75 | 376.34 | 1.10x |
| MobileNet V3 | pb | 74.00% | 75.31% | -1.74% | 662.07 | 397.69 | 1.66x |
| ResNet101 | pb | 77.50% | 76.45% | 1.37% | 299.23 | 154.67 | 1.93x |
| ResNet101 | keras | 61.38% | 61.47% | -0.16% | 476.39 | 227.24 | 2.10x |
| ResNet50 fashion | keras | 78.04% | 78.12% | -0.10% | 2734.43 | 1299.73 | 2.10x |
| ResNet50 v1.0 | pb | 74.12% | 74.27% | -0.20% | 498.76 | 178.72 | 2.79x |
| ResNet50 v1.5 | pb | 76.23% | 76.46% | -0.30% | 427.46 | 173.25 | 2.47x |
| ResNetV2 101 | pb | 72.65% | 71.87% | 1.09% | 194.11 | 146.42 | 1.33x |
| ResNetV2 101 | keras | 71.48% | 71.57% | -0.12% | 237.09 | 187.24 | 1.27x |
| ResNetV2 152 | pb | 73.07% | 72.37% | 0.97% | 155.04 | 112.01 | 1.38x |
| ResNetV2 50 | pb | 70.44% | 69.64% | 1.15% | 302.55 | 215.50 | 1.40x |
| ResNet v2 50 | keras | 69.20% | 69.03% | 0.25% | 346.99 | 312.15 | 1.11x |
| SSD MobileNet V1 | pb | 23.12% | 23.13% | -0.04% | 277.10 | 173.61 | 1.60x |
| SSD MobileNet v1 | CKPT | 23.10% | 23.13% | -0.13% | 273.51 | 118.46 | 2.31x |
| SSD ResNet34 | pb | 21.70% | 22.09% | -1.77% | 33.95 | 8.81 | 3.85x |
| SSD ResNet50 V1 | pb | 37.75% | 38.00% | -0.66% | 34.11 | 15.67 | 2.18x |
| SSD ResNet50 v1 | CKPT | 37.82% | 38.00% | -0.47% | 34.57 | 13.68 | 2.53x |
| Transformer lt MLPerf | pb | 27.12 | 27.17 | -0.18% | 3.26 | 2.63 | 1.24x |
| VGG16 | pb | 72.64% | 70.89% | 2.47% | 219.11 | 91.30 | 2.40x |
| VGG19 | pb | 72.69% | 71.01% | 2.37% | 193.61 | 78.47 | 2.47x |
| Wide Deep large DS | pb | 77.75% | 77.67% | 0.10% | 11506.91 | 9665.07 | 1.19x |
| Xception | keras | 78.43% | 78.94% | -0.65% | 262.83 | 137.35 | 1.91x |
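
As a rough illustration of how rows like these are produced, the sketch below quantizes a frozen TensorFlow pb graph with the Intel® Neural Compressor 2.x Python API. It is a minimal sketch, not the per-model recipe: the pb path is a placeholder, and the dummy calibration dataset stands in for the real validation data used in the examples.

```python
# Minimal post-training quantization sketch for a TensorFlow pb model,
# assuming the Intel Neural Compressor 2.x API. Paths and data are placeholders.
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets

# Dummy calibration data shaped like ImageNet inputs; real runs use the
# dataset listed for each example.
dataset = Datasets("tensorflow")["dummy"](shape=(1, 224, 224, 3))
calib_dataloader = DataLoader(framework="tensorflow", dataset=dataset)

config = PostTrainingQuantConfig()  # default: static post-training quantization
q_model = quantization.fit(
    model="./mobilenet_v1.pb",      # placeholder path to a frozen FP32 graph
    conf=config,
    calib_dataloader=calib_dataloader,
)
q_model.save("./int8_model")
```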

PyTorch Models with Torch 1.13.0+cpu in PTQ Mode

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ALBERT base MRPC | EAGER | 88.85% | 88.50% | 0.40% | 25.68 | 21.58 | 1.19x |
| Barthez MRPC | EAGER | 83.92% | 83.81% | 0.14% | 143.37 | 70.96 | 2.02x |
| BERT base COLA | FX | 58.80% | 58.84% | -0.07% | 223.51 | 101.39 | 2.20x |
| BERT base MRPC | FX | 89.90% | 90.69% | -0.88% | 209.80 | 100.96 | 2.08x |
| BERT base RTE | FX | 69.31% | 69.68% | -0.52% | 221.92 | 101.36 | 2.19x |
| BERT base SST-2 | FX | 91.06% | 91.86% | -0.87% | 224.19 | 101.23 | 2.21x |
| BERT base STSB | FX | 89.10% | 89.75% | -0.72% | 218.04 | 101.15 | 2.16x |
| BERT large COLA | FX | 64.12% | 62.57% | 2.48% | 75.42 | 29.32 | 2.57x |
| BERT large MRPC | FX | 89.50% | 90.38% | -0.97% | 75.10 | 29.41 | 2.55x |
| BERT large QNLI | FX | 90.90% | 91.82% | -1.00% | 74.80 | 29.17 | 2.56x |
| BERT large RTE | FX | 73.29% | 74.01% | -0.97% | 40.38 | 29.28 | 1.38x |
| BERT large SQuAD | FX | 92.61 | 93.16 | -0.58% | 18.53 | 9.82 | 1.89x |
| BlendCNN | EAGER | 68.40% | 68.40% | 0.00% | 4885.60 | 3715.36 | 1.31x |
| CamemBERT base MRPC | EAGER | 86.70% | 86.82% | -0.14% | 206.00 | 98.50 | 2.09x |
| Ctrl MRPC | EAGER | 81.87% | 82.00% | -0.15% | 19.39 | 7.19 | 2.70x |
| Deberta MRPC | EAGER | 90.88% | 90.91% | -0.04% | 125.42 | 67.67 | 1.85x |
| DistilBERT base MRPC | EAGER | 88.23% | 89.16% | -1.05% | 366.27 | 197.76 | 1.85x |
| DistilBERT base MRPC | FX | 88.54% | 89.16% | -0.69% | 399.63 | 197.47 | 2.02x |
| FlauBERT MRPC | EAGER | 79.87% | 80.19% | -0.40% | 592.53 | 385.01 | 1.54x |
| GPT J WikiText | FX | 3.36 | 2.34 | 43.84% | 0.52 | 0.20 | 2.60x |
| HuBERT | EAGER | 97.63% | 97.84% | -0.21% | 10.00 | 7.26 | 1.38x |
| Inception V3 | EAGER | 69.43% | 69.52% | -0.13% | 446.65 | 181.41 | 2.46x |
| Layoutlm MRPC | EAGER | 81.22% | 78.01% | 4.12% | 204.22 | 96.26 | 2.12x |
| Longformer MRPC | EAGER | 91.01% | 91.46% | -0.49% | 18.68 | 14.25 | 1.31x |
| Mask R-CNN | FX | 37.60% | 37.80% | -0.53% | 7.20 | 4.77 | 1.51x |
| Mbart wnli | EAGER | 56.34% | 56.34% | 0.00% | 56.32 | 24.77 | 2.27x |
| MobileNet V2 | EAGER | 70.54% | 71.84% | -1.81% | 625.38 | 451.25 | 1.39x |
| lvwerra/pegasus-samsum | EAGER | 42.10 | 42.67 | -1.35% | 3.58 | 1.06 | 3.38x |
| Peleenet | EAGER | 71.64% | 72.10% | -0.64% | 402.33 | 312.37 | 1.29x |
| Pokemon Diffusers | FX | 275.80 | 334.48 | -17.54% | 0.03 | 0.02 | 1.48x |
| Reformer Crime and Punishment | EAGER | 1.88 | 1.87 | 0.43% | 162.34 | 153.65 | 1.06x |
| ResNet18 | EAGER | 69.57% | 69.76% | -0.27% | 657.72 | 327.69 | 2.01x |
| ResNet18 | FX | 69.62% | 69.76% | -0.20% | 812.99 | 344.99 | 2.36x |
| ResNet50 | EAGER | 75.98% | 76.15% | -0.21% | 360.16 | 161.44 | 2.23x |
| Resnext101 32x8d | EAGER | 79.08% | 79.31% | -0.29% | 182.84 | 60.55 | 3.02x |
| Roberta base MRPC | EAGER | 88.25% | 88.18% | 0.08% | 207.41 | 98.71 | 2.10x |
| SqueezeBERT MRPC | EAGER | 86.87% | 87.65% | -0.89% | 195.00 | 150.09 | 1.30x |
| SSD ResNet34 | FX | 19.47 | 19.63 | -0.83% | 18.56 | 6.75 | 2.75x |
| Transfo-xl MRPC | EAGER | 81.97% | 81.20% | 0.94% | 9.73 | 6.92 | 1.41x |
| Wave2Vec2 | FX | 95.71% | 96.60% | -0.92% | 23.78 | 19.45 | 1.22x |
| Xlm Roberta MRPC | EAGER | 88.24% | 88.24% | 0.00% | 102.19 | 102.58 | 1.00x |
| Xlm Roberta-base MRPC | EAGER | 88.03% | 88.62% | -0.67% | 115.16 | 98.75 | 1.17x |
| YOLO V3 | EAGER | 24.60% | 24.54% | 0.21% | 76.15 | 31.80 | 2.39x |
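
For the PyTorch PTQ rows, a minimal sketch of the same flow is shown below, assuming the Intel® Neural Compressor 2.x API and a torchvision ResNet18; the random calibration tensors are placeholders for the real validation data used by each example.

```python
# Minimal PyTorch post-training static quantization sketch (FX path),
# assuming the Intel Neural Compressor 2.x API. Calibration data is dummy.
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()

# Dummy (input, label) pairs standing in for ImageNet calibration samples.
calib_loader = DataLoader(
    TensorDataset(torch.randn(32, 3, 224, 224), torch.zeros(32, dtype=torch.long)),
    batch_size=1,
)

config = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model=model, conf=config, calib_dataloader=calib_loader)
q_model.save("./saved_results")
```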

PyTorch Models with Torch 1.13.0+cpu in QAT Mode

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BERT base MRPC | FX | 89.20% | 89.50% | -0.34% | 232.16 | 101.89 | 2.28x |
| ResNet 18 | EAGER | 69.68% | 69.76% | -0.12% | 664.99 | 329.15 | 2.02x |
| ResNet 18 | FX | 69.84% | 69.76% | 0.12% | 832.32 | 338.48 | 2.46x |
| ResNet 50 | EAGER | 76.03% | 76.15% | -0.15% | 433.83 | 164.98 | 2.63x |
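
The QAT rows fine-tune the model with fake quantization inserted during training. A minimal sketch of that flow with the Intel® Neural Compressor 2.x training API follows; the one-step loop on random data is a placeholder for the real fine-tuning recipe.

```python
# Minimal quantization-aware training sketch, assuming the Intel Neural
# Compressor 2.x training API. The fine-tuning loop is a toy placeholder.
import torch
import torchvision
from neural_compressor.config import QuantizationAwareTrainingConfig
from neural_compressor.training import prepare_compression

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

conf = QuantizationAwareTrainingConfig()
compression_manager = prepare_compression(model, conf)
compression_manager.callbacks.on_train_begin()

model = compression_manager.model                 # model prepared with fake-quant ops
for _ in range(1):                                # placeholder fine-tuning loop
    images = torch.randn(4, 3, 224, 224)
    labels = torch.zeros(4, dtype=torch.long)
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

compression_manager.callbacks.on_train_end()      # converts to the INT8 model
q_model = compression_manager.model
```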

PyTorch Models with Torch and Intel® Extension for PyTorch* 1.13.0+cpu

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet50 | IPEX | 76.01% | 76.15% | -0.17% | 836.38 | 207.89 | 4.02x |
| ResNet18 | IPEX | 69.65% | 69.76% | -0.15% | 1396.52 | 463.95 | 3.01x |
| SSD ResNet34 | IPEX | 19.93% | 20.00% | -0.36% | 30.08 | 7.66 | 3.93x |
| BERT large | IPEX | 92.81 | 93.16 | -0.37% | 46.44 | 6.73 | 6.90x |
| Distilbert base | IPEX | 85.97 | 86.84 | -0.99% | 159.90 | 68.95 | 2.32x |
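
The IPEX rows use the same post-training flow with the backend switched to Intel® Extension for PyTorch*. A minimal, hedged sketch is below, assuming the 2.x `backend="ipex"` option and an installed intel_extension_for_pytorch; model and data are placeholders.

```python
# Minimal IPEX-backend post-training quantization sketch, assuming the
# Intel Neural Compressor 2.x API with intel_extension_for_pytorch installed.
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
calib_loader = DataLoader(
    TensorDataset(torch.randn(16, 3, 224, 224), torch.zeros(16, dtype=torch.long)),
    batch_size=1,
)

config = PostTrainingQuantConfig(backend="ipex")  # switch from the default PyTorch backend
q_model = quantization.fit(model=model, conf=config, calib_dataloader=calib_loader)
q_model.save("./ipex_int8")
```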

ONNX Models with ONNX Runtime 1.13.1

| Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AlexNet (ONNX Model Zoo) | QLinear | 54.73% | 54.79% | -0.11% | 968.22 | 473.31 | 2.05x |
| AlexNet (ONNX Model Zoo) | QDQ | 54.71% | 54.79% | -0.15% | 958.75 | 477.77 | 2.01x |
| ArcFace (ONNX Model Zoo) | QLinear | 99.80% | 99.80% | 0.00% | 225.10 | 126.56 | 1.78x |
| BERT base MRPC DYNAMIC | QLinear | 85.29% | 86.03% | -0.86% | 298.33 | 124.67 | 2.39x |
| BERT base MRPC STATIC | QLinear | 85.54% | 86.03% | -0.57% | 624.43 | 254.64 | 2.45x |
| BERT SQuAD model zoo DYNAMIC (ONNX Model Zoo) | QLinear | 80.44 | 80.67 | -0.29% | 97.81 | 52.75 | 1.85x |
| Caffenet (ONNX Model Zoo) | QLinear | 56.21% | 56.30% | -0.16% | 1432.98 | 540.28 | 2.65x |
| Caffenet (ONNX Model Zoo) | QDQ | 56.25% | 56.30% | -0.09% | 1460.21 | 540.81 | 2.70x |
| Densenet (ONNX Model Zoo) | QLinear | 60.53% | 60.96% | -0.71% | 357.41 | 265.22 | 1.35x |
| Distilbert base MRPC | QLinear | 85.54% | 84.56% | 1.16% | 1365.72 | 477.62 | 2.86x |
| Distilbert base MRPC | QDQ | 84.56% | 84.56% | 0.00% | 524.96 | 476.39 | 1.10x |
| DUC (ONNX Model Zoo) | QLinear | 81.62% | 81.92% | -0.37% | 5.66 | 2.82 | 2.01x |
| EfficientNet (ONNX Model Zoo) | QLinear | 77.57% | 77.70% | -0.17% | 1211.10 | 758.41 | 1.60x |
| EfficientNet (ONNX Model Zoo) | QDQ | 77.61% | 77.70% | -0.12% | 856.64 | 762.48 | 1.12x |
| Emotion Ferplus (ONNX Model Zoo) | QLinear | 8.00% | 8.00% | 0.00% | 925.43 | 694.99 | 1.33x |
| Faster R-CNN (ONNX Model Zoo) | QLinear | 34.09% | 34.37% | -0.81% | 13.82 | 5.89 | 2.35x |
| Faster R-CNN (ONNX Model Zoo) | QDQ | 33.90% | 34.37% | -1.37% | 9.59 | 6.09 | 1.57x |
| FCN (ONNX Model Zoo) | QLinear | 64.54% | 64.98% | -0.68% | 40.49 | 11.92 | 3.40x |
| FCN (ONNX Model Zoo) | QDQ | 64.40% | 64.98% | -0.89% | 26.87 | 11.92 | 2.25x |
| GoogleNet-12 (ONNX Model Zoo) | QLinear | 67.71% | 67.79% | -0.12% | 771.39 | 571.35 | 1.35x |
| GoogleNet-12 (ONNX Model Zoo) | QDQ | 67.73% | 67.79% | -0.09% | 763.79 | 579.95 | 1.32x |
| HF ALBERT-base-V2 DYNAMIC | QLinear | 91.40% | 92.32% | -1.00% | 156.96 | 105.89 | 1.48x |
| HF BERT-base-multilingual-cased DYNAMIC | QLinear | 88.70 | 89.13 | -0.48% | 47.68 | 23.95 | 1.99x |
| HF BERT-base-uncased DYNAMIC | QLinear | 89.58% | 90.42% | -0.93% | 199.37 | 104.85 | 1.90x |
| HF CamemBERT-base DYNAMIC | QLinear | 88.47% | 89.28% | -0.91% | 182.60 | 105.45 | 1.73x |
| HF Distilbert-base-uncased DYNAMIC | QLinear | 90.37% | 91.06% | -0.76% | 449.71 | 164.21 | 2.74x |
| HF minilm-l12-h384-uncased DYNAMIC | QLinear | 91.07% | 90.97% | 0.11% | 466.59 | 247.71 | 1.88x |
| HF minilm-l6-h384-uncased DYNAMIC | QLinear | 89.91% | 90.14% | -0.26% | 523.59 | 354.05 | 1.48x |
| HF Roberta-base DYNAMIC | QLinear | 90.85% | 91.38% | -0.58% | 183.59 | 107.70 | 1.70x |
| HF Spanbert DYNAMIC | QLinear | 91.40 | 91.98 | -0.63% | 48.36 | 24.03 | 2.01x |
| HF Xlm Roberta-base DYNAMIC | QLinear | 89.45% | 90.10% | -0.72% | 208.16 | 64.60 | 3.22x |
| Inception V1 (ONNX Model Zoo) | QLinear | 67.21% | 67.24% | -0.04% | 795.38 | 600.03 | 1.33x |
| Inception v1 (ONNX Model Zoo) | QDQ | 67.21% | 67.24% | -0.04% | 780.70 | 591.81 | 1.32x |
| Mask R-CNN (ONNX Model Zoo) | QLinear | 33.13% | 33.72% | -1.75% | 11.61 | 5.58 | 2.08x |
| Mask R-CNN (ONNX Model Zoo) | QDQ | 33.28% | 33.72% | -1.30% | 8.64 | 5.53 | 1.56x |
| MobileBERT MRPC | QLinear | 86.27% | 86.27% | 0.00% | 591.94 | 515.49 | 1.15x |
| MobileBERT SQuAD MLPerf DYNAMIC | QLinear | 89.82 | 90.03 | -0.23% | 85.66 | 74.12 | 1.16x |
| MobileNet V2 | QLinear | 65.59% | 66.89% | -1.94% | 2370.93 | 1526.33 | 1.55x |
| MobileNet V2 | QDQ | 65.82% | 66.89% | -1.60% | 2216.02 | 1506.85 | 1.47x |
| MobileNet V3 MLPerf | QLinear | 75.58% | 75.74% | -0.21% | 2078.85 | 1028.31 | 2.02x |
| MobileNet V3 MLPerf | QDQ | 75.57% | 75.74% | -0.22% | 1762.62 | 999.31 | 1.76x |
| MobileNetV2-12 (ONNX Model Zoo) | QLinear | 68.38% | 69.48% | -1.58% | 2615.52 | 1645.08 | 1.59x |
| MobileNetV2-12 (ONNX Model Zoo) | QDQ | 68.51% | 69.48% | -1.40% | 2461.25 | 1674.36 | 1.47x |
| ResNet v1.5 MLPerf | QLinear | 76.15% | 76.46% | -0.41% | 766.33 | 431.92 | 1.77x |
| ResNet v1.5 MLPerf | QDQ | 76.14% | 76.46% | -0.42% | 575.34 | 430.83 | 1.34x |
| ResNet50 v1.5 | QLinear | 72.26% | 72.29% | -0.04% | 747.31 | 431.09 | 1.73x |
| ResNet50 v1.5 | QDQ | 72.20% | 72.29% | -0.12% | 564.21 | 431.50 | 1.31x |
| ResNet50-v1-12 (ONNX Model Zoo) | QLinear | 74.81% | 74.99% | -0.24% | 594.29 | 449.21 | 1.32x |
| ResNet50-v1-12 (ONNX Model Zoo) | QDQ | 74.76% | 74.99% | -0.31% | 590.51 | 449.93 | 1.31x |
| Roberta base MRPC | QLinear | 90.69% | 89.95% | 0.82% | 643.03 | 253.04 | 2.54x |
| ShuffleNet V2-12 (ONNX Model Zoo) | QLinear | 66.13% | 66.36% | -0.35% | 2354.51 | 1461.47 | 1.61x |
| ShuffleNet V2-12 (ONNX Model Zoo) | QDQ | 66.12% | 66.36% | -0.36% | 1850.09 | 1368.35 | 1.35x |
| SqueezeNet (ONNX Model Zoo) | QLinear | 56.54% | 56.87% | -0.58% | 2484.36 | 1912.37 | 1.30x |
| SqueezeNet (ONNX Model Zoo) | QDQ | 56.39% | 56.87% | -0.83% | 2526.02 | 1911.32 | 1.32x |
| SSD MobileNet V1 | QLinear | 22.44% | 23.10% | -2.86% | 710.17 | 549.55 | 1.29x |
| SSD MobileNet V1 | QDQ | 22.44% | 23.10% | -2.86% | 622.58 | 497.42 | 1.25x |
| SSD MobileNet V1 (ONNX Model Zoo) | QLinear | 22.96% | 23.02% | -0.26% | 652.14 | 507.77 | 1.28x |
| SSD MobileNet V1 (ONNX Model Zoo) | QDQ | 22.96% | 23.02% | -0.26% | 573.30 | 470.42 | 1.22x |
| SSD MobileNet V2 | QLinear | 24.03% | 24.67% | -2.59% | 527.67 | 396.27 | 1.33x |
| SSD-12 (ONNX Model Zoo) | QLinear | 18.92% | 18.98% | -0.32% | 31.24 | 8.77 | 3.56x |
| SSD-12 (ONNX Model Zoo) | QDQ | 18.63% | 18.98% | -1.84% | 23.72 | 8.87 | 2.68x |
| Tiny YOLO V3 (ONNX Model Zoo) | QLinear | 11.82% | 12.42% | -4.83% | 647.17 | 514.42 | 1.26x |
| Ultraface (ONNX Model Zoo) | QLinear | 83.34% | 83.65% | -0.37% | 314.50 | 125.56 | 2.50x |
| VGG16 | QLinear | 66.67% | 66.69% | -0.03% | 221.62 | 98.20 | 2.26x |
| VGG16 | QDQ | 66.69% | 66.69% | 0.00% | 304.09 | 98.33 | 3.09x |
| VGG16 (ONNX Model Zoo) | QLinear | 72.32% | 72.40% | -0.11% | 316.54 | 98.49 | 3.21x |
| VGG16 (ONNX Model Zoo) | QDQ | 72.31% | 72.40% | -0.12% | 315.61 | 98.46 | 3.21x |
| YOLO V3 (ONNX Model Zoo) | QLinear | 26.92% | 28.73% | -6.30% | 119.63 | 53.37 | 2.24x |
| YOLO V4 (ONNX Model Zoo) | QLinear | 32.33% | 33.71% | -4.09% | 49.30 | 32.88 | 1.50x |
| ZFNet (ONNX Model Zoo) | QLinear | 55.84% | 55.96% | -0.21% | 462.28 | 268.32 | 1.72x |
| ZFNet (ONNX Model Zoo) | QDQ | 55.86% | 55.96% | -0.18% | 465.44 | 265.58 | 1.75x |
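
The QLinear and QDQ rows above are two export formats of the same quantization result. Below is a minimal, hedged sketch of producing both from one FP32 ONNX model, assuming the 2.x `quant_format` option; the model path is a placeholder and the dummy dataset stands in for real calibration data.

```python
# Minimal ONNX Runtime quantization sketch producing both operator formats in
# this table, assuming the Intel Neural Compressor 2.x API.
import onnx
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets

fp32_model = onnx.load("resnet50-v1-12.onnx")   # placeholder ONNX Model Zoo file

dataset = Datasets("onnxrt_qlinearops")["dummy"](shape=(1, 3, 224, 224))
calib_dataloader = DataLoader(framework="onnxrt_qlinearops", dataset=dataset)

for fmt in ("QOperator", "QDQ"):                # "QOperator" corresponds to the QLinear rows
    config = PostTrainingQuantConfig(approach="static", quant_format=fmt)
    q_model = quantization.fit(model=fp32_model, conf=config,
                               calib_dataloader=calib_dataloader)
    q_model.save(f"resnet50_int8_{fmt.lower()}.onnx")
```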

MXNet Models with MXNet 1.9.1

| Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
| --- | --- | --- | --- | --- | --- | --- |
| Inception V3 | 77.77% | 77.65% | 0.16% | 94.24 | 58.05 | 1.62x |
| MobileNet 1.0 | 71.61% | 72.23% | -0.86% | 436.46 | 314.81 | 1.39x |
| MobileNet V2 1.0 | 70.75% | 70.87% | -0.16% | 270.78 | 229.21 | 1.18x |
| ResNet 152 V1 | 78.30% | 78.54% | -0.30% | 66.62 | 36.55 | 1.82x |
| ResNet 18 V1 | 70.01% | 70.14% | -0.19% | 429.86 | 224.10 | 1.92x |
| ResNet 50 V1 | 75.94% | 76.33% | -0.50% | 182.56 | 94.15 | 1.94x |
| SqueezeNet 1.0 | 56.82% | 56.97% | -0.26% | 331.72 | 242.76 | 1.37x |
| SSD MobileNet 1.0 | 74.94% | 75.54% | -0.79% | 53.66 | 27.16 | 1.98x |
| SSD ResNet50 V1 | 80.19% | 80.23% | -0.05% | 37.63 | 16.80 | 2.24x |

Validated Pruning Examples

| Model | Task | Dataset | Dense Accuracy | Sparse Accuracy | Relative Drop | Sparsity Ratio | Sparsity Pattern | Comments | Balanced or Unbalanced Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=77.62 | +0.98% | 50% | structured 2:4 | snip momentum | balanced |
| Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=86.15 | -0.86% | 80% | structured 4x1 | snip momentum | unbalanced |
| Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=87.50 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=87.78 | -0.92% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=89.40 | +0.91% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=90.91 | -0.35% | 80% | structured 4x1 | snip momentum | unbalanced |
| Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=91.67 | +0.48% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.22 | -0.34% | 90% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.33 | -0.22% | 90% | structured 4x1 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.89 | -0.72% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.8 | -0.83% | 60% | structured per channel | snip momentum | unbalanced |
| Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=89.85 | -0.46% | 90% | structured 4x1 | snip momentum | unbalanced |
| Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=90.88 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 90% | structured 4x1 | snip momentum | unbalanced |
| Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=87.73 | +0.14% | 50% | structured 2:4 | snip momentum | balanced |
| Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 50% | structured per channel | snip momentum | unbalanced |
| ResNet50 | image recognition | ImageNet | top1 acc = 78.95 | top1 acc = 80.10 | -1.43% | 75% | structured 2x1 | snip momentum | unbalanced |
| YOLO-v5s6 | object detection | COCO | AP0.50:0.95/AP0.50=0.404/0.6 | AP0.50:0.95/AP0.50=0.393/0.584 | -2.72% | 80% | unstructured | snip momentum | unbalanced |
| Bert-Large | question answering | SQuAD-v1.1 | f1=91.34 | f1=90.7 | -0.07% | 80% | structured 2x1 | group lasso | unbalanced |
| Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.51 | -0.88% | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 92.20 | -0.13% | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.97 | -0.38% | 20% | unstructured | gradient sensitivity | balanced |
| Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.39 | -1.26% | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.87 | -0.73% | 50% | structured 1:2 | Prune once for all | balanced |
| Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% | unstructured | Prune once for all | balanced |
| Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% | structured 1:2 | Prune once for all | balanced |
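
Most rows above use the snip-momentum criterion with a structured pattern such as 4x1 or 2:4. Below is a minimal sketch of how such a run can be configured with the Intel® Neural Compressor 2.x training API; the tiny model, random data, and step counts are placeholders for the real fine-tuning recipes behind these results.

```python
# Minimal structured-pruning sketch (snip_momentum criterion, 4x1 pattern,
# 80% target sparsity), assuming the Intel Neural Compressor 2.x training API.
import torch
from neural_compressor.config import WeightPruningConfig
from neural_compressor.training import prepare_compression

model = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

config = WeightPruningConfig(
    target_sparsity=0.8,            # "Sparsity Ratio" column
    pattern="4x1",                  # "Sparsity Pattern" column
    pruning_type="snip_momentum",   # criterion listed in the "Comments" column
    start_step=0,
    end_step=100,
)
compression_manager = prepare_compression(model, config)
compression_manager.callbacks.on_train_begin()

for step in range(100):             # placeholder fine-tuning loop on random data
    inputs, labels = torch.randn(8, 128), torch.randint(0, 2, (8,))
    compression_manager.callbacks.on_step_begin(step)
    loss = criterion(model(inputs), labels)
    optimizer.zero_grad()
    loss.backward()
    compression_manager.callbacks.on_before_optimizer_step()
    optimizer.step()
    compression_manager.callbacks.on_after_optimizer_step()
    compression_manager.callbacks.on_step_end()

compression_manager.callbacks.on_train_end()
```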

Validated Knowledge Distillation Examples

| Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement) |
| --- | --- | --- | --- | --- | --- |
| MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC) |
| CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC) |
| VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | WIP |
| ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | WIP |
| BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | WIP |
| BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | WIP |
| DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | WIP |
| TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | WIP |
| BERT-3 example | QQP | BERT-3 (0.8626/0.8213 EM/F1) | BERT-Base (0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1) | WIP |
| DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | WIP |
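
The distillation rows combine the student's task loss with a loss against the teacher's outputs. A minimal sketch with the Intel® Neural Compressor 2.x training API is below; the toy teacher/student models, random data, and one-step loop are placeholders for the real recipes listed above.

```python
# Minimal knowledge-distillation sketch, assuming the Intel Neural Compressor
# 2.x training API. Teacher, student, data, and the loop are toy placeholders.
import torch
from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig
from neural_compressor.training import prepare_compression

teacher = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
student = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.ReLU(), torch.nn.Linear(16, 10))
optimizer = torch.optim.SGD(student.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()

distil_loss = KnowledgeDistillationLossConfig(temperature=1.0,
                                              loss_types=["CE", "KL"],
                                              loss_weights=[0.5, 0.5])
conf = DistillationConfig(teacher_model=teacher, criterion=distil_loss)
compression_manager = prepare_compression(student, conf)
compression_manager.callbacks.on_train_begin()

for _ in range(1):                                   # placeholder training loop
    inputs, labels = torch.randn(8, 32), torch.randint(0, 10, (8,))
    outputs = student(inputs)
    loss = criterion(outputs, labels)
    # Blend the hard-label loss with the teacher-matching distillation loss.
    loss = compression_manager.callbacks.on_after_compute_loss(inputs, outputs, loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

compression_manager.callbacks.on_train_end()
```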

Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

| Model (ONNX QDQ) | AWS c6i.2xlarge (Intel) CPU Execution Provider | AWS c6a.2xlarge (AMD) CPU Execution Provider | AWS c6g.2xlarge (ARM) CPU Execution Provider | NVidia A100 CUDA Execution Provider |
| --- | --- | --- | --- | --- |
| ResNet50 | 74.76% | 68.95% | 74.76% | 74.75% |
| BERT-base | 85.54% | 84.56% | 85.54% | 84.31% |
| ResNet50 V1.5 | 72.20% | 67.70% | 72.20% | 72.29% |
| MobileNet V2 | 65.82% | 58.56% | 65.83% | 65.63% |
| SSD MobileNet V1 | 22.45% | 16.53% | 22.45% | 22.35% |
| DistilBERT base MRPC | 84.56% | 83.82% | 84.56% | 84.56% |
| SqueezeNet | 56.54% | 53.52% | 56.54% | 56.55% |
| SSD | 18.63% | 18.54% | 18.63% | 18.61% |
| AlexNet | 54.71% | 47.06% | 54.71% | 54.79% |
| CaffeNet | 56.25% | 52.35% | 56.27% | 56.24% |
| GoogleNet | 67.73% | 63.56% | 67.72% | 67.76% |
| ZFNet | 55.86% | 45.09% | 55.86% | 55.89% |
| Inception V1 | 67.21% | 63.03% | 67.20% | 67.21% |
| SSD MobileNet V1 (ONNX Model Zoo) | 22.86% | 16.94% | 22.80% | 22.87% |
| Mobile bert MRPC | 85.54% | 84.56% | 85.54% | 85.54% |
| Roberta base MRPC | 89.46% | 90.44% | 89.71% | 89.71% |
| ResNet50 V1.5 MLPerf | 76.14% | 72.80% | 76.14% | 76.17% |
| VGG16 | 66.69% | 64.25% | 66.69% | 66.64% |
| VGG16 (ONNX Model Zoo) | 72.31% | 69.35% | 72.32% | 72.34% |
| MobileNet V3 MLPerf | 75.57% | 70.78% | 75.56% | 75.52% |
| EfficientNet | 77.61% | 76.52% | 77.56% | 77.60% |
| MobileNet V2 (ONNX Model Zoo) | 68.51% | 62.48% | 68.58% | 68.48% |
| ShuffleNet V2 | 66.12% | 58.41% | 66.11% | 66.11% |
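
These accuracy numbers come from running the same ONNX QDQ INT8 file on each platform, changing only the ONNX Runtime execution provider. A minimal sketch, with a placeholder model filename, is shown below.

```python
# Run one ONNX QDQ INT8 model under a chosen ONNX Runtime execution provider.
# The filename is a placeholder; the table only swaps the provider/hardware.
import numpy as np
import onnxruntime as ort

providers = (["CUDAExecutionProvider"] if ort.get_device() == "GPU"
             else ["CPUExecutionProvider"])
session = ort.InferenceSession("resnet50_int8_qdq.onnx", providers=providers)

input_name = session.get_inputs()[0].name
dummy_batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = session.run(None, {input_name: dummy_batch})[0]
print(logits.shape)
```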