This repo is out of maintenance. Browse the new leaderboard from here! New repo link: https://github.com/sokcertifiedrobustness/sokcertifiedrobustness.github.io.
Recently, provable (i.e., certified) adversarial robustness training and verification methods have demonstrated their effectiveness against adversarial attacks. In contrast to empirical robustness and empirical adversarial attacks, provable robustness verification gives a rigorous lower bound on the robustness of a given neural network, guaranteeing that no existing or future attack can succeed within the certified region.
Note that training methods for robust networks are usually coupled with a corresponding verification approach. For instance, after training, the robustness bound is often measured on the test set in terms of "robust accuracy" (RACC). A data sample is considered provably robust if and only if we can prove that no adversarial example exists in its neighborhood, i.e., the model always outputs the current prediction label throughout the neighborhood. The neighborhood is usually defined by an Lp-norm distance.
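Formally, in our notation (not taken from any single paper): a sample x is certifiably robust at radius eps under the L-p norm if the prediction is constant over the whole neighborhood, and RACC is the fraction of test samples that are both correctly classified and certified:

```math
\forall \delta,\ \|\delta\|_p \le \epsilon:\ f(x+\delta)=f(x),
\qquad
\mathrm{RACC} = \frac{1}{|D_{\mathrm{test}}|} \sum_{(x,y)\in D_{\mathrm{test}}}
\mathbf{1}\big[\,f(x)=y \ \wedge\ x \text{ is certified at radius } \epsilon\,\big].
```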
A tighter provable robustness bound can be achieved by a better robust training approach, a tighter robustness verification approach, or both jointly.
Updates: (Jun 2022) The accompanying SoK paper is accepted by IEEE S&P (Oakland) 2023!
If you find this repo helpful, please consider citing our paper:
@inproceedings{li2023sok,
  title={SoK: Certified Robustness for Deep Neural Networks},
  author={Linyi Li and Tao Xie and Bo Li},
  booktitle={44th {IEEE} Symposium on Security and Privacy, {SP} 2023, San Francisco, CA, USA, 22-26 May 2023},
  publisher={IEEE},
  year={2023}
}
News:
- We are happy to announce the FIRST large-scale study of representative certifiably robust defenses with interesting insights @ https://arxiv.org/abs/2009.04131! (It is also a useful paper list for certified robustness of DNNs)
- We also release a unified toolbox VeriGauge for implementing robustness verification approaches conveniently with PyTorch: https://github.com/AI-secure/VeriGauge. Feel free to try it and give us feedback!
- We include a taxonomy tree of representative approaches in this field (adapted from our large-scale study paper) at the bottom and here.
Table of Contents
- Main Leaderboard
- Reference: Empirical Robustness
- Taxonomy Tree
Current works mainly focus on image classification tasks with datasets MNIST, CIFAR10, ImageNet, FashionMNIST, and SVHN.
We focus on perturbations measured by L-2 and L-infty norms.
This repo mainly records recent progress in the above settings; advances in other settings are recorded in the attached paper list.
We only consider single-model robustness.
We try to keep track of all important advances in provable robustness approaches, but may still miss some.
Please feel free to contact us (Linyi(linyi2@illinois.edu) @ UIUC Secure Learning Lab & Illinois ASE Group) or commit your updates :)
All input images contain three channels; each pixel is in range [0, 255].
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Certified Robustness to Adversarial Examples with Differential Privacy | Lecuyer et al | Inception V3 | 40% |
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Zhai et al | ResNet-50 | 57% | |
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers | Salman et al | ResNet-50 | 56% | |
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-50 | 49% | |
Consistency Regularization for Certified Robustness of Smoothed Classifiers | Jeong et al | ResNet-50 | 48% |
All of the above approaches use Randomized Smoothing (Cohen et al) to derive the certification, with failure probability 0.1%.
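For reference, below is a minimal sketch of this certification procedure in the spirit of Cohen et al's CERTIFY algorithm; `base_classifier`, `num_classes`, and the sampling hyperparameters are illustrative assumptions rather than the exact leaderboard settings.

```python
# Illustrative sketch of randomized smoothing certification in the spirit of
# Cohen et al (2019). `base_classifier` (maps a batch of images to logits),
# `num_classes`, and the sampling hyperparameters are assumptions.
import torch
from scipy.stats import beta, norm

@torch.no_grad()
def certify(base_classifier, x, sigma, num_classes, n0=100, n=100_000, alpha=0.001, batch=1000):
    """Return (predicted class, certified L-2 radius), or (None, 0.0) to abstain."""
    def sample_counts(num):
        counts = torch.zeros(num_classes, dtype=torch.long)
        while num > 0:
            b = min(batch, num)
            noisy = x.unsqueeze(0) + sigma * torch.randn(b, *x.shape)  # Gaussian corruptions
            preds = base_classifier(noisy).argmax(dim=1)
            counts += torch.bincount(preds, minlength=num_classes)
            num -= b
        return counts

    c_hat = sample_counts(n0).argmax().item()  # guess the top class from n0 samples
    k = sample_counts(n)[c_hat].item()         # hits for c_hat among n fresh samples
    # One-sided Clopper-Pearson lower confidence bound on the probability that the
    # base classifier predicts c_hat under noise; alpha=0.001 corresponds to the
    # 0.1% failure probability quoted above.
    p_lower = beta.ppf(alpha, k, n - k + 1) if k > 0 else 0.0
    if p_lower > 0.5:
        return c_hat, sigma * norm.ppf(p_lower)  # certified radius sigma * Phi^{-1}(p_lower)
    return None, 0.0
```

Cohen et al's released code computes the same bound via statsmodels' `proportion_confint`; the Beta quantile above is the standard closed form of that one-sided Clopper-Pearson interval.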
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Zhai et al | ResNet-50 | 43% | |
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers | Salman et al | ResNet-50 | 43% | |
Consistency Regularization for Certified Robustness of Smoothed Classifiers | Jeong et al | ResNet-50 | 41% | |
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-50 | 37% |
All of the above approaches use Randomized Smoothing (Cohen et al) to derive the certification, with failure probability 0.1%.
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers | Salman et al | ResNet-50 | 27% | |
Consistency Regularization for Certified Robustness of Smoothed Classifiers | Jeong et al | ResNet-50 | 25% | |
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Zhai et al | ResNet-50 | 25% | |
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-50 | 19% |
All of the above approaches use Randomized Smoothing (Cohen et al) to derive the certification, with failure probability 0.1%.
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers | Salman et al | ResNet-50 | 20% | |
Consistency Regularization for Certified Robustness of Smoothed Classifiers | Jeong et al | ResNet-50 | 18% | |
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Zhai et al | ResNet-50 | 14% | |
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-50 | 12% |
All of the above approaches use Randomized Smoothing (Cohen et al) to derive the certification, with failure probability 0.1%.
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers | Salman et al | ResNet-50 | 36.8% | transformed from L-2 robustness; failure prob. 0.001 |
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-50 | 28.6% | transformed from L-2 robustness by Salman et al; failure prob. 0.001 |
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models | Gowal et al | WideResNet-10-10 | 6.13% | Dataset downscaled to 64 x 64 |
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
MixTrain: Scalable Training of Verifiably Robust Neural Networks | Wang et al | ResNet | 19.4% | |
Scaling provable adversarial defenses | Wong et al | ResNet | 5.1% | Run and reported by Wang et al |
In the above table, the dataset is ImageNet-200 rather than the ImageNet-1000 used in the other tables.
All input images have three channels, with size 32 x 32 x 3; each pixel is in range [0, 255].
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Scaling provable adversarial defenses | Wong et al | ResNet | 51.96% | 36/255; transformed from L-infty 2/255 |
Lipschitz-Certifiable Training with a Tight Outer Bound | Lee et al | 6C2F | 51.30% | |
Globally-Robust Neural Networks | Leino et al | GloRo-T | 51.0% | |
Certified Robustness to Adversarial Examples with Differential Privacy | Lecuyer et al | ResNet | 40% | |
(Verification) Efficient Neural Network Robustness Certification with General Activation Functions | Zhang et al | ResNet-20 | 0% | Reported by Cohen et al |
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers | Salman et al | ResNet-110 | 82% | |
Unlabeled Data Improves Adversarial Robustness | Carmon et al | ResNet 28-10 | 72% | interpolated from Fig. 1 |
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Zhai et al | ResNet-110 | 71% | |
Consistency Regularization for Certified Robustness of Smoothed Classifiers | Jeong et al | ResNet-110 | 67.5% | |
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-110 | 61% | |
(Verification) Efficient Neural Network Robustness Certification with General Activation Functions | Zhang et al | ResNet-20 | 0% | Reported by Cohen et al |
All of the above approaches use Randomized Smoothing (Cohen et al) to derive the certification, with failure probability 0.1%.
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers | Salman et al | ResNet-110 | 65% | |
Unlabeled Data Improves Adversarial Robustness | Carmon et al | ResNet 28-10 | 61% | interpolated from Fig. 1 |
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Zhai et al | ResNet-110 | 59% | |
Consistency Regularization for Certified Robustness of Smoothed Classifiers | Jeong et al | ResNet-110 | 57.7% | |
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-110 | 43% |
All of the above approaches use Randomized Smoothing (Cohen et al) to derive the certification, with failure probability 0.1%.
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers | Salman et al | ResNet-110 | 39% | |
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Zhai et al | ResNet-110 | 38% | |
Consistency Regularization for Certified Robustness of Smoothed Classifiers | Jeong et al | ResNet-110 | 37.8% | |
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-110 | 22% |
All of the above approaches use Randomized Smoothing (Cohen et al) to derive the certification, with failure probability 0.1%.
Defense | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers | Salman et al | ResNet-110 | 32% | |
Consistency Regularization for Certified Robustness of Smoothed Classifiers | Jeong et al | ResNet-110 | 27.0% | |
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Zhai et al | ResNet-110 | 25% | |
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-110 | 14% |
All of the above approaches use Randomized Smoothing (Cohen et al) to derive the certification, with failure probability 0.1%.
All input images are grayscale, with size 28 x 28; each pixel is in range [0, 1].
eps=1.58 is transformed from L-infty eps=0.1.
Defense/Verification | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Consistency Regularization for Certified Robustness of Smoothed Classifiers | Jeong et al | LeNet | 82.2% | slightly smaller radius 1.5 |
Second-Order Provable Defenses against Adversarial Attacks | Singla and Feizi | 2x[1024], softplus | 69.79% | |
Globally-Robust Neural Networks | Leino et al | GloRo-T | 51.9% | |
Lipschitz-Certifiable Training with a Tight Outer Bound | Lee et al | 4C3F | 47.95% | |
Scaling provable adversarial defenses | Wong et al | Large CNN | 44.53% | |
Defense/Verification | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Towards Stable and Efficient Training of Verifiably Robust Neural Networks | Zhang et al | large CNN | 87.94% | pick the best number |
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models | Gowal et al | CNN | 85.12% | |
Fast and Stable Interval Bounds Propagation for Training Verifiably Robust Models | Morawiecki et al | large CNN | 84.42% | |
(Verification) Evaluating Robustness of Neural Networks with Mixed Integer Programming | Tjeng et al | small CNN | 51.02% |
The image size is 32 x 32 x 3 (three color channels). Pixel values are in [0, 255]; when calculating eps, they are rescaled to [0, 1], so a raw perturbation of 2 corresponds to eps = 2/255 ≈ 0.008.
Defense/Verification | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-20 | ~95% | Interpolated from Cohen et al |
Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks | Tsuzuku et al | ResNet-20 | 0% | Interpolated from Cohen et al |
Defense/Verification | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Certified Adversarial Robustness via Randomized Smoothing | Cohen et al | ResNet-20 | ~88% | Interpolated from Cohen et al |
Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks | Tsuzuku et al | ResNet-20 | 0% | Interpolated from Cohen et al |
Defense/Verification | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Adversarial Training and Provable Defenses: Bridging the Gap | Balunovic et al | 3-layer CNN | 70.2% | |
Training Verified Learners with Learned Verifiers | Dvijotham et al | Predictor-Verifier | 62.44% | |
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models | Gowal et al | CNN | 62.40% | |
Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope | Wong et al | CNN | 59.33% | |
Differentiable Abstract Interpretation for Provably Robust Neural Networks | Mirman et al | small CNN | 11.0% |
Defense/Verification | Author | Model Structure | RACC | Note |
---|---|---|---|---|
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models | Gowal et al | large CNN | 47.63% | Reported by Morawiecki et al |
*Fast and Stable Interval Bounds Propagation for Training Verifiably Robust Models | Morawiecki et al | small CNN | 46.03% | May not be reproducible; the normalization effect seems not to be considered in their code |
This is an MNIST-like dataset. Images are 28 x 28 grayscale; values are in [0, 1].
Defense/Verification | Author | Model Structure | RACC | Note |
---|---|---|---|---|
Towards Stable and Efficient Training of Verifiably Robust Neural Networks (arXiv:v1) | Zhang et al | large CNN | 78.73% | pick the best number |
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models | Gowal et al | large CNN | 77.63% | pick the best number, reported by Zhang et al |
Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks | Andriushchenko & Hein | Boosted trees | 76.83% | |
Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope | Wong et al | CNN | 65.47% |
*. Within one dataset, L-2 and L-infty balls are mutually transformable. After transformation, a corresponding tight bound may exist but is not listed.
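As a quick reference, such transformations follow from the standard norm containments for d-dimensional perturbations (a generic worst-case conversion; individual table entries may rely on tighter, paper-specific arguments):

```math
\|\delta\|_\infty \le \|\delta\|_2 \le \sqrt{d}\,\|\delta\|_\infty ,
```

so a certified L-2 radius r implies a certified L-infty radius r / sqrt(d), and a certified L-infty radius eps implies a certified L-2 radius eps.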
Notes:
- Some papers use rarely-used epsilon values to report their results, which makes comparison difficult. Some papers report epsilon after input normalization instead of the raw value, which can also cause confusion. We would suggest adopting common evaluation epsilon values and settings.
- Instead of evaluating on the above benchmarks and reporting robust accuracy, some papers report the average certified robust radius. We will add a comparison table for that metric later; see the sketch after these notes for how both metrics are computed.
- Besides the on-the-board results, all these papers have their own unique takeaways. For interested readers and stakeholders, we recommend not only valuing the approaches with higher numbers, but also digging into their technical meat.
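For concreteness, here is a minimal sketch (our own illustration, with made-up variable names and numbers) of how the two metrics are computed from per-sample certification results:

```python
# Minimal sketch: computing robust accuracy (RACC) at a target radius and the
# average certified radius (ACR) from per-sample certification results.
# `results` is a list of (correct, radius) pairs, where `correct` says the
# certified prediction matches the label and `radius` is the certified radius
# (0.0 if the verifier abstained).

def racc(results, eps):
    """Fraction of samples certified correct at radius >= eps."""
    return sum(correct and radius >= eps for correct, radius in results) / len(results)

def acr(results):
    """Average certified radius; incorrect/abstained samples count as 0."""
    return sum(radius for correct, radius in results if correct) / len(results)

results = [(True, 0.61), (True, 0.12), (False, 0.0), (True, 1.30)]
print(racc(results, eps=0.5))  # 0.5  (2 of 4 samples certified at radius >= 0.5)
print(acr(results))            # (0.61 + 0.12 + 1.30) / 4 = 0.5075
```

Here incorrect or abstained samples contribute zero radius to ACR, which is one common convention in the literature.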
For comparison, here we cite numbers from the MadryLab repositories for the MNIST challenge and the CIFAR-10 challenge, which record the best attacks against their robust models with secret weights.
Black-Box
Attack | Submitted by | Accuracy | Submission Date |
---|---|---|---|
PGD on the cross-entropy loss for the adversarially trained public network | (initial entry) | 63.39% | Jul 12, 2017 |
White-Box
Attack | Submitted by | Accuracy | Submission Date |
---|---|---|---|
MultiTargeted | Sven Gowal | 44.03% | Aug 28, 2019 |
Black-Box
Attack | Submitted by | Accuracy | Submission Date |
---|---|---|---|
AdvGAN from "Generating Adversarial Examples with Adversarial Networks" | AdvGAN | 92.76% | Sep 25, 2017 |
White-Box
Attack | Submitted by | Accuracy | Submission Date |
---|---|---|---|
First-Order Adversary with Quantized Gradients | Zhuanghua Liu | 88.32% | Oct 16, 2019 |
Full introduction of these approaches is available at https://arxiv.org/abs/2009.04131.
Maintained by Linyi.
Last updated: Sept 24, 2020