This code reproduces part of the results presented in the paper "Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent". The implementation was developed on Linux, using Python 3 and PyTorch.
The main objective of the paper is to quantitatively answer the question: "In what direction do the gradients of a model with respect to its inputs align after robust training?" We propose to answer it with the direction that connects the current input to the closest example in the support of the closest inaccurate class in decision space. To test whether this direction is directly related to robustness, we propose a metric measuring the alignment between the gradient and the proposed direction. We show that this alignment increases with robust training and that the proposed direction achieves closer alignment than another gradient-alignment direction from the literature. We also show that increasing the alignment of the gradient through a penalty term on the loss increases robustness.
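As a rough illustration of this kind of alignment penalty (a minimal sketch, not the implementation used in this repository; `model`, `x`, `y`, and `delta_x` are placeholder names), the cosine similarity between the input gradient of the loss and a target direction can be computed in PyTorch as follows:

```python
import torch
import torch.nn.functional as F

def cosine_alignment_penalty(model, x, y, delta_x):
    """Sketch: penalize misalignment between the input gradient of the loss
    and a target direction delta_x (one direction per example in the batch)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Gradient of the classification loss with respect to the input.
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    # Cosine similarity between flattened gradient and flattened direction.
    cos = F.cosine_similarity(grad.flatten(1), delta_x.flatten(1), dim=1)
    # Maximizing alignment is the same as minimizing (1 - cosine similarity).
    return (1.0 - cos).mean()
```

Such a term could then be added to the usual classification loss with a weighting hyperparameter.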
The COPD dataset used in the paper is private and is not available for outside investigators. The respective code is only provided for reference purposes.
- To install all the needed libraries, you can use the `requirements_indirect.sh` and `requirements_direct.sh` files. They assume you have conda or miniconda installed and create conda environments called `gdrm_indirect` and `gdrm_direct`, with the prerequisites installed. Activate the environment before running the code, using `conda activate gdrm_indirect` or `conda activate gdrm_direct`, for running experiments with the indirect method of estimating the proposed direction (Section 2.3.2 in the paper) or with the direct method of estimating it (Section 2.3.1 in the paper), respectively.
- Each folder inside the `src/` folder has its own `requirements.txt` listing imported libraries and versions.
Check the file `running_commands.csv` for the commands used for all the training and validation done for the paper and for the results presented here.
- All commands select the GPU indexed by 0, but you can change the `gpus` argument according to your needs.
- For the test commands, replace the `<timestamps-id>` expression with the respective value from the training experiment folder.
- For the test commands with the CIFAR-10 dataset, replace the `<best_epoch>` expression with the best epoch in terms of epsilon_0.5, considering values from epoch 33 to epoch 100.
- For the ImageNet command, replace the `<robustbench_model_name>` expression with the desired model from the Linf eps=4/255 ImageNet RobustBench leaderboard.
- You can run `python -m src.indirect_method.train --help` and `python -m src.direct_method.train --help` to see all available options for modifying the runs.
- All commands should be run from the project base folder.
- To check test scores, open the `log.txt` file inside the experiment folder (`./runs/...`).
- The first time some datasets are used, H5 files are created for faster loading of the datasets in subsequent runs, so the first run may take an unusually long time to start producing outputs; a rough sketch of this caching idea is shown below.
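The H5 caching is handled automatically by the repository; purely as an illustration of the idea (the file name and arrays below are made up), a dataset can be cached and re-read with h5py like this:

```python
import h5py
import numpy as np

cache_path = "example_cache.h5"  # hypothetical file name

# First run: write the preprocessed arrays once.
images = np.random.rand(100, 1, 28, 28).astype(np.float32)  # stand-in data
labels = np.random.randint(0, 2, size=100)
with h5py.File(cache_path, "w") as f:
    f.create_dataset("images", data=images, compression="gzip")
    f.create_dataset("labels", data=labels)

# Subsequent runs: load directly from the cache, which is much faster
# than re-decoding and preprocessing the original dataset.
with h5py.File(cache_path, "r") as f:
    images = f["images"][:]
    labels = f["labels"][:]
```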
Example pre-trained models are available at https://www.sci.utah.edu/~datasets/gradient-direction-of-robust-models/pretrained_models.zip. To get the numbers provided in the tables below, use the commands provided in `running_commands.csv`, replacing the `--load_checkpoint_g=` and `--load_checkpoint_d=` arguments with the paths to the provided pre-trained models. For the MNIST-3/5, MNIST, and CIFAR-10 datasets, use the `generator` folder for training the models with the cosine alignment penalty, and the `generator_reference` folder for testing the alignment of the robust methods. For example, to generate the resulting generator alignments and images for the Squares dataset, use `python -m src.train --experiment=square_vrgan_test --gpus=0 --nepochs=1 --dataset_to_use=squares --skip_train=true --split_validation=test --vrgan_training=true --load_checkpoint_g=./pretrained_models/square/generator/state_dict_g_best_epoch`. As another example, to get the images and black-box numbers for the method on the MNIST-3/5 dataset, use `python -m src.train --dataset_to_use=mnist --experiment=mnist_cosine_test_bbox --gpus=0 --split_validation=test --unet_downsamplings=2 --load_checkpoint_g=./pretrained_models/mnist35/generator_reference/state_dict_g_best_epoch --nepochs=1 --skip_train=true --epsilons_val_attack 0.02 0.04 0.06 0.1 0.14 0.2 0.4 0.6 0.8 1.0 1.2 1.4 --load_checkpoint_d=./pretrained_models/mnist35/cosine/state_dict_d_best_epoch --blackbox_attack=true`. More complete results for the method are given in the paper, including results averaged over five random seeds.
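The checkpoint files appear to be plain PyTorch state dicts; assuming that is the case, a generic way to inspect and load one is sketched below (the `MyGenerator` class is a placeholder, not the repository's actual model class):

```python
import torch

# Path to one of the provided checkpoints; replace with the one you downloaded.
checkpoint_path = "./pretrained_models/square/generator/state_dict_g_best_epoch"

# Load onto the CPU first so the file can be inspected without a GPU.
state_dict = torch.load(checkpoint_path, map_location="cpu")
print(list(state_dict.keys())[:5])  # peek at the first few parameter names

# model = MyGenerator(...)           # placeholder: use the repository's model class
# model.load_state_dict(state_dict)
# model.eval()
```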
This section shows results for estimating the vector connecting an input to its closest example of the opposite class in binary datasets.
| Dataset | Input image | Estimated vector to the closest point of the opposite class | Ground truth for the vector | Cosine similarity between the estimated and ground-truth vectors |
|---|---|---|---|---|
| Square | | | | 0.869 |
| MNIST-3/5 | | | | |
| Class | | | | | |
|---|---|---|---|---|---|
| 0 | 1.0 | 0.31 | 0.3 | 1.29 | 1.3 |
| 1 | 1.3 | 0.34 | 0.3 | 1.00 | 1.0 |
This section shows results for indirectly estimating the vector connecting an input to its closest example in the support of another class.
| Dataset | Input image | Estimated vector to the closest point of the closest class | Ground truth for the vector | Cosine similarity between the estimated and ground-truth vectors |
|---|---|---|---|---|
| Square32 | | | | 0.608 |
| MNIST | | | | |
| CIFAR-10 | | | | |
| Method | Accuracy (%) | | | | | |
|---|---|---|---|---|---|---|
| Baseline | 100% | 0.0055 | 0.096 | 0.0062 | 0.637 | 0.637 |
| | 100% | 0.0077 | 0.133 | 0.0084 | 0.886 | 0.886 |
| PGD | 100% | 0.0074 | 0.127 | 0.0082 | 0.852 | 0.852 |
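The PGD row refers to models adversarially trained with projected gradient descent. As a reminder of the attack this training is based on, here is a minimal Linf PGD sketch (not the repository's implementation, which relies on advertorch; the step sizes and bounds below are arbitrary example values):

```python
import torch
import torch.nn.functional as F

def pgd_linf_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Minimal Linf PGD sketch: iteratively ascend the loss and project the
    perturbed input back into the eps-ball around the original input."""
    x = x.clone().detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # project to the eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)           # keep a valid pixel range
    return x_adv.detach()
```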
All the outputs of the model are saved in the `runs` folder, inside a folder for the specific experiment you are running (`<experiment name>_<timestamp-id>`). These are the files that are saved:
- tensorboard/events.out.tfevents.<...>: tensorboard file for following the training losses and validation score in real-time and for checking their evolution through the epochs.
- real_samples.png: a fixed batch of validation examples for which outputs will be printed
- real_samples_gt.txt: the label for each of the fixed validation images
- delta_x_gt.png: ground truth for the vector Δx, when training the direct generator, or the generated Δx, when training the classifier.
- robust_<epoch>.png: graph of accuracy as a function of perturbation norm of attacks.
- cosine_similarity_correct_val_<epoch>.png: histogram of cosine similarities between the gradient of the model with respect to the inputs and the proposed direction.
- adversarial_samples_val_attack<epoch>.png: examples of images attacked with the selected attack method.
- adversarial_samples_gradient_val<epoch>.png: gradients of loss with respect to images, normalized to -1 to 1 range.
- delta_x_samples_<epoch>.png: estimated residual to the closest example of the closest class, created when training the direct generator.
- xprime_samples_<epoch>.png: generated closest example of the closest class, at the end of that epoch, when training the direct generator.
- real_samplesxzinit<destination_class>.png: the generated closest example after the first 600 iterations of optimization to project into the manifold of the indirect generator, with the penalty on the z norm set to 0.
- real_samples<destination_class>.png: the generated closest example of the destination class, when performing optimization to project into the manifold of the indirect generator.
- state_dict_g_best_epoch: checkpoint for the generator model for the epoch with the highest validation score.
- state_dict_d_best_epoch: checkpoint for the classifier model for the epoch with the highest validation score.
- log.txt: a way to check the configurations used for that run and check the losses and scores of the model in text format, without loading tensorboard.
- command: command used to run the python script, including all the parser arguments.
- csv_file.csv: table containing per-example statistics for the alignment from all classes, as described in Table 6, Section 3.4 of the paper (see the snippet below for one way to inspect it).
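For example, the per-example statistics can be inspected with pandas (a hypothetical snippet; the exact column names depend on the run and are not listed here):

```python
import pandas as pd

# Replace the placeholders with the actual experiment folder name.
df = pd.read_csv("./runs/<experiment name>_<timestamp-id>/csv_file.csv")
print(df.columns.tolist())  # see which per-example statistics were saved
print(df.describe())        # summary statistics over all examples
```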
- The code included in `src/cgan/` was cloned from https://github.com/ilyakava/BigGAN-PyTorch and modified to include more datasets. The code included in `src/indirect_method/bg` is copied from the same repository, but only includes the code necessary to run a loaded generator.
- The code included in `src/.../advertorch` was copied from https://github.com/BorealisAI/advertorch/pull/74/files and https://github.com/BorealisAI/advertorch/blob/c18b5882b2c1eb2a3f650c8c9296b920e6635521/advertorch/attacks/spatial.py and slightly modified.
- The code included in `src/indirect_method/util_defense_GAN.py` was inspired by https://raw.githubusercontent.com/sky4689524/DefenseGAN-Pytorch/master/util_defense_GAN.py.
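For orientation, a DefenseGAN-style projection optimizes a latent code so that the generator's output matches a given image. The sketch below only illustrates that general idea under assumed names (`generator`, `z_dim`) and is not the code from `util_defense_GAN.py`:

```python
import torch

def project_onto_generator(generator, x, z_dim=128, steps=600, lr=0.05):
    """Illustrative sketch: find a latent z whose generated image is close to x,
    i.e., project x onto the (approximate) manifold learned by the generator."""
    z = torch.randn(x.size(0), z_dim, device=x.device, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        recon = generator(z)
        # Per-example squared reconstruction error, averaged over the batch.
        loss = ((recon - x) ** 2).flatten(1).sum(dim=1).mean()
        loss.backward()
        optimizer.step()
    return generator(z).detach(), z.detach()
```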
Most files of this project are licensed under the MIT License. Some of the files in this repository contain code snippets that originated from files licensed under the MIT License. Files that are not licensed under the MIT License:
- Files in `src/indirect_method/advertorch/` and `src/direct_method/advertorch/` are licensed under the GNU LESSER GENERAL PUBLIC LICENSE Version 3.
By: Ricardo Bigolin Lanfredi, ricbl@sci.utah.edu, ricbl.github.io.