[Reduce Runtime] Better utilize GPU resources for PGD with random restarts

Summary
I'm working on a model that suffers from gradient masking, so I have to use multiple (16) random restarts. I developed an implementation that is faster than the current one when using multiple restarts.
My experiments involve training a BatchNorm-free ResNet-26 on CIFAR-10 on a V100 GPU, with a batch size of 64, 16 random restarts, and 20 PGD steps. My implementation takes 170 minutes per epoch, while the current implementation takes 250 minutes per epoch.
This is a 1.47x speed-up.
The training curves of the two implementations match, so I'm fairly confident my implementation is correct.
Please let me know if you guys are interested in a Pull Request.
Details
The current implementation of PGD with random restarts works roughly like this (pseudocode):

```python
# Inputs: X (clean batch), Y (labels), model
output = X.clone()
for restart_i in range(num_random_restarts):
    # The first restart starts from the clean input; later restarts add
    # random noise within the perturbation constraint.
    noise = random_constrained_noise(shape=X.shape) if restart_i > 0 else 0
    pert_X = X + noise
    # Run a full PGD attack from this starting point.
    adv, is_misclass = run_pgd(pert_X, Y, model)
    # Keep the adversarial examples that fool the model.
    output[is_misclass] = adv[is_misclass]
return output
```
Each restart above is a separate, sequential PGD run over the batch. If the user has enough GPU memory, which will often be the case for CIFAR-10, the restarts can instead be processed in parallel: replicate the batch num_restarts times along the batch dimension, run a single PGD pass on the resulting num_restarts x batch_size x C x H x W input, and then keep, for each example, a restart that fools the model. This would improve GPU utilization.
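Below is a minimal sketch of this idea, assuming PyTorch and a plain L_inf PGD loop; the function name pgd_parallel_restarts, the hyperparameter defaults, and the attack details are illustrative placeholders rather than the library's actual API or the exact code I benchmarked:

```python
import torch
import torch.nn.functional as F

def pgd_parallel_restarts(model, X, Y, eps=8/255, step_size=2/255,
                          num_steps=20, num_restarts=16):
    """Run all restarts of an L_inf PGD attack in one batched pass (sketch)."""
    B = X.size(0)
    # Replicate the batch so every restart is handled by the same forward/backward
    # pass: shape becomes (num_restarts * B, C, H, W).
    X_rep = X.repeat(num_restarts, 1, 1, 1)
    Y_rep = Y.repeat(num_restarts)

    # Random start inside the eps-ball; the first restart starts from the clean
    # input, mirroring the current sequential implementation.
    delta = torch.empty_like(X_rep).uniform_(-eps, eps)
    delta[:B] = 0
    delta.requires_grad_(True)

    for _ in range(num_steps):
        loss = F.cross_entropy(model(X_rep + delta), Y_rep)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()                  # gradient-sign step
            delta.clamp_(-eps, eps)                           # project back into the eps-ball
            delta.copy_((X_rep + delta).clamp(0, 1) - X_rep)  # keep pixels in [0, 1]

    with torch.no_grad():
        adv = (X_rep + delta).detach()
        is_misclass = model(adv).argmax(1) != Y_rep
        adv = adv.view(num_restarts, B, *X.shape[1:])
        is_misclass = is_misclass.view(num_restarts, B)

        # For each example, keep the clean input unless some restart fooled the
        # model; if several did, take the first successful restart.
        output = X.clone()
        success_any = is_misclass.any(dim=0)
        first_hit = is_misclass.float().argmax(dim=0)
        chosen = adv[first_hit, torch.arange(B, device=X.device)]
        output[success_any] = chosen[success_any]
    return output
```

(The sequential version ends up keeping the last successful restart rather than the first; either choice returns a misclassified example, so the difference should not matter for training.)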
If num_restarts is so large that an input of size num_restarts x batch_size x C x H x W does not fit in GPU memory, the user could also be allowed to specify mini_num_restarts < num_restarts, so that the GPU processes mini_num_restarts replicated batches at a time, i.e. the input size is reduced to mini_num_restarts x batch_size x C x H x W.
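As a rough illustration of that fallback (again only a sketch, reusing the hypothetical pgd_parallel_restarts above; it pays one extra forward pass per chunk to check which examples were fooled):

```python
import torch

def pgd_chunked_restarts(model, X, Y, num_restarts=16, mini_num_restarts=4, **pgd_kwargs):
    """Run the restarts in chunks of mini_num_restarts to bound GPU memory (sketch)."""
    output = X.clone()
    fooled = torch.zeros(X.size(0), dtype=torch.bool, device=X.device)
    for start in range(0, num_restarts, mini_num_restarts):
        n = min(mini_num_restarts, num_restarts - start)
        adv = pgd_parallel_restarts(model, X, Y, num_restarts=n, **pgd_kwargs)
        # pgd_parallel_restarts returns the clean input wherever no restart
        # succeeded, so an example only counts as fooled if the returned copy
        # is actually misclassified.
        with torch.no_grad():
            hit = model(adv).argmax(1) != Y
        replace = hit & ~fooled
        output[replace] = adv[replace]
        fooled |= hit
    return output
```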
My only concern is that this might negatively affect BatchNorm, since the batch statistics during the attack would be computed over the replicated num_restarts x batch_size batch rather than the original batch. But given how computationally intensive adversarial training is, this might still be a worthwhile option to offer users.
Reference: robustness/robustness/attacker.py, line 235 (commit 79d371f)