Optimization is a critical step in the deep learning process, as it determines how well the network learns from the training data. SuperGradients supports the out-of-the-box PyTorch optimizers (SGD, Adam, AdamW, RMS_PROP), as well as RMSpropTF and Lamb.
The optimizer is set through the training parameters.
```python
from super_gradients import Trainer

trainer = Trainer(...)
trainer.train(
    training_params={"optimizer": "Adam", "optimizer_params": {"eps": 1e-3}, ...},
    ...
)
```
Note: `optimizer_params` is a dictionary of all the optimizer parameters you want to set. It can contain any argument defined in the optimizer's `__init__` method, except for `params`, because that argument corresponds to the model parameters to optimize and is provided automatically by the Trainer.
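For example, to use SGD with momentum and weight decay, any `torch.optim.SGD` constructor argument (except `params`) can be passed through `optimizer_params`. A minimal sketch, with illustrative values:

```python
from super_gradients import Trainer

trainer = Trainer(...)
trainer.train(
    training_params={
        "optimizer": "SGD",
        # Any torch.optim.SGD __init__ argument except `params` is valid here.
        "optimizer_params": {"momentum": 0.9, "weight_decay": 1e-4, "nesterov": True},
        ...
    },
    ...
)
```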
When working with recipes, you need to modify your `recipes/training_hyperparams` file as below:
```yaml
# recipes/training_hyperparams/my_training_recipe.yaml
...
optimizer: Adam
optimizer_params:
  eps: 1e-3
```
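If you train from a full recipe rather than passing `training_params` in code, the file above is typically pulled in through the recipe's Hydra defaults list. A sketch of what that reference could look like (the recipe file name `my_recipe.yaml` is a placeholder):

```yaml
# recipes/my_recipe.yaml
defaults:
  - training_hyperparams: my_training_recipe  # the file shown above
  - _self_
```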
If your own optimizer is not natively supported by SuperGradients, you can always register it!
```python
import torch

from super_gradients.common.registry.registry import register_optimizer


@register_optimizer()
class CustomOptimizer(torch.optim.Optimizer):
    def __init__(
        self,
        params,  # `params` is the only required argument regardless of your optimizer; the rest depends on your optimizer.
        alpha: float,
        betas: float,
    ):
        defaults = dict(alpha=alpha, betas=betas)
        super(CustomOptimizer, self).__init__(params, defaults)

    ...
```
And then update your training hyperparameters:
```yaml
# my_training_hyperparams.yaml
...
optimizer: CustomOptimizer
optimizer_params:
  alpha: 1e-3
  betas: 1e-3
```
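The registered name also works outside of recipes, directly in `training_params` (parameter values below are only illustrative):

```python
from super_gradients import Trainer

trainer = Trainer(...)
trainer.train(
    training_params={
        "optimizer": "CustomOptimizer",  # the name the class was registered under
        "optimizer_params": {"alpha": 1e-3, "betas": 1e-3},
        ...
    },
    ...
)
```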
You can define the learning rate to use on each section of your model by implementing `initialize_param_groups` and `update_param_groups`:
- `initialize_param_groups` defines the groups and the learning rate to use for each group. It is called on instantiation.
- `update_param_groups` updates the learning rate for each group. It is called by LR callbacks (such as `LRCallbackBase`) during training.
If your model (i.e. any `torch.nn.Module`) lacks these methods, the same learning rate is applied to every block. But if you implement them, the Trainer will take them into account, just like with any other SuperGradients model.
Suppose you have your own custom model and want to use a different learning rate for the backbone. You first need to implement `initialize_param_groups` and `update_param_groups` accordingly.
```python
import torch

from super_gradients.common.registry.registry import register_model


@register_model()  # Required if working with a recipe
class MyModel(torch.nn.Module):
    ...

    def initialize_param_groups(self, lr: float, training_params) -> list:
        # OPTIMIZE THE BACKBONE USING A CUSTOM LR
        backbone_params = {
            "named_params": self.backbone.named_parameters(),
            "lr": lr * training_params['multiply_backbone_lr'],  # You can use any parameter, just make sure to define it when you set up training_params
        }

        # OPTIMIZE THE MAIN ARCHITECTURE LAYERS
        decoder_named_params = list(self.decoder.named_parameters())
        aux_head_named_parameters = list(self.aux_head.named_parameters())
        layers_params = {
            "named_params": decoder_named_params + aux_head_named_parameters,
            "lr": lr,
        }

        param_groups = [backbone_params, layers_params]
        return param_groups

    def update_param_groups(self, param_groups: list, lr: float, epoch: int, iter: int, training_params, total_batch: int) -> list:
        """Update the param_groups defined in initialize_param_groups."""
        param_groups[0]["lr"] = lr * training_params['multiply_backbone_lr']
        param_groups[1]["lr"] = lr
        return param_groups
```
Note: If working with a recipe, don't forget to register your model.
Now you just need to set a value for `multiply_backbone_lr` in the training recipe.
```yaml
# my_training_hyperparams.yaml
...
multiply_backbone_lr: 10  # This is used in our implementation of initialize_param_groups/update_param_groups
optimizer: OptimizerName  # Any optimizer as described in the previous sections
optimizer_params: {}      # Any parameters for the optimizer you chose
```
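Outside of recipes, the same setup can be passed directly to the Trainer. A sketch, assuming `MyModel` is the class defined above and using illustrative values:

```python
from super_gradients import Trainer

model = MyModel(...)  # implements initialize_param_groups/update_param_groups
trainer = Trainer(...)
trainer.train(
    model=model,
    training_params={
        "multiply_backbone_lr": 10,  # consumed by our param-group methods
        "optimizer": "Adam",         # any supported or registered optimizer
        "optimizer_params": {},
        ...
    },
    ...
)
```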