This repo contains PyTorch scheduler classes implementing the following:
- Arbitrary LR and momentum schedulers
- Lambda-function-based scheduler built on `lr_scheduler.LambdaLR`
- List-based scheduler that accepts explicitly defined schedule lists for LR and momentum
- Learning rate range finder for preparing the 1cycle policy
- The 1cycle policy scheduler
These classes inherit from, and are based on, the core learning rate schedulers included in PyTorch, and can be used in an identical manner, with the added ability to schedule momentum.
See detailed documentation and implementation by running:
```python
import scheduler

help(scheduler.LambdaScheduler)
help(scheduler.ListScheduler)
help(scheduler.RangeFinder)
help(scheduler.OneCyclePolicy)
```
- `LambdaScheduler`: based on PyTorch's `LambdaLR`, but can also (optionally) schedule momentum in the same way. Note that, like `LambdaLR`, individual schedules can be defined for each parameter group in the optimizer by passing a list of lambdas/functions/callables for LR and momentum (see the sketch following this list).
- `ListScheduler`: similar to the `LambdaScheduler`, but defines LR and momentum using passed lists. Per-parameter-group schedules are specified using lists of lists or 2D numpy arrays.
- `RangeFinder`: a simple predefined schedule that varies the LR from 1e-7 to 1 over a given number of epochs. This is a preparatory step for the One Cycle Policy.
- `OneCyclePolicy`: the One Cycle Policy scheduler for LR and momentum; see References.
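As a rough illustration of the first two classes, the sketch below constructs each one and steps it once per epoch, just like a built-in PyTorch scheduler. The keyword names `lr_lambda`, `momentum_lambda`, `lrs` and `momentums` are assumptions for illustration, not confirmed API; check `help(scheduler.LambdaScheduler)` and `help(scheduler.ListScheduler)` for the exact signatures and semantics.

```python
import numpy as np
from torch import nn, optim
from scheduler import LambdaScheduler, ListScheduler

epochs = 50
mdl = nn.Linear(10, 1)

# LambdaScheduler: schedule LR and (optionally) momentum with callables of the
# epoch index, mirroring LambdaLR's multiplicative factors.
# NOTE: keyword names are assumed; see help(scheduler.LambdaScheduler).
optimizer = optim.SGD(mdl.parameters(), lr=0.1, momentum=0.9)
sched = LambdaScheduler(
    optimizer,
    lr_lambda=lambda epoch: 0.95 ** epoch,        # decay the LR by 5% per epoch
    momentum_lambda=lambda epoch: 0.99 ** epoch,  # decay momentum slightly as well
)

# ListScheduler: the same idea, but with the full schedules passed explicitly
# as lists or 1D arrays (lists of lists / 2D arrays for per-group schedules).
# NOTE: keyword names are assumed; see help(scheduler.ListScheduler).
optimizer = optim.SGD(mdl.parameters(), lr=0.1, momentum=0.9)
sched = ListScheduler(
    optimizer,
    lrs=np.linspace(1e-1, 1e-3, epochs),
    momentums=np.linspace(0.85, 0.95, epochs),
)

# Either scheduler is then stepped once per epoch, like any PyTorch scheduler:
for epoch in range(epochs):
    # ... forward/backward pass and optimizer.step() would go here ...
    sched.step()
```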
- Import modules and define some test data:
```python
import torch
from torch import nn
from torch import optim
from scheduler import *

epochs = 50
x = torch.randn(100, 10)
```
- Instantiate model:
```python
mdl = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 1),
    nn.Sigmoid()
)
```
- Run range test to find suitable LR:
```python
optimizer = optim.SGD(mdl.parameters(), lr=1.23e-4)  # optimizer LR is ignored
range_finder = RangeFinder(optimizer, epochs)

losses = []
for epoch in range(epochs):
    # Print achieved schedule
    current_lr = [g['lr'] for g in optimizer.param_groups]
    current_mom = [g['momentum'] for g in optimizer.param_groups]
    print('LR: {}, Momentum: {}'.format(current_lr, current_mom))

    loss = mdl(x).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    range_finder.step()
    losses.append(loss.item())
```
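To interpret the range test, it can help to plot the recorded losses; since `RangeFinder` ramps the LR monotonically from 1e-7 to 1 over the run, the epoch axis maps directly onto the LR schedule. A minimal sketch (matplotlib is assumed to be available; it is not required by the scheduler classes):

```python
import matplotlib.pyplot as plt

# Plot the loss recorded at each epoch of the range test
plt.plot(losses)
plt.xlabel('Epoch (LR ramps from 1e-7 to 1)')
plt.ylabel('Loss')
plt.title('LR range test')
plt.show()
```

The max LR for the 1cycle run is typically chosen a little below the point where the loss starts to diverge.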
- Re-instantiate model:
```python
mdl = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 1),
    nn.Sigmoid()
)
```
- Define 1cycle policy optimizer:
```python
# Based on the results above, let's say the max LR is 1e-2
optimizer = optim.SGD(mdl.parameters(), lr=1.23e-4)  # optimizer LR is ignored
one_cycle = OneCyclePolicy(optimizer, 1e-2, epochs, momentum_rng=[0.85, 0.95])
```
- Train model:
```python
losses = []
for epoch in range(epochs):
    # Print achieved schedule
    current_lr = [g['lr'] for g in optimizer.param_groups]
    current_mom = [g['momentum'] for g in optimizer.param_groups]
    print('LR: {}, Momentum: {}'.format(current_lr, current_mom))

    loss = mdl(x).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    one_cycle.step()
    losses.append(loss.item())
```
- A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. Leslie N. Smith, 2018, arXiv:1803.09820.