lr-momentum-scheduler

This repo contains PyTorch scheduler classes implementing the following:

  • Arbitrary LR and momentum schedulers
    • Lambda function-based scheduler based on lr_scheduler.LambdaLR
    • List-based scheduler that accepts explicitly defined schedule lists for LR and momentum
  • Learning rate range finder for preparing the 1cycle policy
  • The 1cycle policy scheduler

These classes inherit from, and are based on, the core learning rate schedulers included in PyTorch, and can be used in an identical manner, with the added ability to schedule momentum.

Schedulers

See detailed documentation and implementation by running:

import scheduler
help(scheduler.LambdaScheduler)
help(scheduler.ListScheduler)
help(scheduler.RangeFinder)
help(scheduler.OneCyclePolicy)
  1. LambdaScheduler: based on PyTorch's LambdaLR, but can also (optionally) schedule momentum in the same way. As with LambdaLR, individual schedules can be defined for each parameter group in the optimizer by passing a list of lambdas/functions/callables for LR and momentum.
  2. ListScheduler: similar to LambdaScheduler, but defines LR and momentum using explicitly passed lists. Per-parameter-group schedules are specified using lists of lists or 2D numpy arrays.
  3. RangeFinder: a simple predefined schedule that varies LR from 1e-7 to 1 over a given number of epochs. This is a preparatory step for the One Cycle Policy.
  4. OneCyclePolicy: the One Cycle Policy scheduler for LR and momentum; see References.
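For intuition, the explicit lists that a list-based scheduler consumes can be built in plain Python. The sketch below reproduces the log-spaced 1e-7 to 1 LR ramp that RangeFinder is described as using, paired with a constant momentum list (how these lists are passed to ListScheduler is not shown here; see `help(scheduler.ListScheduler)` for the actual arguments):

```python
epochs = 50

# Log-spaced LR ramp from 1e-7 to 1 over `epochs` steps,
# i.e. the range swept by RangeFinder, as an explicit list.
lrs = [10 ** (-7 + 7 * e / (epochs - 1)) for e in range(epochs)]

# A constant momentum schedule of matching length.
moms = [0.9] * epochs
```

Per-parameter-group variants of the same idea are lists of such lists (or 2D numpy arrays), one row per parameter group.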

The One Cycle Policy

  1. Import modules and define some test data:
    import torch
    from torch import nn
    from torch import optim
    from scheduler import *
    
    epochs = 50
    x = torch.randn(100, 10)
  2. Instantiate model:
    mdl = nn.Sequential(
        nn.Linear(10, 10),
        nn.ReLU(),
        nn.Linear(10, 1),
        nn.Sigmoid()
    )
  3. Run range test to find suitable LR:
    optimizer = optim.SGD(mdl.parameters(), lr=1.23e-4) # optimizer LR is ignored
    range_finder = RangeFinder(optimizer, epochs)
    
    losses = []
    for epoch in range(epochs):
        # Print achieved schedule
        current_lr = [g['lr'] for g in optimizer.param_groups]
        current_mom = [g['momentum'] for g in optimizer.param_groups]
        print('LR: {}, Momentum: {}'.format(current_lr, current_mom))
    
    loss = mdl(x).mean()  # dummy "loss" for demonstration purposes
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        range_finder.step()
        losses.append(loss.item())
    Based on the results above, suppose a suitable maximum LR is 1e-2
  4. Re-instantiate model:
    mdl = nn.Sequential(
        nn.Linear(10, 10),
        nn.ReLU(),
        nn.Linear(10, 1),
        nn.Sigmoid()
    )
  5. Define 1cycle policy optimizer:
    optimizer = optim.SGD(mdl.parameters(), lr=1.23e-4) # optimizer LR is ignored
    one_cycle = OneCyclePolicy(optimizer, 1e-2, epochs, momentum_rng=[0.85, 0.95])
  6. Train model:
    losses = []
    for epoch in range(epochs):
        # Print achieved schedule
        current_lr = [g['lr'] for g in optimizer.param_groups]
        current_mom = [g['momentum'] for g in optimizer.param_groups]
        print('LR: {}, Momentum: {}'.format(current_lr, current_mom))
    
    loss = mdl(x).mean()  # dummy "loss" for demonstration purposes
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        one_cycle.step()
        losses.append(loss.item())
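For intuition, the shape of the schedule applied above can be sketched in plain Python: LR ramps up to the maximum and back down while momentum moves in the opposite direction. This is a simplified linear sketch, not OneCyclePolicy's actual implementation (its interpolation may differ, and `start_div`, the divisor for the starting LR, is an assumed parameter):

```python
def one_cycle_sketch(max_lr, epochs, momentum_rng=(0.85, 0.95), start_div=10.0):
    """Sketch of a 1cycle schedule: linear LR ramp up then down,
    with momentum varied inversely over momentum_rng."""
    half = epochs // 2
    lrs, moms = [], []
    for t in range(epochs):
        if t < half:
            frac = t / max(half - 1, 1)            # ramp up
        else:
            frac = (epochs - 1 - t) / max(epochs - half - 1, 1)  # ramp down
        lrs.append(max_lr / start_div + frac * (max_lr - max_lr / start_div))
        moms.append(momentum_rng[1] - frac * (momentum_rng[1] - momentum_rng[0]))
    return lrs, moms

lrs, moms = one_cycle_sketch(1e-2, 50)
```

Note the inverse relationship: momentum is lowest exactly where LR peaks, which is the pairing Smith recommends (see References).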

References

  • A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. Leslie N. Smith, 2018, arXiv:1803.09820.