Count-Sketch Optimizers

Compressing Gradient Optimizers via Count-Sketches

An ICML 2019 paper by Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava

BERT-Large Training Results

Trained with Activation Checkpointing and Mixed Precision Training (FP16) on Nvidia V100 DGX-1 servers

BERT-Large	Adam	Count-Min Sketch (CMS) - RMSprop
Time (Days)	5.32	5.52
Size (MB)	7,097	5,133
Test Perplexity	4.04	4.18
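
To make the trade-off in the table easier to read, the relative differences can be worked out directly from the reported figures. The snippet below is only an arithmetic sketch: the numbers are copied from the table, and the helper name `relative_change` is illustrative and not part of this repository.

```python
# Figures copied from the BERT-Large table above.
adam = {"days": 5.32, "size_mb": 7097, "perplexity": 4.04}
cms_rmsprop = {"days": 5.52, "size_mb": 5133, "perplexity": 4.18}


def relative_change(baseline, value):
    """Signed change of `value` relative to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


for key in adam:
    print(f"{key}: {relative_change(adam[key], cms_rmsprop[key]):+.1f}%")
# days: +3.8%
# size_mb: -27.7%
# perplexity: +3.5%
```

Read this way, CMS-RMSprop trades roughly 4% more training time and a slightly higher test perplexity for about 28% less size.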

Instructions

Install Requirements
Add optimizers folder to $PYTHONPATH
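
If exporting $PYTHONPATH in the shell is inconvenient, one possible alternative is to extend the import path at the top of a training script. This is a sketch under assumptions: the helper name `add_optimizers_to_path` is hypothetical, and the repository root is assumed to be the current directory.

```python
import os
import sys


def add_optimizers_to_path(repo_root):
    """Prepend <repo_root>/optimizers to sys.path if it is not already there."""
    folder = os.path.abspath(os.path.join(repo_root, "optimizers"))
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


# Assumed location of the cloned repository; adjust to your checkout.
optimizers_dir = add_optimizers_to_path(".")
```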

Requirements

torch
torchvision
cupy
pynvrtc

Examples

ImageNet - ResNet-18
LM1B - Transformer / LSTM
Wikitext-2 - LSTM

Dense Layer Support

We support compressing the dense layers of the neural network, whose updates are not sparse. During training, we update the auxiliary variables and perform the gradient update for each parameter in a single fused CUDA kernel. The dense kernel is functionally equivalent to the sparse kernel. The main difference is that we explicitly avoid materializing the auxiliary variables for the dense layers in global memory. Instead, we access them inside the shared memory of the GPU Streaming Multiprocessor. Without this key feature, our approach would not save any GPU memory for the dense layers. In the sparse case, we assume that the number of non-zero gradient updates is significantly smaller than the size of the auxiliary variable. (See dense_exp_cms.py for more details.)
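
The idea of keeping the auxiliary variable in a sketch and reading it back only at update time can be illustrated in plain Python. The block below is a minimal sketch, not the repository's fused CUDA kernel: the class `CountMinSketch`, the function `cms_rmsprop_step`, the hash family, and all hyperparameters are assumptions chosen for illustration. It stores the RMSprop second-moment estimate in a Count-Min Sketch and computes each parameter's estimate on the fly, so a full-size auxiliary array is never stored.

```python
import math
import random


class CountMinSketch:
    """Count-Min Sketch over integer ids, holding non-negative values."""

    _PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used for hashing

    def __init__(self, depth, width, seed=0):
        rng = random.Random(seed)
        self.depth, self.width = depth, width
        self.table = [[0.0] * width for _ in range(depth)]
        # One (a, b) pair per row defines h(i) = ((a*i + b) mod p) mod width.
        self.coeffs = [(rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
                       for _ in range(depth)]

    def _buckets(self, idx):
        return [((a * idx + b) % self._PRIME) % self.width
                for a, b in self.coeffs]

    def decay(self, factor):
        """Scale every cell, i.e. apply the exponential moving average decay."""
        for row in self.table:
            for k in range(self.width):
                row[k] *= factor

    def add(self, idx, value):
        """Add a non-negative value to one bucket in every row."""
        for row, k in zip(self.table, self._buckets(idx)):
            row[k] += value

    def query(self, idx):
        """Minimum over rows: never below the exact value (up to rounding)."""
        return min(row[k] for row, k in zip(self.table, self._buckets(idx)))


def cms_rmsprop_step(params, grads, sketch, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop step whose second-moment estimate lives in the sketch."""
    sketch.decay(beta)
    for idx, g in grads.items():
        sketch.add(idx, (1.0 - beta) * g * g)
    for idx, g in grads.items():
        v = sketch.query(idx)  # computed on the fly, never stored densely
        params[idx] -= lr * g / (math.sqrt(v) + eps)


# Toy usage: three parameters, sketch of 3 rows x 64 buckets.
params = {0: 1.0, 5: -2.0, 9: 0.5}
sketch = CountMinSketch(depth=3, width=64)
cms_rmsprop_step(params, {0: 0.5, 5: -1.0, 9: 2.0}, sketch)
```

Because every added value is non-negative, hash collisions can only inflate an estimate, which makes the resulting step smaller rather than larger. In a real setting the sketch would hold far fewer cells than there are parameters; the toy sizes here are only for readability.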
