No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)
AdaTask: A Task-Aware Adaptive Learning Rate Approach to Multi-Task Learning. AAAI, 2023.
A Julia package for adaptive proximal gradient and primal-dual algorithms
A Julia package for adaptive proximal gradient methods for convex bilevel optimization
Deep Filtering with adaptive learning rates
Implementation of a two-layer perceptron (from scratch) with four back-propagation methods in Python
Implementation of the WAME (Weight-wise Adaptive learning rates with Moving average Estimator) optimization algorithm for TensorFlow version 2.0 or higher.
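Several of the repositories above share one core idea: a learning rate maintained per weight and adapted from the local gradient history. The sketch below is a minimal NumPy illustration of the sign-agreement heuristic that Rprop-family methods (including WAME) build on; it is not the exact WAME update, and the function name and hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def sign_adaptive_step(w, grad, prev_grad, step,
                       eta_plus=1.2, eta_minus=0.5,
                       step_min=1e-6, step_max=1.0):
    """One Rprop-style update with a per-weight multiplicative step size.

    If a weight's gradient keeps its sign, that weight's step grows; if the
    sign flips (the last update overshot), the step shrinks. WAME combines
    this heuristic with moving-average estimators rather than using it
    directly, so treat this as the underlying idea only.
    """
    agree = grad * prev_grad
    step = np.where(agree > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(agree < 0, np.maximum(step * eta_minus, step_min), step)
    return w - step * np.sign(grad), step

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([3.0, -2.0])
step = np.full_like(w, 0.1)
prev = np.zeros_like(w)
for _ in range(100):
    g = w.copy()
    w, step = sign_adaptive_step(w, g, prev, step)
    prev = g
```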
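The first entry above (the ICLR 2022 "No Parameters Left Behind" paper) scales each parameter's learning rate by an estimate of how sensitive the loss is to that parameter, so that under-trained, low-sensitivity parameters still receive meaningful updates. The sketch below captures only the general idea: sensitivity approximated by |w · g| (a first-order estimate of the loss change from zeroing a parameter) smoothed with an exponential moving average, then used to rescale the step. The normalization, clipping, and function name here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def sensitivity_guided_update(w, grad, sens_ema, lr=1e-3, beta=0.9, eps=1e-8):
    """Sketch of a sensitivity-guided per-parameter learning rate.

    Low-sensitivity parameters get proportionally larger steps; the mean
    normalization and [0.1, 10] clipping are assumed for illustration.
    """
    sens = np.abs(w * grad)                      # first-order sensitivity
    sens_ema = beta * sens_ema + (1.0 - beta) * sens
    scale = sens_ema.mean() / (sens_ema + eps)   # >1 for insensitive params
    w = w - lr * np.clip(scale, 0.1, 10.0) * grad
    return w, sens_ema

# Toy usage on the same quadratic objective.
w = np.array([3.0, -2.0])
sens_ema = np.zeros_like(w)
for _ in range(100):
    w, sens_ema = sensitivity_guided_update(w, w.copy(), sens_ema)
```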