A complete implementation of AdamW, NadamW, and SGDW, with warm restarts, cosine annealing, and per-layer learning rate multipliers, is available here.
This repository's implementation has a major flaw: weight decay can only be set globally, for all layer weight matrices. That is rarely useful, since not all weights should be decayed (e.g. biases and normalization parameters typically should not be), and those that should often need weight-specific coefficients. The documentation also does not warn against combining decoupled weight decay with an L2 penalty, which matters: applying both defeats the purpose of either.
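To make the distinction concrete, here is a minimal NumPy sketch (not this repository's API; names such as `sgdw_step` and `decay_by_name` are hypothetical) of a decoupled, SGDW-style update where each parameter carries its own decay coefficient instead of inheriting a global value:

```python
import numpy as np

def sgdw_step(params, grads, decay_by_name, lr=1e-3, eta=1.0):
    """One SGDW-style step: gradient update plus decoupled weight decay.

    decay_by_name maps a parameter name to its own decay coefficient;
    parameters that should not decay (e.g. biases, normalization gains)
    get 0.0 rather than a global value. `eta` stands in for a schedule
    multiplier (e.g. from cosine annealing).
    """
    for name, w in params.items():
        w -= lr * grads[name]          # plain gradient step
        wd = decay_by_name.get(name, 0.0)
        if wd:
            w -= eta * wd * w          # decoupled decay, not an L2 loss term

# Usage: decay the kernel but not the bias.
params = {"dense/kernel": np.random.randn(4, 4), "dense/bias": np.zeros(4)}
grads  = {k: np.ones_like(v) for k, v in params.items()}
sgdw_step(params, grads, decay_by_name={"dense/kernel": 1e-4})
```

Note that the decay term is applied directly to the weights after the gradient step, not added to the loss; adding an L2 penalty on top of this would double-penalize the weights and re-couple the decay to the gradient scaling, which is exactly what decoupled weight decay is meant to avoid.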
Feedback is welcome.