A complete implementation of AdamW, NadamW, and SGDW, with warm restarts, cosine annealing, and per-layer learning rate multipliers, is available here.
This repository's implementation has a major flaw: weight decay can only be set globally, for all layer weight matrices. That is rarely useful, since not all weights should be decayed (e.g. biases and normalization parameters typically should not be), and those that should often need weight-specific coefficients. The documentation also does not warn against combining decoupled weight decay with an L2 penalty, which matters: applying both defeats the purpose of either.
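To make the distinction concrete, here is a minimal NumPy sketch (not this repository's API; names such as `sgdw_step` and `decay_by_name` are hypothetical) of a decoupled, SGDW-style update where each parameter carries its own decay coefficient instead of inheriting a global value:

```python
import numpy as np

def sgdw_step(params, grads, decay_by_name, lr=1e-3, eta=1.0):
    """One SGDW-style step: gradient update plus decoupled weight decay.

    decay_by_name maps a parameter name to its own decay coefficient;
    parameters that should not decay (e.g. biases, normalization gains)
    get 0.0 rather than a global value. `eta` stands in for a schedule
    multiplier (e.g. from cosine annealing).
    """
    for name, w in params.items():
        w -= lr * grads[name]          # plain gradient step
        wd = decay_by_name.get(name, 0.0)
        if wd:
            w -= eta * wd * w          # decoupled decay, not an L2 loss term

# Usage: decay the kernel but not the bias.
params = {"dense/kernel": np.random.randn(4, 4), "dense/bias": np.zeros(4)}
grads  = {k: np.ones_like(v) for k, v in params.items()}
sgdw_step(params, grads, decay_by_name={"dense/kernel": 1e-4})
```

Note that the decay term is applied directly to the weights after the gradient step, not added to the loss; adding an L2 penalty on top of this would double-penalize the weights and re-couple the decay to the gradient scaling, which is exactly what decoupled weight decay is meant to avoid.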
Feedback is welcome.