We are maybe a little careless in how we train the more complex models (CNN, RNN etc.): we should be checking the gradients and also the actual weight matrices during training. For CNNs, some papers renormalize weights whose norm grows too large; something similar might be necessary for RNNs. Since we are getting reasonable-looking results, I suspect this is not a crucial issue, but we might still squeeze out some improvements by understanding in detail how training actually progresses in our models.
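A minimal sketch of what this monitoring could look like, assuming a current tf.keras API (this is not our actual training code; the layer sizes, constraint value of 3.0 and `WeightNormLogger` name are all illustrative): a `max_norm` kernel constraint does the renormalization mentioned above (each unit's incoming weight vector is rescaled whenever its L2 norm exceeds the limit), and a callback logs the weight-matrix norms after every epoch so we can watch their progression.

```python
import numpy as np
import tensorflow as tf

class WeightNormLogger(tf.keras.callbacks.Callback):
    """Print the L2 norm of every weight array at the end of each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        for layer in self.model.layers:
            for w in layer.get_weights():
                print('epoch %d  %-8s  ||W|| = %.4f'
                      % (epoch, layer.name, np.linalg.norm(w)))

model = tf.keras.Sequential([
    # max_norm rescales each unit's incoming weight vector whenever its
    # L2 norm exceeds 3.0 -- the kind of renormalization some CNN papers use
    tf.keras.layers.Dense(128, activation='relu', input_shape=(300,),
                          kernel_constraint=tf.keras.constraints.max_norm(3.0)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit(X, y, epochs=10, callbacks=[WeightNormLogger()])
```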
http://www.lab41.org/taking-keras-to-the-zoo/ suggests that high dropout might contribute to vanishing gradients. This could explain why on Ubuntu Dialogue, with its very long sentences (spad=160), dropout=0 is much worse than the high dropout that is beneficial for AnsSentSel (spad=80). Should be easy to verify experimentally — see the sketch below.
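A rough sketch of how such a check could go, assuming tf.keras with `GradientTape` (this is not our anssel/ubuntu model; the toy RNN, vocabulary size, random data, `grad_norm_at_inputs` helper and the dropout values are all placeholders): measure the norm of the gradient that actually reaches the earliest timesteps of a long sequence under different dropout rates.

```python
import numpy as np
import tensorflow as tf

def grad_norm_at_inputs(dropout, spad=160, vocab=1000, n=64):
    """Norm of the loss gradient w.r.t. the embedded inputs, early timesteps only."""
    np.random.seed(0)
    tf.random.set_seed(0)
    x = np.random.randint(1, vocab, size=(n, spad))
    y = np.random.randint(0, 2, size=(n, 1)).astype('float32')

    embed = tf.keras.layers.Embedding(vocab, 32)
    drop = tf.keras.layers.Dropout(dropout)
    rnn = tf.keras.layers.SimpleRNN(32)
    clf = tf.keras.layers.Dense(1, activation='sigmoid')
    loss_fn = tf.keras.losses.BinaryCrossentropy()

    with tf.GradientTape() as tape:
        e = embed(x)
        tape.watch(e)  # differentiate w.r.t. the word vectors themselves
        loss = loss_fn(y, clf(rnn(drop(e, training=True))))
    g = tape.gradient(loss, e)
    # only the earliest quarter of the sequence -- where vanishing shows up first
    return float(tf.norm(g[:, : spad // 4, :]))

for p in (0.0, 0.5, 0.8):
    print('dropout=%.1f  early-timestep grad norm = %.3e'
          % (p, grad_norm_at_inputs(p)))
```

If the lab41 hypothesis holds, the early-timestep gradient norm should shrink noticeably as the dropout rate grows, and the effect should be stronger at spad=160 than at spad=80.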