Improvements to the resnet cifar10 example #490
Conversation
1. Reaches 91% accuracy with a ResNet-18 model on CIFAR-10; before we were only getting 88%. The main change that made this possible was tweaking the convolutional kernel size in the first layer from 7x7 (the default) to 3x3. This is documented in the code, which now makes it easy to override the default kernel size.
2. Runs in 50 epochs (vs. 200 before), and takes only 13 min on free Colab, vs. 1 hour before.
3. Removed many unused variables and cleaned up the code.
4. I'm now using a learning-rate warm-up, which seems to work well across different architectures and datasets.
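The kernel-size change in point 1 matters because of input resolution: CIFAR-10 images are only 32x32, so an ImageNet-style 7x7 stem throws away spatial detail immediately. A quick sanity check (the stride/padding values here are the common conventions for each stem, not taken from this PR):

```python
def conv_out_size(n: int, k: int, s: int, p: int) -> int:
    """Spatial output size of a conv layer: input n, kernel k, stride s, pad p."""
    return (n + 2 * p - k) // s + 1

# An ImageNet-style stem (7x7, stride 2, pad 3) immediately halves a 32x32 image...
assert conv_out_size(32, 7, 2, 3) == 16
# ...while a 3x3, stride 1, pad 1 stem preserves the full 32x32 resolution.
assert conv_out_size(32, 3, 1, 1) == 32
```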
Thank you so much for the amazing improvements. LGTM apart from the minor comment below.
ModuleDef = Any
What do we gain from this definition? It wasn't entirely clear to me while reading the code. Maybe add a quick comment explaining why you introduce this.
TBH I'm doing it because Flax does it that way (https://github.com/google/flax/blob/main/examples/imagenet/models.py). I guess `conv: ModuleDef` is more explicit (for a human) than `conv: Any`, although the parser will treat them equally?
If that's the Flax way, fine with me!
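For readers following along, here is a minimal sketch of the pattern being discussed. The `Conv` and `ResNetBlock` classes below are hypothetical stand-ins, not the example's actual code; the point is that the `ModuleDef` alias documents intent for the human reader even though it is just `Any` to a type checker:

```python
from functools import partial
from typing import Any

ModuleDef = Any  # readability alias: "a module class, or a partial of one"

class Conv:
    # Stand-in for a framework conv module (illustrative only).
    def __init__(self, features: int, use_bias: bool = True):
        self.features = features
        self.use_bias = use_bias

class ResNetBlock:
    # Annotating conv as ModuleDef instead of bare Any signals to a human
    # reader that the argument is a module constructor; a type checker
    # treats the two annotations identically.
    def __init__(self, conv: ModuleDef, features: int):
        self.conv1 = conv(features)
        self.conv2 = conv(features)

block = ResNetBlock(conv=partial(Conv, use_bias=False), features=64)
assert block.conv1.features == 64 and not block.conv1.use_bias
```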
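The learning-rate warm-up mentioned in the PR description can be sketched as a linear ramp followed by cosine decay, a common pairing for short training runs. All hyperparameter values below are illustrative, not this example's actual settings:

```python
import math

def lr_schedule(step: int, base_lr: float = 0.1,
                warmup_steps: int = 500, total_steps: int = 10_000) -> float:
    # Linear warm-up from 0 to base_lr over the first warmup_steps...
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # ...then cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

assert lr_schedule(0) == 0.0                      # starts at zero
assert abs(lr_schedule(500) - 0.1) < 1e-9         # peaks at base_lr
assert abs(lr_schedule(10_000)) < 1e-9            # decays back to zero
```

Warm-up avoids large, noisy updates while batch-norm statistics and weights are still far from settled, which is one reason it tends to transfer well across architectures and datasets.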