Improvements to the resnet cifar10 example #490

Merged: 1 commit into google:main from resnet_cifar10 on Aug 29, 2023

Conversation

@fabianp (Collaborator) commented on Aug 2, 2023

  1. Reaches 91% accuracy with a ResNet18 model on CIFAR10, where before
     we were only getting 88%. The main change that made this possible was
     shrinking the convolutional kernel in the first layer from the 7x7
     default to 3x3. This is documented in the code, which now makes it
     easy to override the default kernel size (see the stem-convolution
     sketch after this list).

  2. Runs in 50 epochs (vs. 200 before) and takes only 13 minutes on a free
     Colab instance, vs. 1 hour before.

  3. Removed many unused variables, and cleaned up the code.

  4. I'm now using a warm-up learning-rate schedule, which seems to work
     well across different architectures and datasets (see the schedule
     sketch after this list).
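
For item 1, here is a minimal Flax sketch of the idea (illustrative only, not the example's exact code; the `Stem` class name and the 64-filter default are assumptions): the stem convolution takes its kernel size as a module field, so the ImageNet-style 7x7 default can be overridden with 3x3 for 32x32 CIFAR10 images.

```python
# Minimal sketch (not the example's exact code): a Flax module whose stem
# convolution takes its kernel size as a field, so the ImageNet-style 7x7
# default can be overridden with 3x3 for 32x32 CIFAR10 images.
from typing import Tuple

import flax.linen as nn
import jax
import jax.numpy as jnp


class Stem(nn.Module):
  num_filters: int = 64
  kernel_size: Tuple[int, int] = (7, 7)  # ImageNet-style default

  @nn.compact
  def __call__(self, x):
    # A 3x3 stem keeps more spatial resolution on small CIFAR10 images than
    # the 7x7 stem designed for 224x224 ImageNet inputs.
    x = nn.Conv(self.num_filters, self.kernel_size,
                padding='SAME', use_bias=False)(x)
    return nn.relu(x)


# Override the 7x7 default with a 3x3 kernel for CIFAR10.
stem = Stem(kernel_size=(3, 3))
params = stem.init(jax.random.PRNGKey(0), jnp.ones((1, 32, 32, 3)))
```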
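
For item 4, a minimal sketch of a warm-up schedule with optax; the peak value, batch size, and the cosine decay after warm-up are assumptions for illustration, not necessarily what the example uses.

```python
# Minimal sketch of a warm-up learning-rate schedule with optax. The peak
# value (0.1), batch size (128), and cosine decay after warm-up are
# illustrative assumptions.
import optax

num_epochs = 50
steps_per_epoch = 50_000 // 128           # CIFAR10 train set / assumed batch size
total_steps = num_epochs * steps_per_epoch

schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,                # start from zero,
    peak_value=0.1,                # linearly warm up to the peak learning rate,
    warmup_steps=steps_per_epoch,  # over roughly one epoch,
    decay_steps=total_steps,       # then cosine-decay for the rest of training.
)

# The schedule plugs into any optax optimizer, e.g. SGD with momentum.
tx = optax.sgd(learning_rate=schedule, momentum=0.9)
```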

@fabianp force-pushed the resnet_cifar10 branch 2 times, most recently from b657aad to 8faea3d on August 8, 2023 19:58
@fabianp changed the title from "[work in progress, not ready for review] Random improvements to the resnet cifar10 example" to "Improvements to the resnet cifar10 example" on Aug 28, 2023
@fabianp marked this pull request as ready for review on August 28, 2023 13:43
@fabianp requested a review from @mblondel on August 28, 2023 13:43
@mblondel (Collaborator) left a comment:

Thank you so much for the amazing improvements. LGTM apart from the minor comment below.

On the line ModuleDef = Any:
@mblondel (Collaborator):
What do we gain from this definition? It wasn't entirely clear to me while reading the code. Maybe add a quick comment explaining why you introduce this.

@fabianp (Collaborator, Author):
TBH I'm doing it because Flax does it that way (https://github.com/google/flax/blob/main/examples/imagenet/models.py). I guess conv: ModuleDef is more explicit (for a human) than conv: Any, although a type checker treats them the same.
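
For readers following along, a minimal sketch of the pattern, adapted from the Flax ImageNet example linked above rather than from this PR's code (the Block class and its fields are illustrative): the conv/norm classes are passed in as dataclass fields, and ModuleDef is only a readable alias for Any.

```python
# Minimal sketch of the pattern, adapted from the Flax ImageNet example
# rather than this PR's exact code: the conv/norm classes are dataclass
# fields, and ModuleDef is only a readable alias for Any.
from functools import partial
from typing import Any, Callable

import flax.linen as nn

ModuleDef = Any


class Block(nn.Module):
  filters: int
  conv: ModuleDef          # reads better for a human than `conv: Any`;
  norm: ModuleDef          # a type checker treats both the same.
  act: Callable = nn.relu

  @nn.compact
  def __call__(self, x):
    y = self.conv(self.filters, (3, 3))(x)
    y = self.norm()(y)
    return self.act(y)


# The caller decides which concrete modules to plug in.
block = Block(filters=64,
              conv=partial(nn.Conv, use_bias=False),
              norm=partial(nn.BatchNorm, use_running_average=True))
```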

@mblondel (Collaborator):
If that's the Flax way, fine with me!

@copybara-service bot merged commit 75724b5 into google:main on Aug 29, 2023
5 checks passed