Our analysis focused on achieving 99.4% test accuracy on MNIST with under 8.9k parameters based on:
To validate our target choice, we:
- Analyzed that 99.4% represents strong performance while avoiding overfitting
- Reviewed literature showing diminishing returns beyond this accuracy level
- Considered that higher accuracy would likely require >10k parameters
Target: Establish baseline model with proper regularization Results: 98.65% test accuracy achieved Analysis: Initial implementation focused on:
- Batch normalization after each convolution layer
- Dropout layers to prevent overfitting
- Basic architecture with ~8k parameters [Link to s7-1.ipynb and s7-2.ipynb]
Target: Improve model capacity while staying under parameter budget Results: 99.16% test accuracy achieved Analysis: Enhanced architecture by:
- Adding residual connections
- Optimizing kernel sizes and channels
- Maintaining ~8.9k parameters through efficient design [Link to s7-3.ipynb and s7-4.ipynb]
Target: Reach 99.4%+ accuracy through data augmentation Results: 99.5% test accuracy achieved at epoch 15 Analysis: Final improvements via:
- Random rotation augmentation (22-45 degrees)
- Learning rate optimization
- GAP layer followed by 1x1 convolution [Link to s7-5.ipynb, s7-6.ipynb, and s7-7.ipynb]
We successfully achieved our target through systematic optimization:
- Final accuracy of 99.5% exceeds target while using only 8.9k parameters
- Model shows good generalization with minimal gap between train/test accuracy
- Architecture demonstrates efficient use of parameters through proper regularization and augmentation
Future recommendations:
- Explore additional augmentation techniques
- Investigate alternative architectures within parameter budget
- Consider knowledge distillation approaches