mcreelma/EMNIST_Neural_Network

First dive into designing convolutional neural networks for image recognition

EMNIST Network Performance Report

Problem Statement:

The goal of this project is to construct a convolutional neural network capable of classifying handwritten numbers and letters from the EMNIST dataset. This dataset is commonly used as a benchmark for machine learning exercises and competitions, with some network architectures achieving testing accuracies as high as 97.9% (Achintya, 2020). It has also become one of the standard test pieces for innovations in neural network design (Baldominos, 2019).

My goal for this project was not to set a new accuracy record on the dataset, but rather to construct a working neural network that achieves satisfactory performance, which I defined as over 85% accuracy on the testing dataset based on the accuracy ranges of published results (paperswithcode.com, 2017).

Dataset and Training:

The EMNIST dataset is a collection of handwritten characters including letters and digits (NIST, 2019). It is based on the Modified National Institute of Standards and Technology (MNIST) database, which includes 60,000 handwritten digits used for training image recognition systems (LeCun et al., 1998). That collection was extended to 814,255 characters and digits to form the EMNIST dataset (NIST, 2019); however, the full dataset does not contain each character with equal probability and thus risks overtraining on certain characters. I trained my network on the EMNIST Balanced dataset, which includes 131,600 characters from 47 balanced classes (Cohen et al., 2017). Using the balanced dataset means that there are an equal number of examples from each class, so the algorithm will not develop a tendency to favor more frequently occurring characters.
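
For reference, the balanced split can be loaded directly through torchvision. A minimal sketch, assuming the usual MNIST-style normalization constants (the project's actual preprocessing may differ):

```python
from torchvision import datasets, transforms

# Convert images to tensors and normalize; the mean/std values here are
# the common MNIST constants, assumed for illustration.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

# EMNIST "balanced": 131,600 characters across 47 equally sized classes
# (112,800 training images and 18,800 testing images).
train_set = datasets.EMNIST(root="data", split="balanced", train=True,
                            download=True, transform=transform)
test_set = datasets.EMNIST(root="data", split="balanced", train=False,
                           download=True, transform=transform)
```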

The training data was split 5/6 to 1/6 between training and validation: 5/6 of the dataset was used to train the model, and the remaining 1/6 was used to validate the results after each epoch. Training was conducted in mini-batches, so that each weight update during an epoch is computed from a randomly selected subsample of the data rather than from the full training set.
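
A minimal sketch of this split and mini-batching using PyTorch's data utilities, continuing from the loading sketch above (the batch size of 64 is the final value from Table 1 below):

```python
from torch.utils.data import DataLoader, random_split

# Hold out 1/6 of the training data for validation after each epoch.
n_val = len(train_set) // 6           # 112,800 // 6 = 18,800
n_train = len(train_set) - n_val
train_subset, val_subset = random_split(train_set, [n_train, n_val])

# Mini-batches: each optimizer step sees only a small random subsample.
train_loader = DataLoader(train_subset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=64)
```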

Network Architecture:

Table 1 - Modification Progression of Neural Network Design

| Batch Size | Learning Rate | Epochs | Testing Accuracy (%) | Avg Loss | Notes |
| --- | --- | --- | --- | --- | --- |
| 30 | 0.00001 | 60 | 83.0 | 0.0490 | basic structure as initially outlined |
| 40 | 0.00005 | 60 | 85.5 | 0.0537 | altered hyperparameters for batch size and learning rate |
| 40 | 0.00005 | 60 | 50.7 | 3.8100 | added softmax (dim = -1) at the end |
| 40 | 0.00005 | 60 | 85.9 | 0.0547 | softmax replaced with ReLU; final layer dropped from 120 to 80 |
| 40 | 0.00005 | 60 | 87.0 | 0.0482 | added dropout at the end with p = 0.05 |
| 40 | 0.00005 | 60 | 59.5 | 0.2050 | RReLU changed to ReLU |
| 40 | 0.00005 | 60 | 59.2 | 0.1715 | removed dropout |
| 64 | 0.00005 | 40 | 86.0 | 0.0433 | RReLU restored; added a final layer of 80 to 47 |
| 64 | 0.00005 | 40 | 86.2 | 0.0435 | used SGD for optimization |
| 64 | 0.00005 | 40 | 86.9 | 0.0377 | dropout between fc1 and fc2 (300 to 160) |
| 64 | 0.00005 | 40 | 87.1 | 0.0377 | dropout with inplace = True |
| 64 | 0.00005 | 40 | 87.2 | 0.0060 | dropout p = 0.1; back to Adam for optimization |
| 64 | 0.00005 | 40 | 86.0 | 0.0064 | added second dropout between fc2 and fc3 |
| 64 | 0.00005 | 60 | 86.7 | 0.0061 | increased to 60 epochs to let training continue |
| 64 | 0.00005 | 60 | 87.6 | 0.0056 | removed dropout, kept 60 epochs |
| 64 | 0.00005 | 80 | 88.2 | 0.0053 | 80 epochs |
| 64 | 0.00005 | 120 | 88.4 | 0.0053 | 120 epochs |
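
For concreteness, the final configuration in Table 1 (Adam, learning rate 0.00005, batch size 64, 120 epochs) corresponds to a training loop along these lines; this is a sketch, not the project's code. `EMNISTNet` is the (hypothetically named) architecture sketched after Table 2, and `train_loader`/`val_loader` come from the data sketch above. Note that PyTorch's cross-entropy loss expects raw logits, which is consistent with the sharp accuracy drop in Table 1 when a softmax layer was appended:

```python
import torch
import torch.nn.functional as F

model = EMNISTNet()  # architecture sketched after Table 2
optimizer = torch.optim.Adam(model.parameters(), lr=0.00005)

for epoch in range(120):
    # One pass over the training mini-batches.
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        logits = model(images)
        # cross_entropy applies log-softmax internally, so the network
        # should output raw logits rather than softmax probabilities.
        loss = F.cross_entropy(logits, labels)
        loss.backward()
        optimizer.step()

    # Validate on the held-out 1/6 of the data after each epoch.
    model.eval()
    correct = 0
    with torch.no_grad():
        for images, labels in val_loader:
            correct += (model(images).argmax(dim=1) == labels).sum().item()
    print(f"epoch {epoch}: validation accuracy {correct / len(val_subset):.3f}")
```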

Table 2 - Final Network Design and Total Number of Weights

| Layer type | In | Out | Size | Number of Weights |
| --- | --- | --- | --- | --- |
| Convolution | 1 | 20 | 28 x 28 | 1.57E+04 |
| Convolution | 20 | 30 | 28 x 28 | 4.70E+05 |
| Max pool | 30 | 30 | 14 x 14 | 1.38E+08 |
| Convolution | 30 | 30 | 14 x 14 | 1.76E+05 |
| Convolution | 30 | 10 | 14 x 14 | 5.88E+04 |
| Max pool | 10 | 10 | 7 x 7 | 9.60E+05 |
| Flatten | 490 | 490 | 1 x L | 0 |
| Linear | 490 | 300 | 1 x L | 147,000 |
| Linear | 300 | 160 | 1 x L | 48,000 |
| Linear | 160 | 80 | 1 x L | 12,800 |
| Linear | 80 | 47 | 1 x L | 3,760 |
| **Total Weights** | | | | 1.40E+08 |
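
A sketch of this architecture in PyTorch, read directly from the table. The report does not specify kernel sizes, so 3x3 convolutions with padding 1 are assumed here to preserve the spatial sizes listed above (28 x 28, 14 x 14, 7 x 7); the RReLU activations follow the notes in Table 1, and the class name is hypothetical:

```python
import torch.nn as nn

class EMNISTNet(nn.Module):
    """Final design from Table 2 (kernel sizes are assumptions)."""

    def __init__(self, n_classes: int = 47):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=3, padding=1),    # 1 -> 20, 28 x 28
            nn.RReLU(),
            nn.Conv2d(20, 30, kernel_size=3, padding=1),   # 20 -> 30, 28 x 28
            nn.RReLU(),
            nn.MaxPool2d(2),                               # 28 x 28 -> 14 x 14
            nn.Conv2d(30, 30, kernel_size=3, padding=1),   # 30 -> 30, 14 x 14
            nn.RReLU(),
            nn.Conv2d(30, 10, kernel_size=3, padding=1),   # 30 -> 10, 14 x 14
            nn.RReLU(),
            nn.MaxPool2d(2),                               # 14 x 14 -> 7 x 7
            nn.Flatten(),                                  # 10 * 7 * 7 = 490
            nn.Linear(490, 300),
            nn.RReLU(),
            nn.Linear(300, 160),
            nn.RReLU(),
            nn.Linear(160, 80),
            nn.RReLU(),
            nn.Linear(80, n_classes),                      # 47 balanced classes
        )

    def forward(self, x):
        return self.layers(x)
```

As a sanity check, `sum(p.numel() for p in EMNISTNet().parameters())` reports the trainable parameter count; note that pooling and flatten layers contribute no trainable weights, so the framework's count may differ from the hand tally in the table.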

Network Results:

Overall, my network achieved a testing accuracy of 88.4%, which is comparable with the accuracies of many benchmark networks, whose performance ranges from 50.93% to 95.96% on the testing data (paperswithcode.com, 2017). While certainly not state of the art, I would call my accuracy of 88.4% (16,619/18,800 test images) above average. Additionally, the improvement in accuracy per epoch reached a plateau around 80 training epochs, as can be seen in Figure 1. These results can always be improved, and I will continue to refine my design and implementation techniques through further iterations.

(Figure: Accuracy_By_Epoch)

Figure 1. Training accuracy vs. epoch of the final network design.

Citations

A. Agnes Lydia and F. Sagayaraj Francis, Adagrad - An Optimizer for Stochastic Gradient Descent, Department of Computer Science and Engineering, Pondicherry Engineering College, May 2019.

Baldominos A, Saez Y, Isasi P. A Survey of Handwritten Character Recognition with MNIST and EMNIST. Applied Sciences. 2019; 9(15):3169. https://doi.org/10.3390/app9153169

Cohen, G., Afshar, S., Tapson, J., & van Schaik, A. (2017). EMNIST: an extension of MNIST to handwritten letters. Retrieved from http://arxiv.org/abs/1702.05373

“The EMNIST Dataset.” NIST, 28 Mar. 2019, https://www.nist.gov/itl/products-and-services/emnist-dataset.

Nielsen, Michael A. Neural Networks and Deep Learning, Determination Press, 2015, http://neuralnetworksanddeeplearning.com/.

Tripathi, Achintya. "EMNIST Letter Dataset 97.9%: ACC & VAL_ACC: 91.78%." Kaggle, 16 Aug. 2020, https://www.kaggle.com/code/achintyatripathi/emnist-letter-dataset-97-9-acc-val-acc-91-78.

LeCun, Yann, et al. "The MNIST Database." MNIST Handwritten Digit Database, Yann LeCun, Corinna Cortes and Chris Burges, Nov. 1998, http://yann.lecun.com/exdb/mnist/.

Paperswithcode.com. (2017). Papers with code - EMNIST-letters benchmark (image classification). EMNIST Benchmark Algorithms. Retrieved November 19, 2022, from https://paperswithcode.com/sota/image-classification-on-emnist-letters

Li, Fei-Fei. “Convolutional Neural Networks (CNNs / ConvNets).” CS231N Convolutional Neural Networks for Visual Recognition, Stanford University, Jan. 2022, https://cs231n.github.io/convolutional-networks/.

Xu, B., Wang, N., Chen, T., & Li, M. (2015). Empirical Evaluation of Rectified Activations in Convolutional Network. http://arxiv.org/abs/1505.00853
