This project was implemented as an assignment for the course CSE574: Introduction to Machine Learning at the University at Buffalo, The State University of New York, in Fall 2016. The goal of the project is to develop and compare different machine learning systems that recognise and classify handwritten digits.
To run the project, use the command: python main.py
Change hyperparameters as required in params.py.
The systems were trained on the MNIST dataset. The dataset consists of 70,000 handwritten digit image samples and is partitioned into two sets containing 60,000 and 10,000 samples respectively. The larger set is used for training and the smaller one for testing.
The training set of 60,000 images is further partitioned during training to form a validation set: 20% of the data, i.e. 12,000 image samples, is held out for validation, leaving 48,000 samples for training.
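The 80/20 partition described above can be sketched as follows. This is only an illustration: the function name `split_train_validation`, the shuffling step, and the fixed seed are assumptions, not taken from the project code, and integers stand in for the actual images.

```python
import random


def split_train_validation(samples, validation_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction of them for validation."""
    indices = list(range(len(samples)))
    # Shuffling before the split is an assumption; the project may split in order.
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(samples) * validation_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train = [samples[i] for i in train_idx]
    val = [samples[i] for i in val_idx]
    return train, val


# Placeholder for the 60,000 MNIST training images (integers used as stand-ins).
placeholder_images = list(range(60000))
train_set, validation_set = split_train_validation(placeholder_images)
print(len(train_set), len(validation_set))  # 48000 12000
```

Fixing the seed keeps the validation set identical across runs, which makes results for different hyperparameter values comparable.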
The following machine learning systems have been developed.
- Softmax Logistic Regression
A softmax regression system was developed and trained on the dataset. The system used the Gradient Descent algorithm for weight optimization.
- Single-Layered Neural Network
A single-layered neural network system was developed and trained on the dataset. The network used the Stochastic Gradient Descent algorithm for weight optimization, a sigmoid function for hidden layer activation, and a softmax regression output layer.
Both systems use the cross-entropy error function as the loss function.
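As a rough illustration of the softmax output and cross-entropy loss named above, here is a minimal sketch of one gradient descent update on a single sample. It uses only the standard library for readability (a real implementation would more likely use a vectorized library), omits bias terms and L2 regularization, and the function names are hypothetical rather than those in main.py.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_class):
    """Cross-entropy loss for one sample with a one-hot target."""
    return -math.log(probs[target_class])


def predict(weights, x):
    """Class probabilities for input x under a (classes x features) weight matrix."""
    logits = [sum(w * x_j for w, x_j in zip(row, x)) for row in weights]
    return softmax(logits)


def gradient_step(weights, x, target_class, learning_rate):
    """One gradient descent update of the weights on a single sample."""
    probs = predict(weights, x)
    new_weights = []
    for k, row in enumerate(weights):
        # Derivative of the loss with respect to logit k: p_k - y_k.
        error = probs[k] - (1.0 if k == target_class else 0.0)
        new_weights.append([w - learning_rate * error * x_j for w, x_j in zip(row, x)])
    return new_weights


# Toy example: 3 classes, 2 features, weights initialised to zero.
weights = [[0.0, 0.0] for _ in range(3)]
x, target = [1.0, 2.0], 1
loss_before = cross_entropy(predict(weights, x), target)
weights = gradient_step(weights, x, target, learning_rate=0.1)
loss_after = cross_entropy(predict(weights, x), target)
print(loss_before, loss_after)
```

The loss on the sample should decrease after the update, since the step moves the weights against the gradient of the cross-entropy.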
Obtaining the best results from each system involved tuning its hyperparameters. The following hyperparameters were tuned:
- L2 regularization constant - λ
- Learning rate - η
- Number of units in the hidden layer
The hyperparameters were tuned by iteratively running each system over different values.
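One way to run a system over different hyperparameter values is an exhaustive grid search, sketched below. The candidate values, the dictionary keys, and the scoring function are all hypothetical placeholders. In practice `evaluate` would train a model with the given settings and return its validation accuracy, and the project may have used a different search procedure.

```python
import itertools


def grid_search(evaluate, grid):
    """Try every combination in `grid` and return the best (params, score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values; not the ones used in the project.
grid = {
    "l2_lambda": [0.0, 0.001, 0.01],
    "learning_rate": [0.01, 0.1, 1.0],
    "hidden_units": [50, 100, 200],
}


def dummy_validation_accuracy(params):
    """Placeholder for 'train with params, then measure validation accuracy'."""
    return (
        -abs(params["learning_rate"] - 0.1)
        - abs(params["l2_lambda"] - 0.001)
        - abs(params["hidden_units"] - 100) / 1000.0
    )


best_params, best_score = grid_search(dummy_validation_accuracy, grid)
print(best_params)
```

Scoring on the validation set rather than the test set keeps the test accuracy an unbiased estimate of final performance.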
The results obtained are as follows:
- Softmax Logistic Regression

  | Dataset    | Accuracy |
  |------------|----------|
  | Training   | 92.48%   |
  | Validation | 92.29%   |
  | Test       | 92.27%   |

  - Accuracy graph
  - Loss graph

- Single-Layered Neural Network

  | Dataset    | Accuracy |
  |------------|----------|
  | Training   | 93.03%   |
  | Validation | 93.26%   |
  | Test       | 92.98%   |
Additionally, a Convolutional Neural Network was built using TensorFlow, similar to the system defined on the TensorFlow website. The system was run for 2,000 iterations instead of 20,000, and a test accuracy of 97.59% was achieved on the dataset.