
Deep Learning and Computer Vision based Summer Project for the year 2019.

IvLabs/Real-Time-Digit-Classifier

Real Time Digit Classifier

Motivation

To develop, from scratch, a Real Time Digit Recognizer using elementary concepts of Deep Learning and Convolutional Neural Networks, along with a tinge of Image Processing and Object Tracking.


Approach

The Preliminary Stage involved learning the basics of Machine Learning and Deep Learning algorithms, along with the most primitive yet effective optimization algorithm, the good ol' Gradient Descent, and its application to Single Node, Multi Node and even Multi Hidden Layer networks.
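
For reference, the Gradient Descent update and its Single Node case (a sigmoid output trained with binary cross-entropy) can be written as below. The notation is ours, and the choice of loss is an assumption, since the README does not name the loss function used.

```latex
% Generic update with learning rate \alpha
\theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta)

% Single node with sigmoid activation over m examples
\hat{y}^{(i)} = \sigma\left(w^\top x^{(i)} + b\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]

\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right) x^{(i)}, \qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)
```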

Coding networks from scratch using NumPy means writing every function for the Forward and Backward passes, the activations and the gradient calculations, and then combining them into an iterative learning function that learns and returns the optimized Parameters given the Learning Rate, Number of Iterations, Batch Size, Number of Epochs and other HyperParameters. Doing this deepens one's understanding of the subject and exposes the complications that arise when implementing models from scratch, which is the purpose of this project.
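
A minimal sketch of such an iterative learning function, assuming parameters are kept in a dict of NumPy arrays and that a model-specific `grad_fn` returns the loss together with the gradients from the Forward and Backward passes. The names `train` and `grad_fn` are hypothetical, and the linear-regression usage at the end is a toy example, not one of the project's models.

```python
import numpy as np


def train(params, grad_fn, X, Y, learning_rate=0.1, num_epochs=10, batch_size=32, seed=0):
    """Mini-batch gradient descent over a dict of NumPy parameter arrays.

    grad_fn(params, X_batch, Y_batch) must return (loss, grads), where grads has
    the same keys and shapes as params. Examples are stored one per row of X and Y.
    """
    rng = np.random.default_rng(seed)
    params = {k: np.array(v, dtype=float) for k, v in params.items()}  # work on float copies
    history = []
    m = X.shape[0]
    for _ in range(num_epochs):
        order = rng.permutation(m)  # reshuffle the examples every epoch
        epoch_loss = 0.0
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            loss, grads = grad_fn(params, X[idx], Y[idx])
            for k in params:
                params[k] -= learning_rate * grads[k]  # the gradient descent step
            epoch_loss += loss * len(idx)
        history.append(epoch_loss / m)  # average mini-batch loss seen during this epoch
    return params, history


# Hypothetical usage: fit y = Xw + b by minimising half the mean squared error.
def mse_loss_and_grads(params, Xb, Yb):
    err = Xb @ params["w"] + params["b"] - Yb
    loss = 0.5 * np.mean(err ** 2)
    return loss, {"w": Xb.T @ err / len(Xb), "b": err.mean(axis=0)}


data_rng = np.random.default_rng(1)
X_demo = data_rng.normal(size=(200, 2))
Y_demo = X_demo @ np.array([[2.0], [-3.0]]) + 0.5  # synthetic, noise-free targets
init = {"w": np.zeros((2, 1)), "b": np.zeros(1)}
fitted, history = train(init, mse_loss_and_grads, X_demo, Y_demo,
                        learning_rate=0.1, num_epochs=50, batch_size=20)
```

Keeping `train` model-agnostic means the same loop serves every network: only `grad_fn` changes between the Single Node, Single Layer and Multi Layer models.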


Models and Frameworks


A Binary Classifier serves as a basic project to build intuition and get things going. It was implemented using both a Single Node architecture and a Single Hidden Layer architecture, and it predicts whether a student is selected by an institute based on their marks in 2 exams from this [Data Set](./Data Sets/bindata.csv).
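
A minimal sketch of the Single Node version, assuming a sigmoid output, binary cross-entropy loss and standardised marks; the README does not specify these details. The marks generated at the end are synthetic and only stand in for `bindata.csv`, whose columns are not described here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_single_node(X, y, learning_rate=0.1, num_iterations=2000):
    """One sigmoid node (logistic regression) trained with full-batch gradient descent.

    X: (m, 2) exam marks, y: (m,) labels in {0, 1} (1 = selected).
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xn = (X - mu) / sigma  # standardise so one learning rate suits both exams
    m, n = Xn.shape
    w, b = np.zeros(n), 0.0
    for _ in range(num_iterations):
        y_hat = sigmoid(Xn @ w + b)  # forward pass
        dz = y_hat - y  # dJ/dz for a sigmoid output with binary cross-entropy
        w -= learning_rate * (Xn.T @ dz) / m  # backward pass and update
        b -= learning_rate * dz.mean()
    return w, b, mu, sigma


def predict(X, w, b, mu, sigma):
    return (sigmoid(((X - mu) / sigma) @ w + b) >= 0.5).astype(int)


# Hypothetical marks: a student is "selected" when the two marks add up to more than 130.
rng = np.random.default_rng(0)
marks = rng.uniform(30, 100, size=(200, 2))
selected = (marks.sum(axis=1) > 130).astype(int)
w, b, mu, sigma = train_single_node(marks, selected)
accuracy = (predict(marks, w, b, mu, sigma) == selected).mean()
```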

Digit Classifier from Scratch is implemented using 2 approaches. The code for both was written from scratch using basic NumPy, and both were trained on the [MNIST Dataset](./Data Sets/mnist.pkl.gz).
The First is a Single Layer Perceptron with only 10 Nodes, using SoftMax Activation, Gradient Descent Optimization and Gaussian Initialization (Var = 1) of the weights.
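
The Single Layer Perceptron can be sketched in NumPy as follows, assuming flattened 784-pixel inputs, one-hot labels and a categorical cross-entropy loss (the loss is not named in the README). Softmax is computed through a shifted log-softmax so that the large logits produced by variance-1 weights do not overflow. Function names are hypothetical.

```python
import numpy as np


def init_slp(n_in=784, n_out=10, seed=0):
    rng = np.random.default_rng(seed)
    # Gaussian initialisation with variance 1 (standard deviation 1), as described above
    return {"W": rng.normal(0.0, 1.0, size=(n_in, n_out)), "b": np.zeros(n_out)}


def log_softmax(Z):
    Zs = Z - Z.max(axis=1, keepdims=True)  # shift each row for numerical stability
    return Zs - np.log(np.exp(Zs).sum(axis=1, keepdims=True))


def slp_loss_and_grads(params, X, Y):
    """X: (m, n_in) flattened images, Y: (m, n_out) one-hot labels."""
    log_probs = log_softmax(X @ params["W"] + params["b"])
    loss = -np.mean(np.sum(Y * log_probs, axis=1))  # categorical cross-entropy
    dZ = (np.exp(log_probs) - Y) / X.shape[0]  # gradient of the loss w.r.t. the logits
    return loss, {"W": X.T @ dZ, "b": dZ.sum(axis=0)}


def slp_predict(params, X):
    return np.argmax(X @ params["W"] + params["b"], axis=1)  # most probable digit per row
```

A Gradient Descent step is then `params[k] -= learning_rate * grads[k]` for every key, with `learning_rate = 0.1` for this model.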

The image below shows the Learning Curves for the Single Layer Perceptron with Learning Rate = 0.1.

The Second is a Multi Layer Perceptron containing a single hidden layer with 200 Nodes using ReLU Activation and an Output Layer with 10 Nodes using SoftMax Activation. The optimization algorithm used was Gradient Descent and the weights were initialized using a Gaussian Distribution (Var = 1).
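
Under the same assumptions (flattened inputs, one-hot labels, cross-entropy loss, hypothetical function names), the Forward and Backward passes of the 200-node hidden-layer network can be sketched like this. The new pieces are the ReLU and its derivative, which passes the backpropagated gradient only to units whose pre-activation was positive.

```python
import numpy as np


def init_mlp(n_in=784, n_hidden=200, n_out=10, seed=0):
    rng = np.random.default_rng(seed)
    # Gaussian initialisation with variance 1 for both weight matrices, as described above
    return {
        "W1": rng.normal(0.0, 1.0, size=(n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0, size=(n_hidden, n_out)), "b2": np.zeros(n_out),
    }


def mlp_loss_and_grads(params, X, Y):
    """X: (m, n_in) inputs, Y: (m, n_out) one-hot labels."""
    m = X.shape[0]
    # Forward pass
    Z1 = X @ params["W1"] + params["b1"]
    A1 = np.maximum(Z1, 0.0)  # ReLU
    Z2 = A1 @ params["W2"] + params["b2"]
    Z2s = Z2 - Z2.max(axis=1, keepdims=True)
    log_probs = Z2s - np.log(np.exp(Z2s).sum(axis=1, keepdims=True))  # stable log-softmax
    loss = -np.mean(np.sum(Y * log_probs, axis=1))
    # Backward pass
    dZ2 = (np.exp(log_probs) - Y) / m  # softmax + cross-entropy gradient
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ params["W2"].T) * (Z1 > 0)  # ReLU derivative gates the gradient
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)
    return loss, {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}
```

Each Gradient Descent step subtracts `learning_rate * grads[k]` from every parameter, with `learning_rate = 0.3` for this model.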

The image below shows the Learning Curves for the Multi Layer Perceptron with Learning Rate = 0.3.



The Convolutional Approach involves training a Convolutional Neural Network with 2 Convolutional Layers, both using:
Kernel Size = 5
Padding = Same (2)
Stride = 1
I/O channels in Layer 1: 1/32
I/O channels in Layer 2: 32/64
Followed by Max Pooling with:
Kernel Size = 2
Stride = 2
And finally the ReLU activation.
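
A quick way to check the layer sizes above is the standard output-size formula floor((n + 2p - k) / s) + 1. The sketch assumes a 28x28 MNIST input and that the 2x2 Max Pooling follows each of the two Convolutional Layers; under those assumptions the flattened feature map feeding the 1000-node Fully Connected Layer has 64 x 7 x 7 = 3136 values. The helper names are ours.

```python
def conv_out(size, kernel, padding, stride):
    """Output length along one spatial axis of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def cnn_shapes(size=28, channels=((1, 32), (32, 64))):
    """(channels, height, width) after each conv + 2x2 max-pool block for a square input."""
    shapes = []
    for _, c_out in channels:
        size = conv_out(size, kernel=5, padding=2, stride=1)  # 'same' padding keeps the size
        size = conv_out(size, kernel=2, padding=0, stride=2)  # 2x2 max pooling halves it
        shapes.append((c_out, size, size))
    return shapes


def parameter_count(channels=((1, 32), (32, 64)), flat=64 * 7 * 7, hidden=1000, classes=10):
    """Weights + biases of the two conv layers, the fully connected layer and the output layer."""
    conv = sum(5 * 5 * c_in * c_out + c_out for c_in, c_out in channels)
    return conv + (flat * hidden + hidden) + (hidden * classes + classes)


shapes = cnn_shapes()
print(shapes)             # [(32, 14, 14), (64, 7, 7)]
c, h, w = shapes[-1]
print(c * h * w)          # 3136 inputs to the 1000-node fully connected layer
print(parameter_count())  # 3199106
```

Almost all of the roughly 3.2 million parameters sit in the 3136-to-1000 Fully Connected Layer; ReLU and Max Pooling add none.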

This is then followed by a Fully Connected Layer with,
Nodes = 1000
Activation = ReLU

Finally, the Output Layer implements SoftMax Activation using 10 Nodes.
The optimizer used was Adam. The model was trained with PyTorch to speed up training, using the MNIST Dataset as before.
The trained model was then saved to a local directory.
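
To show what the Adam optimizer does on each step, here is a minimal NumPy sketch of the Adam update for a dict of parameter arrays. Only the Learning Rate of 0.001 comes from this README; beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are assumed to be the usual defaults (they are the defaults of `torch.optim.Adam`). The quadratic at the end is a hypothetical toy problem, not part of the project.

```python
import numpy as np


class Adam:
    """Minimal Adam update for a dict of NumPy parameter arrays."""

    def __init__(self, params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = {k: np.zeros_like(v) for k, v in params.items()}  # first-moment estimates
        self.v = {k: np.zeros_like(v) for k, v in params.items()}  # second-moment estimates
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        for k in params:
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * grads[k]
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * grads[k] ** 2
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)  # bias-corrected first moment
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)  # bias-corrected second moment
            params[k] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # in-place update


# Toy usage: minimise sum((x - 3)^2), whose gradient is 2 * (x - 3).
params = {"x": np.zeros(2)}
opt = Adam(params, lr=0.1)
for _ in range(500):
    opt.step(params, {"x": 2 * (params["x"] - 3.0)})
```

Because each step is scaled by the running root-mean-square of the gradient, Adam's step size is roughly the Learning Rate regardless of the gradient's raw magnitude, which is one reason it needs less tuning than plain Gradient Descent.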

The image below shows the Learning Curves for the Convolutional Neural Network with:
Learning Rate = 0.001
Batch Size = 128
Number Of Epochs = 5



Digit Pad is an application that lets the user draw white digits onto a black 128x128 px Drawing Pad using the mouse events available in OpenCV. The drawing is then resized to 28x28 px and passed to the trained CNN model for classification.
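
The project captures strokes with OpenCV mouse events (for example a callback registered with `cv2.setMouseCallback`) and resizes with OpenCV. The sketch below keeps only the array side of that pipeline in plain NumPy so it runs without a window: `draw_point` is a hypothetical helper that paints what a mouse-move handler might paint, and nearest-neighbour indexing is a simple stand-in for `cv2.resize`. The brush radius of 6 px and the (1, 1, 28, 28) output layout, which matches the (batch, channel, height, width) input of a PyTorch `Conv2d`, are assumptions.

```python
import numpy as np

PAD_SIZE = 128   # drawing pad side length in pixels, as described above
MODEL_SIZE = 28  # MNIST input side length


def draw_point(canvas, x, y, radius=6):
    """Paint a filled white disc at (x, y), as a mouse-move handler might while the button is held."""
    rows, cols = np.ogrid[:canvas.shape[0], :canvas.shape[1]]
    canvas[(cols - x) ** 2 + (rows - y) ** 2 <= radius ** 2] = 255


def to_model_input(canvas):
    """Downsample the square pad to 28x28 (nearest neighbour) and scale pixels to [0, 1]."""
    idx = ((np.arange(MODEL_SIZE) + 0.5) * canvas.shape[0] / MODEL_SIZE).astype(int)
    small = canvas[np.ix_(idx, idx)].astype(np.float32) / 255.0
    return small.reshape(1, 1, MODEL_SIZE, MODEL_SIZE)  # (batch, channel, height, width)


# Toy usage: a vertical stroke down the middle of the pad, roughly a "1".
canvas = np.zeros((PAD_SIZE, PAD_SIZE), dtype=np.uint8)
for y in range(20, 109):
    draw_point(canvas, 64, y)
model_input = to_model_input(canvas)
```

A brush wider than the downsampling step (128 / 28, about 4.6 px) keeps thin strokes from disappearing when only one pixel per output cell is sampled; an area-averaging resize would be smoother.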



Real Time Digit Classifier applies basic Image Processing with OpenCV to capture and process the video input: Thresholding in the HSV Color Space and Centroid Tracking are used to follow the digit drawn by the user. The traced digit is rescaled and drawn onto a black and white image of suitable size (128x128 px), which is then resized to 28x28 px and passed to the trained CNN model to obtain a prediction.
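
A minimal NumPy sketch of the thresholding and centroid steps, assuming each frame has already been converted to HSV (in OpenCV, typically with `cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)`). `color_mask` performs the same per-channel range test as `cv2.inRange` but returns a boolean array, and `centroid` gives the same point as the image moments m10/m00 and m01/m00 from `cv2.moments`. The HSV bounds for a blue marker and all helper names are hypothetical; the README does not state which colour or bounds the project uses.

```python
import numpy as np

# Hypothetical bounds for a blue marker (OpenCV hue runs 0-179); tune for the real pen or cap.
LOWER = (100, 150, 50)
UPPER = (130, 255, 255)


def color_mask(hsv, lower, upper):
    """Boolean mask of pixels whose H, S and V all lie within [lower, upper]."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return np.all((hsv >= lower) & (hsv <= upper), axis=-1)


def centroid(mask):
    """(x, y) centroid of the True pixels, or None when the marker is not visible."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())


def to_pad_coords(point, frame_shape, pad_size=128):
    """Rescale a tracked (x, y) frame coordinate onto the square drawing pad."""
    x, y = point
    h, w = frame_shape[:2]
    return int(x * pad_size / w), int(y * pad_size / h)
```

Per frame, the tracked centroid is mapped onto the 128x128 pad and joined to the previous point to extend the stroke; frames where `centroid` returns `None` can be treated as the pen being lifted.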




The Full Demo has been uploaded here.
References and aiding articles can be found in the Documentation provided.
Further specifications with results are available on the Project Website.
