This is the final project of the Deep Learning class in the M2 Data Science program, carried out by a group of three students. The MoCo model used in the project follows the article "Momentum Contrast for Unsupervised Visual Representation Learning" by Kaiming He et al. (2020).
Image classification is a central task in artificial intelligence and image analysis. Deep learning with neural networks, in particular convolutional neural networks (CNNs), has shown strong performance in this area and is a powerful tool for solving complex tasks, as this project illustrates. We study several neural network models, namely multilayer perceptrons (MLPs), convolutional neural networks (CNNs), and Momentum Contrast (MoCo) for unsupervised visual representation learning, and apply them to the classification of the handwritten digits in the MNIST dataset. The objective of the project is to implement these deep learning models for handwritten digit classification and to evaluate their performance on MNIST.

The key constraint is that only 100 images of the MNIST training set are assumed to be labeled; all other images are treated as unlabeled. This situation is realistic, because manual annotation of data is tedious and expensive. We therefore built several baseline networks (MLPs and a CNN) trained with supervised learning on the 100 labeled images. In addition, we implemented MoCo, an unsupervised method that learns representations from the unlabeled images, which makes it possible to cope with the scarcity of labeled data that is common in practice.
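The labeled/unlabeled setting above can be simulated by drawing 100 random indices from the 60,000 MNIST training images and treating the remaining 59,900 as unlabeled. The sketch below uses only the Python standard library; the helper name `split_labeled_unlabeled` and the fixed seed are illustrative assumptions, not the project's code. A uniform draw makes it very likely, but does not guarantee, that all ten digits appear among the 100 labeled images.

```python
import random


def split_labeled_unlabeled(n_total, n_labeled, seed=0):
    """Hypothetical helper: pick n_labeled random indices as 'labeled', keep the rest as 'unlabeled'."""
    rng = random.Random(seed)
    labeled = sorted(rng.sample(range(n_total), n_labeled))
    labeled_set = set(labeled)
    # The unlabeled pool is the exact complement of the labeled indices.
    unlabeled = [i for i in range(n_total) if i not in labeled_set]
    return labeled, unlabeled


# The MNIST training set has 60,000 images.
labeled_idx, unlabeled_idx = split_labeled_unlabeled(60_000, 100)
print(len(labeled_idx), len(unlabeled_idx))  # 100 59900
```

Building the unlabeled pool as the complement of the labeled indices guarantees that no image is used both as a labeled example and as an unlabeled one.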
For supervised learning, we randomly selected 100 labeled images covering the 10 classes (digits 0-9) from the MNIST training set to train the baseline networks. The images were converted into tensors and normalized by their mean and standard deviation. These 100 images were then split into a training set (75 images) and a validation set (25 images), stratified so that both sets keep the proportions of the 10 classes. Because the training set contains only 75 images, we applied data augmentation (rotations, scaling, cropping, translations, and random affine transformations) to increase the number of training images. We built three networks, two multilayer perceptrons (MLPs) and one convolutional neural network (CNN), trained them on the training set, evaluated them on the validation set, and finally tested them on the MNIST test set.

For unsupervised learning with MoCo, we used the 59,900 unlabeled images to pre-train the MoCo model, whose learned representation is then transferred to the downstream classification task by fine-tuning on the 100 labeled images. The unlabeled images were converted into tensors, augmented with random resized cropping, grayscale conversion, color jitter, and random horizontal flips, and normalized by their mean and standard deviation. For each image, two random crops are generated, forming a query/key pair for training. ResNet-50 serves as the base encoder for both the query encoder and the key encoder.

Finally, we compared the supervised baselines with the unsupervised MoCo approach to evaluate their performance in recognizing the handwritten digits of MNIST.
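The stratified 75/25 split of the 100 labeled images can be sketched as follows. This is a minimal, standard-library illustration that assumes each image is identified by its index and that `labels` holds its digit; the helper name `stratified_split` and the example labels are hypothetical. Indices are grouped by class, shuffled within each class, and then sampled at a fixed stride, so every class sends roughly a quarter of its images to the validation set. In practice, scikit-learn's `train_test_split` with the `stratify=` argument performs the same kind of split.

```python
import random
from collections import defaultdict


def stratified_split(labels, n_val, seed=0):
    """Split indices into (train, val) so that each class keeps about the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Lay the indices out class by class, shuffling inside each class.
    ordered = []
    for y in sorted(by_class):
        members = by_class[y][:]
        rng.shuffle(members)
        ordered.extend(members)

    # Take every (len/n_val)-th position, so each class contributes roughly in proportion to its size.
    stride = len(labels) / n_val
    val_positions = {int(k * stride) for k in range(n_val)}
    val = sorted(ordered[p] for p in val_positions)
    val_set = set(val)
    train = [i for i in range(len(labels)) if i not in val_set]
    return train, val


labels = [i % 10 for i in range(100)]  # hypothetical: 10 labeled images per digit
train_idx, val_idx = stratified_split(labels, n_val=25)
print(len(train_idx), len(val_idx))  # 75 25
```

With 10 images per digit, each class ends up with 2 or 3 validation images, which is as close to the exact 2.5 per class as integer counts allow.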
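Normalizing by the mean and standard deviation means mapping every pixel p to (p − μ)/σ, which is also the formula applied by `torchvision.transforms.Normalize`. The sketch below computes μ and σ over all pixels of the training images and reuses them for the validation images; the text does not say which subset the project took its statistics from, so that choice is an assumption. Images are nested Python lists rather than tensors to keep the example self-contained, and the tiny 2×2 images are made up for illustration.

```python
import math


def mean_std(images):
    """Mean and population standard deviation over every pixel of every image."""
    pixels = [p for img in images for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, math.sqrt(var)


def normalize(img, mean, std):
    """Standardize one image pixel-wise: (p - mean) / std."""
    return [[(p - mean) / std for p in row] for row in img]


# Tiny 2x2 stand-ins for MNIST images with pixel intensities scaled to [0, 1].
train_images = [[[0.0, 1.0], [0.5, 0.5]], [[0.0, 0.0], [1.0, 0.5]]]
val_images = [[[0.25, 0.75], [0.0, 1.0]]]

mu, sigma = mean_std(train_images)  # statistics from the training images only
train_norm = [normalize(img, mu, sigma) for img in train_images]
val_norm = [normalize(img, mu, sigma) for img in val_images]  # same mu, sigma reused
```

Reusing the training statistics on the validation images keeps the validation set from influencing preprocessing, so the validation score stays an honest estimate.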
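Random translation, one of the augmentations listed for the 75 training images, can be sketched in plain Python to show how a small training set is enlarged with label-preserving variants. The maximum shift of 2 pixels, the number of copies, the function names, and the single-pixel stand-in images are hypothetical choices for illustration. In a PyTorch pipeline, `torchvision.transforms.RandomAffine` combines rotation, translation, and scaling in one transform and is typically applied on the fly at each epoch instead of storing copies.

```python
import random


def random_translate(img, max_shift, rng):
    """Shift a 2-D image by up to max_shift pixels along each axis, filling uncovered pixels with 0 (background)."""
    h, w = len(img), len(img[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            src_r, src_c = r - dy, c - dx
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out


def augment_dataset(images, labels, n_copies, max_shift=2, seed=0):
    """Return the originals followed by n_copies randomly shifted versions of each image (labels unchanged)."""
    rng = random.Random(seed)
    aug_x, aug_y = list(images), list(labels)
    for img, y in zip(images, labels):
        for _ in range(n_copies):
            aug_x.append(random_translate(img, max_shift, rng))
            aug_y.append(y)
    return aug_x, aug_y


# Hypothetical stand-in data: 75 blank 28x28 images with one bright pixel at the center.
train_x = [[[1.0 if (r, c) == (14, 14) else 0.0 for c in range(28)] for r in range(28)] for _ in range(75)]
train_y = [i % 10 for i in range(75)]
aug_x, aug_y = augment_dataset(train_x, train_y, n_copies=9)
print(len(aug_x))  # 750
```

With `n_copies=9`, the 75 training images become 750: the 75 originals plus 675 shifted variants that keep their original labels.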
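The query/key construction for MoCo can be expressed as a wrapper that applies the same random augmentation twice to one image, yielding two independently sampled views. The class below is modeled on the `TwoCropsTransform` helper in the official MoCo code release, but the toy `random_crop` function, the 20×20 crop size, and the synthetic 28×28 image are assumptions standing in for the project's actual augmentation pipeline.

```python
import random


class TwoCropsTransform:
    """Apply one random augmentation twice to the same image, returning (query view, key view)."""

    def __init__(self, base_transform):
        self.base_transform = base_transform

    def __call__(self, x):
        query = self.base_transform(x)  # first random view
        key = self.base_transform(x)    # second, independently sampled view
        return query, key


def random_crop(img, size, rng):
    """Toy augmentation (hypothetical): cut a random size x size window out of a 2-D list."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)  # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


rng = random.Random(0)
# Synthetic 28x28 "image" whose pixel value encodes its position, standing in for an MNIST digit.
image = [[r * 28 + c for c in range(28)] for r in range(28)]
two_crops = TwoCropsTransform(lambda x: random_crop(x, 20, rng))
query, key = two_crops(image)
print(len(query), len(query[0]), len(key), len(key[0]))  # 20 20 20 20
```

Because both views come from the same image, MoCo treats the pair as a positive match, while keys from other images act as negatives.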
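The two mechanisms that give MoCo its name, the momentum update of the key encoder and the queue of negative keys, can be sketched together with its contrastive (InfoNCE) loss on plain Python vectors. Following He et al. (2020), the key encoder's parameters are updated as θ_k ← m·θ_k + (1 − m)·θ_q instead of by backpropagation, and the loss for a query q with positive key k₊ is −log[exp(q·k₊/τ) / Σᵢ exp(q·kᵢ/τ)], where the sum runs over the positive and all queued negatives. The defaults m = 0.999 and τ = 0.07 are the values reported in the MoCo paper, not necessarily those used in this project, and the 2-D vectors, queue size, and toy parameter lists are purely illustrative.

```python
import math
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Return theta_k <- m * theta_k + (1 - m) * theta_q, element-wise.

    The key encoder is not trained by backpropagation; it slowly follows
    the query encoder through this moving average.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length, as MoCo does with encoder outputs."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce_loss(q, k_pos, negatives, tau=0.07):
    """InfoNCE loss for one query: cross-entropy with the positive key at index 0."""
    logits = [dot(q, k_pos) / tau] + [dot(q, k) / tau for k in negatives]
    top = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]  # equals -log(softmax(logits)[0])


# Illustrative 2-D embeddings; real ones would come from the ResNet-50 encoders.
q = l2_normalize([1.0, 0.2])      # query: first crop of an image
k_pos = l2_normalize([0.9, 0.3])  # positive key: second crop of the same image
queue = deque(maxlen=4)           # K = 4 here; the MoCo paper uses K = 65536
queue.extend(l2_normalize(v) for v in ([-1.0, 0.1], [0.0, 1.0], [0.5, -0.8]))

loss = info_nce_loss(q, k_pos, list(queue))
queue.append(k_pos)  # enqueue the newest key; once full, the oldest key is dropped

key_params = momentum_update([0.0, 0.0], [1.0, -1.0], m=0.9)  # toy parameter vectors
```

In a full implementation, q and the keys would be the normalized outputs of the query and key encoders for a whole batch, and the queue would hold many thousands of keys, which is what lets MoCo use a large set of negatives without a large batch.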