With advances in GPUs and deep learning, tasks such as image classification and object detection can now be performed with relatively high accuracy and in real time. Alongside these advances, Computer Vision, a branch of Artificial Intelligence, started to evolve rapidly. One of the driving factors behind the growth of the Computer Vision field is that the amount of data (images, videos) we generate today is sufficient to train Convolutional Neural Networks (CNNs) and make Computer Vision better.
This repository contains my personal exploration and research on Convolutional Neural Networks: a disciplined way to learn and implement the fundamentals of state-of-the-art models using the PyTorch library.
- Python
- Knowledge of images (height, width, pixels, channels)
- PyTorch
- Basics of OpenCV, Python Imaging Library (PIL), and matplotlib.
- GPU: Tesla T4/Tesla K80 or higher
- GPU count: 1 or 2
- RAM: 12 GB or higher
- Custom Deep Neural Network
- ResNet
- DenseNet
- GoogLeNet
- Image Classification.
- Object Detection using YOLO.
- Monocular Depth Estimation.
- Object Segmentation.
- Human Pose Estimation.
- GANs
- MNIST
- CIFAR10
- Custom Dataset for Object Detection.
- Tiny ImageNet
- COCO
- ImageNet
1. ML Intuition and Basics of CNN
Basics of Python can be learned on YouTube. Channels like Corey Schafer and Telusko helped me a lot in learning Python basics.
Basics of CNNs: how a CNN learns, how different channels are formed, and how a DNN makes sense of the inputs it gets (features -> edges & gradients -> textures -> patterns -> parts of objects -> objects -> scenes). Please see below. Resemblance of the human brain and eyes to the computer vision field.
2. CNN Architecture
Basic CNN architecture: maintaining symmetry by choosing odd-sized kernels (for example 3x3, 5x5), the importance of choosing a 3x3 kernel over 5x5 or larger odd kernels, max-pooling, and the receptive field. The image below shows convolution from 5x5 to 3x3 to 1x1; the receptive field increases from left to right as convolutions are applied and layers are added.
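The 5x5 -> 3x3 -> 1x1 progression and the 3x3-versus-5x5 argument can be checked with a little arithmetic. The sketch below is illustrative: the helper names are hypothetical, the output-size formula is the one given in PyTorch's `Conv2d` documentation, and the receptive-field recurrence assumes plain stacked convolution/pooling layers described as `(kernel, stride)` pairs.

```python
def conv_output_size(n, k, s=1, p=0, d=1):
    """Spatial output size of a convolution (formula from PyTorch's Conv2d docs)."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1

def receptive_fields(layers):
    """Receptive field after each layer; `layers` is a list of (kernel, stride) pairs."""
    rf, jump = 1, 1  # one input pixel sees itself; jump = input pixels between neighbouring outputs
    out = []
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the view by (k - 1) steps of the current jump
        jump *= s             # striding spreads later outputs further apart in the input
        out.append(rf)
    return out

def conv_weights(k, c_in, c_out):
    """Number of weights in a k x k convolution, bias excluded."""
    return k * k * c_in * c_out

# 5x5 input -> 3x3 conv -> 3x3 map -> 3x3 conv -> 1x1 map (no padding, stride 1)
sizes = [5]
for _ in range(2):
    sizes.append(conv_output_size(sizes[-1], k=3))
print(sizes)                                  # [5, 3, 1]
print(receptive_fields([(3, 1), (3, 1)]))     # [3, 5]

# Two stacked 3x3 layers see the same 5x5 region as one 5x5 layer, with fewer weights.
c = 64
print(2 * conv_weights(3, c, c), conv_weights(5, c, c))  # 73728 102400
```

Two 3x3 layers also add an extra non-linearity between them, which is part of why 3x3 kernels are usually preferred.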
3. Kernels and Convolution
Basic PyTorch structure for working with neural networks: introduction to nn.Module, optimizers, forward and backward passes, datasets, and how to apply simple augmentation.
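To make the "kernels and convolution" idea concrete before wrapping it in `nn.Module`, here is a minimal pure-Python sketch of a single-channel 2D convolution. Like PyTorch's `Conv2d`, it actually computes a cross-correlation (the kernel is not flipped). The function name and the vertical-edge kernel are illustrative choices, not part of any library.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2D cross-correlation; image and kernel are lists of rows."""
    if padding:
        w = len(image[0]) + 2 * padding
        padded = [[0] * w for _ in range(padding)]
        for row in image:
            padded.append([0] * padding + list(row) + [0] * padding)
        padded += [[0] * w for _ in range(padding)]
        image = padded
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Slide the kernel over the window and sum the element-wise products.
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out

# A 3x6 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 9, 9, 9] for _ in range(3)]
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
print(conv2d(image, vertical_edge))   # [[0, 27, 27, 0]]
```

The kernel responds only where intensity changes from left to right, which is the "edges & gradients" stage of the feature hierarchy described earlier.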
4. Architecture Basics
CNN architecture components: fully connected layer, dropout, softmax, learning rate, and batch size.
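Softmax and dropout are the two components above whose behaviour is easiest to get subtly wrong, so here is a small standard-library sketch of each. It assumes the "inverted" dropout convention (scale at training time, identity at evaluation), which is what `torch.nn.Dropout` uses; the function names are hypothetical.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax: subtracting the max leaves the result unchanged."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dropout(activations, p, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p during training and
    scale the survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # evaluation mode: pass through untouched
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in activations]

probs = softmax([2.0, 1.0, 0.1])
print([round(q, 3) for q in probs])   # [0.659, 0.242, 0.099]
print(dropout([1.0] * 8, p=0.5, rng=random.Random(0)))  # each entry is 0.0 or 2.0
```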
Work link summary:
- Train on the MNIST dataset to reach 99.40% accuracy under the given constraints. Kindly check the work link to learn more.
- Parameters:
- Epoch: 20
- Learning Rate:
- Batch Size:
- Highest Accuracy:
- Work Link
The fully connected (FC) layer, dropout, and learning rate are shown below, respectively.
5. Model Implementation
A step-by-step approach to building a neural network, debugging it, and optimizing it to get the best accuracy. Kindly check the work link to learn more.
Work link summary:
6. Batch Normalization and Regularization
Importance of normalization, batch normalization, and regularization. The fine line between normalization and equalization.
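The core of batch normalization is a per-feature standardization over the mini-batch followed by a learnable scale and shift. The sketch below shows only the training-mode forward pass on plain lists; real layers such as PyTorch's `BatchNorm1d`/`BatchNorm2d` also keep running statistics for evaluation, which are omitted here. The function name and the `eps` value are assumptions for illustration.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization.
    batch: list of samples, each a list of num_features floats.
    Each feature is standardized with the batch mean and (biased) variance,
    then scaled by gamma and shifted by beta."""
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for f in range(num_features):
        column = [sample[f] for sample in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        for i, x in enumerate(column):
            out[i][f] = gamma[f] * (x - mean) / math.sqrt(var + eps) + beta[f]
    return out

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normed = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
print([[round(v, 3) for v in row] for row in normed])
# [[-1.225, -1.225], [0.0, 0.0], [1.225, 1.225]]
```

Both features end up on the same scale even though the second one is ten times larger in the raw data, which is the point of normalizing before the next layer.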
Work link summary:
- Train on the MNIST dataset to reach 99.40% accuracy under the given constraints, and add regularization to it. Kindly check the work link to learn more.
- Parameters:
- Epoch: 15
- Learning Rate:
- Batch Size:
- Highest Accuracy:
- Work Link
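"Add regularization" usually means adding an L1 and/or L2 penalty on the weights to the data loss. Here is a minimal sketch on a flat list of weights; the strengths `l1` and `l2` are hypothetical hyperparameters. For plain SGD, the `weight_decay` argument of `torch.optim.SGD` adds `weight_decay * w` to each gradient, which is equivalent to an L2 penalty of `weight_decay / 2 * sum(w ** 2)`.

```python
def l1_penalty(weights):
    """Sum of absolute weights: pushes many weights to exactly zero (sparsity)."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squared weights: shrinks large weights smoothly."""
    return sum(w * w for w in weights)

def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Data loss plus weighted L1 and L2 penalties (l1, l2 are illustrative strengths)."""
    return data_loss + l1 * l1_penalty(weights) + l2 * l2_penalty(weights)

weights = [0.5, -1.0, 2.0]
print(l1_penalty(weights), l2_penalty(weights))   # 3.5 5.25
print(regularized_loss(0.3, weights, l1=0.01))    # about 0.335
```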
Original data mean vs. normalized data mean are shown below, respectively.
7. Advanced Convolutions
Different types of convolution: normal convolution, dilated convolution, pointwise (1x1) convolution, deconvolution (also called fractionally strided or transposed convolution), the pixel shuffle algorithm, depthwise separable convolution, and grouped convolution. Dilated, depthwise, and grouped convolutions are shown below, respectively.
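The practical differences between these convolution types show up in weight counts and output shapes. The helpers below are hypothetical names that implement standard formulas: the transposed-convolution size follows PyTorch's `ConvTranspose2d` documentation, and the pixel-shuffle shape follows `torch.nn.PixelShuffle` (channels `C * r * r` become `C` channels at `r` times the resolution). Biases are ignored.

```python
def standard_conv_weights(k, c_in, c_out):
    return k * k * c_in * c_out

def grouped_conv_weights(k, c_in, c_out, groups):
    # Each filter only sees c_in / groups input channels.
    assert c_in % groups == 0 and c_out % groups == 0
    return k * k * (c_in // groups) * c_out

def depthwise_separable_weights(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel (groups = c_in)
    pointwise = c_in * c_out   # 1x1 convolution mixes information across channels
    return depthwise + pointwise

def dilated_kernel_extent(k, dilation):
    # A k x k kernel with dilation d spans d * (k - 1) + 1 pixels per side.
    return dilation * (k - 1) + 1

def transposed_conv_output_size(n, k, s=1, p=0, output_padding=0, d=1):
    return (n - 1) * s - 2 * p + d * (k - 1) + output_padding + 1

def pixel_shuffle_shape(c, h, w, r):
    assert c % (r * r) == 0
    return (c // (r * r), h * r, w * r)

k, c_in, c_out = 3, 32, 64
print(standard_conv_weights(k, c_in, c_out))           # 18432
print(grouped_conv_weights(k, c_in, c_out, groups=4))  # 4608
print(depthwise_separable_weights(k, c_in, c_out))     # 2336
print(dilated_kernel_extent(3, dilation=2))            # 5
print(transposed_conv_output_size(4, k=4, s=2, p=1))   # 8
print(pixel_shuffle_shape(64, 8, 8, r=2))              # (16, 16, 16)
```

A dilated 3x3 kernel covers a 5x5 area with only 9 weights, and the depthwise separable version needs roughly 8x fewer weights than the standard convolution in this configuration.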
Work link summary:
- Train on the CIFAR10 dataset to reach more than 80% accuracy under the given constraints. Kindly check the work link to learn more.
- Parameters:
- Epoch: 15
- Learning Rate:
- Batch Size:
- Highest Accuracy:
- Work Link
Dilated vs. depthwise vs. grouped convolution are shown below, respectively.
8. Receptive Fields and Different Network Architectures
Introduction to different neural network architectures such as AlexNet, VGG, ResNet, GoogLeNet, Inception, and ResNeXt, along with their different versions. Importance of having multiple receptive fields.
Work link summary:
- Train on the CIFAR10 dataset to reach more than 85% accuracy using the ResNet-18 architecture. Kindly check the work link to learn more.
- Model:
- Epoch: 15
- Learning Rate:
- Batch Size:
- Highest Accuracy:
- Work Link
A comparison of architectures such as AlexNet, VGGNet, and ResNet is shown below.
9. Data Augmentation/Model Diagnostics
One easy way to increase accuracy is to increase the receptive field (ResNet's skip connections make it possible to train the very deep networks that achieve this). Another way is regularization, such as dropout, batch normalization, and L1/L2 regularization. All of the above fall short if the dataset is limited, and to tackle this we can use data augmentation. Please see some of these strategies in the images.
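As a concrete illustration, here is a standard-library sketch of three common augmentations on a single-channel image stored as a list of rows: pad-and-random-crop, random horizontal flip, and cutout (zeroing a random square). The pipeline order, the `pad=4` and `size=8` values, and all function names are assumptions chosen for CIFAR10-sized 32x32 images, not the exact recipe from the work link; in practice libraries such as torchvision or albumentations provide these transforms.

```python
import random

def horizontal_flip(img):
    """Mirror an H x W image (a list of rows) left to right."""
    return [list(reversed(row)) for row in img]

def pad_and_random_crop(img, pad, rng=random):
    """Zero-pad `pad` pixels on every side, then crop back to the original size
    at a random offset, so content shifts by up to `pad` pixels."""
    h, w = len(img), len(img[0])
    blank = [0] * (w + 2 * pad)
    padded = [list(blank) for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in img]
    padded += [list(blank) for _ in range(pad)]
    top, left = rng.randint(0, 2 * pad), rng.randint(0, 2 * pad)
    return [row[left:left + w] for row in padded[top:top + h]]

def cutout(img, size, rng=random):
    """Zero a size x size square around a random centre (clipped at the borders)."""
    h, w = len(img), len(img[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    out = [list(row) for row in img]
    for y in range(max(0, cy - size // 2), min(h, cy - size // 2 + size)):
        for x in range(max(0, cx - size // 2), min(w, cx - size // 2 + size)):
            out[y][x] = 0
    return out

def augment(img, rng=random):
    """Hypothetical training-time pipeline: random crop, flip with p = 0.5, cutout."""
    img = pad_and_random_crop(img, pad=4, rng=rng)
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    return cutout(img, size=8, rng=rng)

image = [[r * 32 + c + 1 for c in range(32)] for r in range(32)]
augmented = augment(image, rng=random.Random(0))
print(len(augmented), len(augmented[0]))   # 32 32
```

Every transform keeps the image size and the label unchanged, so each epoch effectively sees a slightly different dataset.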
Work link summary:
- Implement an augmentation module and a Grad-CAM module, and train on the CIFAR10 dataset to achieve 87%+ accuracy. Kindly check the work link to learn more.
- Model:
- Epoch: 15
- Learning Rate:
- Batch Size:
- Highest Accuracy:
- [Work Link](https://github.com/jagatabhay/TSAI/tree/master/S9)
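The Grad-CAM module mentioned in the task boils down to a short computation: average the gradients of the class score over each feature map to get a channel weight, take the weighted sum of the feature maps, and apply ReLU. In PyTorch the activations and gradients are typically captured with forward/backward hooks on a convolutional layer; in this sketch they are passed in as plain nested lists (K x H x W) so the arithmetic is visible. The function name and the final max-normalization step are illustrative choices.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one conv layer.
    activations, gradients: K x H x W nested lists holding the feature maps A^k
    and the gradients of the target class score with respect to them."""
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global-average-pool each gradient map.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that push the class score up.
    cam = [[max(0.0, v) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]  # scale to [0, 1] for display
    return cam

acts = [[[1.0, 0.0], [0.0, 0.0]],      # channel 0 fires top-left
        [[0.0, 0.0], [0.0, 1.0]]]      # channel 1 fires bottom-right
grads = [[[1.0, 1.0], [1.0, 1.0]],     # channel 0 supports the class
         [[-1.0, -1.0], [-1.0, -1.0]]] # channel 1 works against it
print(grad_cam(acts, grads))           # [[1.0, 0.0], [0.0, 0.0]]
```

The resulting low-resolution map is then upsampled to the input size and overlaid on the image.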
Have a look at the different data augmentation strategies.
10. Advanced Training
LR Finder. This needs to be updated. Work Link
11. Super-Convergence
Implementation of the Super-Convergence phenomenon (One Cycle Policy), in which a neural network can be trained an order of magnitude faster than with standard training without hurting the accuracy of the model. This is an implementation of the research paper discussed here. The intuition behind it is that large learning rates regularize the training, so all other forms of regularization must be reduced to preserve the optimal balance.
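The one-cycle policy can be written as a function of the training step: the learning rate rises from a low value to `max_lr` while momentum falls, then the learning rate anneals to a value far below the start while momentum rises again. The sketch below uses linear segments; its parameter names and defaults loosely mirror PyTorch's `torch.optim.lr_scheduler.OneCycleLR` (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`, momentum between 0.85 and 0.95), but it is not that class's exact implementation (PyTorch anneals with a cosine curve by default).

```python
def one_cycle(step, total_steps, max_lr, pct_start=0.3, div_factor=25.0,
              final_div_factor=1e4, base_momentum=0.85, max_momentum=0.95):
    """Return (lr, momentum) for `step` in [0, total_steps), linear one-cycle sketch."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    up_steps = int(round(total_steps * pct_start))
    if not 0 <= step < total_steps or not 0 < up_steps < total_steps - 1:
        raise ValueError("step or schedule length out of range")
    if step <= up_steps:
        t = step / up_steps  # 0 -> 1 while the learning rate warms up
        lr = initial_lr + t * (max_lr - initial_lr)
        momentum = max_momentum - t * (max_momentum - base_momentum)
    else:
        t = (step - up_steps) / (total_steps - 1 - up_steps)  # 0 -> 1 while annealing
        lr = max_lr - t * (max_lr - min_lr)
        momentum = base_momentum + t * (max_momentum - base_momentum)
    return lr, momentum

schedule = [one_cycle(s, 100, max_lr=0.1) for s in range(100)]
print(schedule[0])    # starts near (0.004, 0.95)
print(schedule[30])   # peaks near (0.1, 0.85)
print(schedule[-1])   # ends near (4e-07, 0.95)
```

The high learning rate in the middle of the cycle acts as a regularizer, which is why the paper's advice is to reduce other regularization (weight decay, dropout) when using it.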
Work link summary:
- Implement the one-cycle policy along with a data augmentation strategy, and show Grad-CAM results. Train on the CIFAR10 dataset to achieve 90%+ accuracy. Kindly check the work link to learn more.
- Model:
- Epoch: 15
- Learning Rate:
- Batch Size:
- Highest Accuracy:
- Work Link
One-cycle minima and test accuracy, showing the significance of super-convergence.
12. Object Localization
Difference between image classification and image localization (aka image/object detection). Detection approaches such as the sliding window algorithm, region proposal algorithms, and anchor boxes, shown below respectively. Pros and cons of the different approaches. A detailed study of the latest approach, anchor boxes: IoU (Intersection over Union), mAP (mean Average Precision), centroids, and the K-Means algorithm used to compute the centroids. Understanding the YOLOv2 loss function.
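IoU and the K-Means anchor computation fit together naturally: YOLOv2 clusters the ground-truth box shapes with the distance `d = 1 - IoU` instead of Euclidean distance, so large boxes do not dominate the clustering. The sketch below is a minimal version under these assumptions: boxes for `iou` are `(x1, y1, x2, y2)` corners, boxes for clustering are `(width, height)` pairs compared as if they shared a centre, centroids are updated with the mean, and all function names are hypothetical.

```python
import random

def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def wh_iou(wh_a, wh_b):
    """IoU of two (width, height) boxes that share the same centre."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    return inter / (wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter)

def kmeans_anchors(box_sizes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchor shapes using d = 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(box_sizes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for wh in box_sizes:
            # The smallest distance 1 - IoU is the largest IoU.
            best = max(range(k), key=lambda c: wh_iou(wh, centroids[c]))
            clusters[best].append(wh)
        new = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7, about 0.143
sizes = [(10, 10), (12, 11), (11, 12), (100, 50), (98, 52), (102, 48)]
print(kmeans_anchors(sizes, k=2))        # [(11.0, 11.0), (100.0, 50.0)]
```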
Work link summary:
- Train ResNet-18 on Tiny ImageNet within the given constraints to achieve 50%+ accuracy. Kindly check the work link to learn more.
- Model: ResNet-18
- Epoch: 50
- Learning Rate:
- Batch Size:
- Highest Accuracy:
- Work Link
Let's visualize the sliding window algorithm, region proposals, and anchor boxes.
13. YOLO 2 & 3
Introduction to YOLO and why it is called YOLO (You Only Look Once). FPS of YOLO. Anchor box variation across datasets.
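Two details make the anchor-based YOLO head easier to follow: how many output channels each grid cell needs, and how raw predictions become boxes. Per the YOLOv2 and YOLOv3 papers, each anchor predicts `tx, ty, tw, th`, an objectness score, and class scores, and a box is decoded as `bx = sigmoid(tx) + cx`, `bw = pw * exp(tw)` (likewise for y and h). The sketch below works in grid-cell units; the function names are hypothetical.

```python
import math

def yolo_output_channels(num_anchors, num_classes):
    # Per anchor: 4 box offsets (tx, ty, tw, th) + 1 objectness + class scores.
    return num_anchors * (5 + num_classes)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """YOLOv2/v3-style decoding, in grid-cell units."""
    bx = sigmoid(tx) + cell_x       # the sigmoid keeps the centre inside its own cell
    by = sigmoid(ty) + cell_y
    bw = anchor_w * math.exp(tw)    # width/height rescale the anchor (prior) shape
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh

# YOLOv2 on PASCAL VOC: 13 x 13 grid, 5 anchors, 20 classes -> 13 x 13 x 125 output.
print(yolo_output_channels(5, 20))   # 125
# YOLOv3 on COCO: 3 anchors per scale, 80 classes -> 255 channels at each scale.
print(yolo_output_channels(3, 80))   # 255
print(decode_box(0.0, 0.0, 0.0, 0.0, cell_x=6, cell_y=4, anchor_w=3.0, anchor_h=2.0))
# (6.5, 4.5, 3.0, 2.0)
```

Because the whole grid is predicted in one forward pass, the network really does "look once", which is where its high FPS comes from.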
Work link summary:
14. RCNN
Introduction to the R-CNN family. The R-CNN family has its roots in Selective Search for Object Recognition (SSOR) and Efficient Graph-Based Image Segmentation (EGIS). SSOR uses EGIS to create initial regions and then uses a greedy algorithm to merge similar regions into groups; cues such as color guide this grouping, and the resulting regions are used for segmentation and classification. R-CNN (Regions with CNN features) and Fast R-CNN build on these selective-search region proposals, while Faster R-CNN replaces selective search with a learned Region Proposal Network; each architecture removes the drawbacks of its predecessor. Interestingly, Mask R-CNN is built from Faster R-CNN by adding a small convolutional branch that predicts a segmentation mask for each detected object. Both architectures, Faster R-CNN and Mask R-CNN, are shown below.
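Faster R-CNN's Region Proposal Network scores a fixed set of anchors at every feature-map location; the paper uses 3 scales (box areas of 128², 256², and 512² pixels) and 3 aspect ratios (1:1, 1:2, 2:1), giving 9 anchors per location. The sketch below generates those shapes and centres them on one location. The ratio convention (`ratio = height / width`, area kept at `scale ** 2`) and the function names are illustrative assumptions.

```python
import math

def rpn_anchor_shapes(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """(width, height) of each anchor; ratio = height / width, area = scale ** 2."""
    shapes = []
    for s in scales:
        for r in ratios:
            shapes.append((s / math.sqrt(r), s * math.sqrt(r)))
    return shapes

def anchors_at(cx, cy, shapes):
    """Centre every anchor shape on (cx, cy), returned as (x1, y1, x2, y2) boxes."""
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2) for w, h in shapes]

shapes = rpn_anchor_shapes()
print(len(shapes))                      # 9
boxes = anchors_at(300.0, 200.0, shapes)
print(boxes[1])                         # (236.0, 136.0, 364.0, 264.0), the 128 x 128 anchor
```

A W x H feature map therefore yields W * H * 9 anchors, each of which the RPN classifies as object or background and refines with box regression.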
Work link summary:
- .
- Model: ResNet-18
- Epoch: 50
- Learning Rate:
- Batch Size:
- Highest Accuracy:
- Work Link
15. Transfer Learning
This needs to be updated.
This project is licensed under the MIT license.
- Blogs on Medium.com and Towards Data Science
- Research papers on arXiv.org
- Andrej Karpathy lectures on YouTube -
- YouTube videos on Python and PyTorch.