RESEARCH TOPIC - Comparative analysis of pre-trained image classification and object detection models on a custom image dataset
Experiments with state-of-the-art deep CNNs for image classification and object detection on a custom multi-domain image dataset.
Since its inception, Deep Learning has revolutionised multiple industries. Thanks to advancements in Deep Learning, cars can now drive themselves (autonomous driving) and machines can diagnose disease (computer-aided detection) much faster than humans. However, Deep Learning has some serious limitations. The models are hungry for data and computational resources, which makes them hard to adopt quickly for solving industry problems.
These shortcomings can be tackled using techniques such as Transfer Learning. Transfer Learning is the art of using already trained models (not necessarily trained on data drawn from the same distribution as the target task) to enhance the performance of new models. This research work aims to systematically study the transfer learning ability of multiple deep learning models combined with different image pre-processing techniques, and to highlight the important insights. These insights can be used as a guide when solving real-world problems.
A custom dataset is prepared for experimenting with the pre-trained image classification and object detection models. Real object images from multiple domains are collected from the web using the bing_image_downloader Python module. The downloaded images are then labelled using the makesense.ai web-based tool. It is easy to use and supports labelling with bounding boxes and with free shapes for image segmentation.
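The image collection step described above can be sketched as follows. This is a minimal sketch, not the exact script used in this work: the helper name download_classes, the class names, the image limit and the output directory are all hypothetical. The only library call used is downloader.download from bing_image_downloader, which stores each query's images in a sub-folder named after the query.

```python
import os


def download_classes(class_names, limit=50, output_dir="dataset", download_fn=None):
    """Download up to `limit` images per class into output_dir/<class name>/.

    `download_fn` is a hypothetical hook so the helper can be tried without
    network access; by default it is bing_image_downloader's downloader.download.
    """
    if download_fn is None:
        # Imported lazily so this module loads even if the package is missing.
        from bing_image_downloader import downloader
        download_fn = downloader.download

    for name in class_names:
        # One search query per class; the class name doubles as the query.
        download_fn(
            name,
            limit=limit,
            output_dir=output_dir,
            adult_filter_off=False,
            force_replace=False,
            timeout=60,
        )

    # Folders in which the images for each class are expected to land.
    return [os.path.join(output_dir, name) for name in class_names]
```

For example, download_classes(["bicycle", "teapot"], limit=20) would fetch up to 20 images for each of the two (made-up) classes into dataset/bicycle and dataset/teapot.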
The image classification dataset contains images of single objects and can be found here. The object detection dataset contains images with multiple objects per image and can be found here, along with the annotations.
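The annotation format of the object detection dataset is not stated here. Assuming the bounding boxes were exported from makesense.ai in YOLO text format (one of its export options), each line holds a class id followed by the box centre, width and height, all normalised to the image size. The hypothetical helper below converts such lines to pixel corner coordinates.

```python
def yolo_to_pixel_boxes(label_text, image_width, image_height):
    """Convert YOLO-format label lines to (class_id, x_min, y_min, x_max, y_max).

    Each input line is assumed to be: "<class_id> <x_center> <y_center> <width> <height>"
    with the four box values normalised to the range [0, 1].
    """
    boxes = []
    for line in label_text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        class_id = int(parts[0])
        x_center, y_center, width, height = (float(v) for v in parts[1:])
        # Scale the normalised centre/size representation to pixel corners.
        x_min = (x_center - width / 2) * image_width
        y_min = (y_center - height / 2) * image_height
        x_max = (x_center + width / 2) * image_width
        y_max = (y_center + height / 2) * image_height
        boxes.append((class_id, x_min, y_min, x_max, y_max))
    return boxes
```

Pixel corner coordinates of this kind are what most detection frameworks expect, although the exact box convention differs between them and should be checked per framework.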
Six types of image pre-processing techniques, some with several subtypes, are analysed in this research work. They are listed below:
- Color space variations (Gray, color and hue)
- Sharpening
- Blurring/De-Noising (Bilateral, Gaussian and Median)
- Thresholding (Average Thresholding)
- Morphological Transformations (Opening, Closing)
- Edge Detection (Canny Edge Detection)
All techniques use the implementations provided by OpenCV.
More than 20 pre-trained image classification models, including architectural variations, are analysed in this research work. Model weights are collected from software frameworks such as the Keras Applications API and PyTorch.
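As an illustration of how a pre-trained classifier from the Keras Applications API can be reused, the sketch below freezes an EfficientNetB0 backbone and trains only a new classification head. The training setup used in this work is not described here, so the function name build_transfer_model, the frozen-backbone strategy, the optimizer and the loss are all assumptions.

```python
import tensorflow as tf


def build_transfer_model(num_classes, input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen EfficientNetB0 backbone plus a new softmax classification head."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False,        # drop the original 1000-class ImageNet head
        weights=weights,          # "imagenet" loads pre-trained weights, None gives random init
        input_shape=input_shape,
        pooling="avg",            # global average pooling: one feature vector per image
    )
    base.trainable = False        # keep the pre-trained features fixed

    inputs = tf.keras.Input(shape=input_shape)
    # Keras EfficientNet models rescale inputs internally, so raw [0, 255] pixels are expected.
    features = base(inputs, training=False)  # keep BatchNorm layers in inference mode
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels assumed
        metrics=["accuracy"],
    )
    return model
```

Because only the final dense layer is trainable, such a model can be fitted on a small custom dataset with modest compute, which is the practical appeal of transfer learning. Swapping EfficientNetB0 for another tf.keras.applications architecture changes only the backbone line, although some architectures require their own input pre-processing function.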
More than 10 object detection models, including single-stage and multi-stage detectors, are analysed. The models are collected from the TensorFlow 2 Object Detection API and from PyTorch. Other open-source GitHub repositories are used for analysing state-of-the-art models such as PP-YOLO and RetinaNet.
For image classification, the EfficientNet family of models performed best, with Inception V3 the runner-up at slightly lower accuracy. The models achieved their highest accuracy on the raw dataset, i.e. none of the pre-processing techniques improved accuracy.
For object detection, the EfficientDet family of models outperformed the others. Unlike in image classification, pre-processing techniques such as bilateral blurring gave a slight edge in model performance compared to the other techniques.
The EfficientNet family of models has clearly shaped recent state-of-the-art performance and is a good starting point for training domain-specific or problem-specific models. Techniques such as blurring can give a slight edge in performance and are therefore worth considering, especially in object detection.