Code
Project Documents
Output Videos
This project develops an anchor-free 2D object detection model in PyTorch and aims to serve as a comprehensive guide for enthusiasts, researchers, and practitioners. The detector is trained from scratch on top of an ImageNet pre-trained backbone from PyTorch. Training was done on a modest system (NVIDIA RTX A2000 4 GB Laptop GPU), so users with limited computational resources can train object detection models that perform reasonably well. The codebase is designed to be easy to understand and extend. The key highlights are:
- Training a 2D object detection model in PyTorch from scratch, using an ImageNet pre-trained backbone from PyTorch.
- Development of an easy to understand and well documented codebase.
- Implementation of a method for tuning the detection threshold parameters.
- Using training samples from two publicly available datasets, KITTI and BDD, to demonstrate a technique for merging samples from multiple training datasets, so that users can draw on a diverse range of training data for better model generalization.
opencv_python>=4.8.0.74
imageio>=2.34.0
matplotlib>=3.7.2
numpy>=1.25.0
torch>=2.0.1
torchvision>=0.15.2
tqdm>=4.66.1
git clone https://github.com/UditBhaskar19/ANCHOR_FREE_OBJECT_DETECTOR_FOR_CAMERA
cd ANCHOR_FREE_OBJECT_DETECTOR_FOR_CAMERA/AnchorFree2DObjectDetection
# to run inference on bdd video frames
python video_inference_bdd.py
# to run inference on kitti video frames
python video_inference_kitti.py
# to create the labels file
python script1_create_dataset.py
# to train the model use script3_train_model.ipynb
# to write detections to video
cd write_detections_to_video
python write_detection_to_video_bdd.py
AnchorFree2DObjectDetection
│───doc # Project documents
│───hyperparam # Statistical data of the bounding box offsets
│───labels # aggregated GT labels data of KITTI and BDD dataset
│───mAP # module to compute mAP ( https://github.com/Cartucho/mAP.git )
│───model_weights # model weights data after training
│───tensorboard # data folder for loss visualization in tensorboard
│───modules # main modules
│   │───augmentation # scripts for image augmentation functions
│   │───dataset_utils # scripts for data analysis and dataset generation
│   │───evaluation # scripts for detector evaluation and threshold determination
│   │───first_stage # scripts for defining the model and ground truth generation function for dense object detection
│   │───hyperparam # scripts for computing the bounding box offsets statistics from training data
│   │───loss # loss functions
│   │───neural_net # scripts for defining various neural net blocks
│   │   │───backbone # model backbone blocks
│   │   │───bifpn # BiFPN blocks for model neck
│   │   │───fpn # FPN blocks for model neck
│   │   │───head # blocks for model head
│   │   │   common.py # common model building blocks
│   │   │   constants.py # constants for model construction
│   │───plot # contains plotting functions
│   │───pretrained # scripts for loading the pre-trained backbone from pytorch
│   │───proposal # scripts for proposal generation
│   │───second-stage # <work in progress> scripts for defining the model and ground truth generation function for second stage object detection
│───tests # folder for testing and validation scripts
│───video_inference # detection results saved as video
│───write_detections_to_video # scripts to save detections as video, results are saved in 'video_inference' folder
│   config_dataset.py # parameters and constants for dataset
│   config_neuralnet_stage1.py # model design parameters
│   script1_create_datasets.py # aggregate gt labels and save them inside the 'labels' folder
│   script2_gen_hyperparam.py # aggregate and save the box offsets and their statistics inside the 'hyperparam' folder
│   script3_train_model.ipynb # notebook to train the model
│   script4_inference_bdd.ipynb # run inference on the bdd dataset images
│   script4_inference_kitti.ipynb # run inference on the kitti dataset images
│   script5_compute_mAP_bdd.ipynb # compute mean average precision (mAP) on the bdd dataset
│   script5_compute_mAP_kitti.ipynb # compute mean average precision (mAP) on the kitti dataset
│   video_inference_bdd.py # run inference on the bdd dataset video
│   video_inference_kitti.py # run inference on the kitti dataset frame sequence video
For a trained object detection model to perform well, the training dataset needs to be large, diverse, and balanced, and the annotations have to be correct. The BDD dataset is large enough to train a reasonably well-performing model. The analysis below gives insight into the quality of the dataset, where good quality means that the training data is diverse and balanced.
Number of instances of different classes and scenes.
Observations
- There is a large intra-class as well as inter-class imbalance in the dataset (which is which depends on how the classes are grouped).
- Intra-class imbalance is present among traffic lights, where there are far fewer yellow instances. The red and green instances are reasonably balanced.
- Intra-class imbalance is also observed among road vehicles, where the car class has far more instances than other classes such as truck and bus.
- Inter-class imbalance can be seen between vehicles and non-vehicles, where the car class has far more instances than classes such as person, rider, and train.
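A class histogram like the one discussed above can be produced with a few lines of Python. This is a minimal sketch, not the project's analysis code: the `labels` list and its `category` key are hypothetical stand-ins for whatever the label files actually contain.

```python
from collections import Counter

def class_histogram(labels):
    """Count instances per class and report each class's share of the total."""
    counts = Counter(obj["category"] for obj in labels)
    total = sum(counts.values())
    # Sort from most to least frequent so imbalance is easy to spot.
    return [(name, n, n / total) for name, n in counts.most_common()]

# Hypothetical toy labels; real annotations would be loaded from the dataset.
labels = [{"category": "car"}] * 8 + [{"category": "truck"}] + [{"category": "person"}]
for name, n, share in class_histogram(labels):
    print(f"{name:8s} {n:4d} {share:6.1%}")
```

Printing the share next to the raw count makes a dominant class (here the toy "car" class) stand out immediately.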
Annotated bounding box dimension scatter plot.
Observations
- From the plot we can observe that some boxes are probably incorrect annotations. These either have an extreme aspect ratio or a very small area.
Selecting the boxes from the previous scatter plot that have an extreme aspect ratio or a very small area makes it possible to identify annotation errors. Some of them can be categorized as follows.
- Box area too small
- Extreme Box Aspect Ratio
- Incorrect Class
- Simplify the development of the object detection model in version 1 by reducing the number of classes and removing the highly imbalanced and irrelevant classes.
- Reduce the number of wrong and low-quality annotations.
- Car, bus, and truck are merged as vehicle; person and rider are merged as person. The remaining classes are part of the negative class.
- Select boxes that satisfy the conditions below:
  - Box width ≥ 5 pixels
  - Box height ≥ 5 pixels
  - 0.1 ≤ Box aspect ratio ≤ 10
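The class merging and box selection rules above can be sketched as follows. The thresholds and class groups come from the lists above; the function names, the `(x1, y1, x2, y2)` pixel box layout, and the annotation format are assumptions made for illustration. Because the aspect ratio bounds are reciprocal (0.1 and 10), the result is the same whether the ratio is defined as width/height or height/width.

```python
# Class groups taken from the text; everything else is treated as negative.
CLASS_MAP = {
    "car": "vehicle", "bus": "vehicle", "truck": "vehicle",
    "person": "person", "rider": "person",
}

MIN_SIDE = 5          # minimum box width and height in pixels
MIN_ASPECT = 0.1      # lower aspect ratio bound
MAX_ASPECT = 10.0     # upper aspect ratio bound

def keep_box(box):
    """Return True if an (x1, y1, x2, y2) box passes the size and aspect checks."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if w < MIN_SIDE or h < MIN_SIDE:
        return False
    return MIN_ASPECT <= w / h <= MAX_ASPECT

def clean_annotations(annotations):
    """Merge classes and drop boxes that fail the filter.

    `annotations` is a list of (class_name, box) pairs (hypothetical format).
    """
    cleaned = []
    for name, box in annotations:
        merged = CLASS_MAP.get(name)
        if merged is not None and keep_box(box):
            cleaned.append((merged, box))
    return cleaned

sample = [
    ("car", (10, 10, 60, 40)),       # kept, becomes "vehicle"
    ("rider", (5, 5, 8, 40)),        # dropped: width 3 < 5
    ("truck", (0, 0, 300, 6)),       # dropped: aspect ratio 50 > 10
    ("train", (0, 0, 100, 50)),      # dropped: not a retained class
]
print(clean_annotations(sample))
```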
- The person class suffers from low recall because it has far fewer training samples.
- The basic building block of the model is a weight-standardized conv2d followed by group norm and a non-linear activation. This allows a small batch size (6 in this case) that fits in GPU memory, and it also helps keep training stable (no NaNs).
- There are several ways to improve performance: fine-tuning the backbone, using additional open-source datasets, adding a second stage to improve recall, and training the model end to end for other tasks such as segmentation and tracking. These are planned for future releases.
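The building block mentioned above (weight-standardized convolution, then group normalization, then an activation) might look like the sketch below. It is an illustrative reconstruction, not the repository's code: the class names, the choice of ReLU, the number of groups, the kernel size, and the epsilon value are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d whose kernel is standardized per output channel before use."""

    def forward(self, x):
        w = self.weight
        # Mean and variance over each filter's (in_channels, kH, kW) entries.
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        w = (w - mean) / torch.sqrt(var + 1e-5)   # epsilon is an assumed value
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

class ConvBlock(nn.Module):
    """WSConv2d -> GroupNorm -> ReLU (activation and group count are assumed)."""

    def __init__(self, in_ch, out_ch, num_groups=8):
        super().__init__()
        self.conv = WSConv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.norm = nn.GroupNorm(num_groups, out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.norm(self.conv(x)))

block = ConvBlock(16, 32)
out = block(torch.randn(6, 16, 64, 64))   # batch size 6, as in the text
print(out.shape)
```

Group normalization computes its statistics per sample rather than across the batch, which is why a block of this kind does not degrade when the batch size is small.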
- BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning
- https://www.cvlibs.net/datasets/kitti/
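Based on the above analysis, the training samples and the dataset annotations are modified to simplify version 1 of the model and to reduce wrong and low-quality annotations.
The modifications are the class merging and box selection rules listed earlier in this document.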
Relevant Scripts (BDD)
Relevant Scripts (KITTI)
Each anchor corresponds to an object hypothesis, for which the network learns to predict four quantities from the image: box offsets, a centerness score, an objectness score, and a classification score. The ground truth for training is computed as follows.
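As a rough illustration of what box offset and centerness targets can look like for a single anchor location, the sketch below uses the FCOS-style formulation (distances to the four box edges, and a centerness that is 1 at the box centre and falls toward 0 at the edges). This formulation is an assumption: the project may normalize or encode its offsets differently, and the function name is hypothetical.

```python
import math

def fcos_style_targets(point, box):
    """Return (offsets, centerness) for a point, or None if it lies outside the box.

    point: (px, py) anchor location in pixels.
    box:   (x1, y1, x2, y2) ground-truth box in pixels.
    """
    px, py = point
    x1, y1, x2, y2 = box
    # Distances from the point to the left, top, right, and bottom box edges.
    l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
    if min(l, t, r, b) <= 0:
        return None  # point is outside (or on the border of) the box
    centerness = math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
    return (l, t, r, b), centerness

print(fcos_style_targets((50, 30), (0, 0, 100, 60)))   # point at the box centre
print(fcos_style_targets((25, 30), (0, 0, 100, 60)))   # point left of centre
```

Locations that fall outside every ground-truth box would be treated as background in such a scheme.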
Augmentation is performed during training. The augmentation process is depicted as follows
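One detail every detection augmentation has to handle is keeping the boxes aligned with the transformed image. The sketch below shows this for a horizontal flip. The choice of flip, the function names, the list-of-rows image format, and the flip probability are assumptions for illustration; the project's actual augmentations live in its augmentation module.

```python
import random

def hflip(image, boxes):
    """Mirror an image (list of pixel rows) and its (x1, y1, x2, y2) boxes."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # A box spanning [x1, x2] maps to [width - x2, width - x1] after mirroring.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped_image, flipped_boxes

def random_hflip(image, boxes, p=0.5, rng=random):
    """Apply the flip with probability p (p is an assumed default)."""
    if rng.random() < p:
        return hflip(image, boxes)
    return image, boxes

image = [[0, 1, 2, 3], [4, 5, 6, 7]]          # toy 2x4 "image"
boxes = [(0, 0, 1, 2)]                        # box covering the left column
print(hflip(image, boxes))
```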
Either SGD with momentum or the AdamW optimization method can be used. Refer to these scripts for more details:
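Switching between the two optimizers can be as simple as the helper below. This is a minimal sketch using the standard `torch.optim` classes; the helper name, the learning rate, momentum, and weight decay values are placeholders rather than the project's settings, and filtering on `requires_grad` assumes that some parameters (for example a frozen backbone) are excluded from training.

```python
import torch

def build_optimizer(model, name="adamw", lr=1e-3, weight_decay=1e-4):
    """Create one of the two optimizers named in the text (values are placeholders)."""
    # Only optimize parameters that are actually trainable.
    params = [p for p in model.parameters() if p.requires_grad]
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=weight_decay)
    if name == "adamw":
        return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
    raise ValueError(f"unknown optimizer: {name}")

model = torch.nn.Conv2d(3, 8, kernel_size=3)   # stand-in for the detector
optimizer = build_optimizer(model, name="sgd")
print(type(optimizer).__name__)
```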
Detection Rate vs False Positives per image (ROC Curve)
Recall vs Precision (PR Curve)
Comparing performance for Vehicle and Person class
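Curves like the ones listed above are typically built by sweeping the detection score threshold and counting true and false positives at each setting. The sketch below assumes detections have already been matched to ground truth, so each one carries a score and a true-positive flag; that input format and the function name are hypothetical, and the project's evaluation module may organize this differently.

```python
def sweep_thresholds(detections, num_gt, num_images, thresholds):
    """Compute recall, precision, and false positives per image at each threshold.

    detections: list of (score, is_true_positive) pairs, already matched to GT.
    num_gt:     total number of ground-truth boxes.
    num_images: number of evaluated images.
    """
    rows = []
    for thr in thresholds:
        kept = [is_tp for score, is_tp in detections if score >= thr]
        tp = sum(kept)
        fp = len(kept) - tp
        recall = tp / num_gt if num_gt else 0.0
        precision = tp / len(kept) if kept else 1.0   # convention when nothing is kept
        rows.append((thr, recall, precision, fp / num_images))
    return rows

dets = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
for row in sweep_thresholds(dets, num_gt=4, num_images=2, thresholds=[0.3, 0.5, 0.7]):
    print(row)
```

A detection threshold can then be picked from such a table, for instance the lowest threshold that keeps false positives per image under a chosen budget.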
mAP at different detection thresholds (computed using the mAP module)
Relevant Scripts
Detection Rate vs False Positives per image (ROC Curve)
Recall vs Precision (PR Curve)
Comparing performance for Vehicle and Person class
mAP at different detection thresholds (computed using the mAP module)
Relevant Scripts