An ensemble technique applied to the High-Resolution Network (HRNet) [1] to improve accuracy on the pose detection task. This project was carried out as part of CVWC 2019 (an ICCV 2019 workshop), for the Amur Tiger Re-identification challenge.
Challenge overview - https://cvwc2019.github.io/challenge.html
The key aspect of our approach was to improve on the corner cases of the already effective HRNet. To this end, the following methodology was adopted:
- We experimented with multiple input resolutions to test the effect of resolution on the model, and finally settled on a 640x480 input size.
- During training we adopted a 5-fold split over the entire train+validation dataset (see the sketch after this list).
- To improve accuracy during inference, the five fold models were ensembled using multiple approaches: average ensemble, bagging ensemble, and random forest ensemble. For the submission we selected the average ensemble, as it performed best in our experiments.
- All models were trained on the HRNet-W32 network, pre-trained on the ImageNet dataset.
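As a concrete illustration of the split step above, here is a minimal sketch of building the 5-fold split over the merged train+validation set. The annotation file name and output paths are hypothetical, and COCO-style JSON annotations are assumed:

```python
# Minimal 5-fold split sketch (hypothetical file names, COCO-style JSON assumed).
import json
from sklearn.model_selection import KFold

with open('data/tiger/annotations/tiger_trainval.json') as f:
    coco = json.load(f)

image_ids = [img['id'] for img in coco['images']]
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kf.split(image_ids), start=1):
    for split_name, idx in (('train', train_idx), ('val', val_idx)):
        keep = {image_ids[i] for i in idx}
        subset = {
            'images': [im for im in coco['images'] if im['id'] in keep],
            'annotations': [a for a in coco['annotations'] if a['image_id'] in keep],
            'categories': coco['categories'],
        }
        with open(f'data/tiger/annotations/fold{fold}_{split_name}.json', 'w') as out:
            json.dump(subset, out)
```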
Place the ImageNet-pretrained HRNet-W32 weights as follows:
{ROOT}
|__models
   |__pytorch
      |__imagenet
         |__hrnet_w32-36af842e.pth
Arrange the dataset as follows:
{ROOT}
|__data
   |__tiger
      |__images
      |  |__train
      |  |__test
      |  |__val
      |__annotations
         |__<train_annotations>
         |__<test_annotations>
         |__bb_predictions_pose_test.json <GT bboxes for the test dataset>
Training outputs are written to:
{ROOT}
|__output (or whatever name was specified in the config or notebook)
   |__<dataset>
      |__pose_hrnet
         |__<config>
            |__<results and intermediate training outputs>
All trained models and pretrained models are available here: Drive_Link_For_models
The pretrained model needs to be placed as per the directory structure shown above.
Each trained model must be placed in its own directory, since the code looks for model directories rather than model paths (we are working on fixing that as well).
For example:
{ROOT}
|__trained
   |__<resolution>
      |__model1
      |  |__final_state.pth
      |__model2
      |  |__final_state.pth
      |__...
The experiment requires you to create five directories named output1 through output5; these store the trained models, one per fold. Each trained model is fed to a predictor, which runs evaluation on the val/test dataset to obtain the final outputs. The outputs and the ground truths are then passed to the evaluator script, which scores them and returns the performance metrics. To run this scenario, do the following:
- Create your conda environment from the '.yaml' file provided in the root directory.
conda env create -f pose-env.yaml
- Run the commands
mkdir output
mkdir log
- Go to the `lib` directory and run `make`. This builds the nms library.
- Also install pycocotools:
pip install pycocotools
- Your basic setup is ready.
Run the following command to train:
python tools/train.py \
--cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml
Remember to change the config file path to suit your requirements.
To run inference on the val/test dataset:
python tools/test.py \
--cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml
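To avoid repeating the two commands above five times by hand, a small driver script can loop over the folds. The per-fold config names below are hypothetical; each config is assumed to point at its own fold annotations and output directory:

```python
# Hypothetical driver for the 5-fold experiment: train each fold, then run
# inference with the trained model. Config file names are illustrative only.
import subprocess

for fold in range(1, 6):
    cfg = f'experiments/tiger/hrnet/w32_640x480_fold{fold}.yaml'
    subprocess.run(['python', 'tools/train.py', '--cfg', cfg], check=True)
    subprocess.run(['python', 'tools/test.py', '--cfg', cfg], check=True)
```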
All the ensemble code is kept in the interactive Python notebook located at:
<root>/tools/ensemble-hrnet.ipynb
Follow the instructions in sequence in the notebook.
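For reference, the average ensemble selected for the submission boils down to averaging the per-fold keypoint predictions. A minimal sketch, assuming each fold's predictor wrote COCO-style keypoint results (flat x, y, score triples) for the same images in the same order, with hypothetical file paths:

```python
# Average-ensemble sketch over per-fold keypoint predictions.
# Assumes COCO-style result lists, one detection per image, same image
# order in every file; paths are hypothetical.
import json
import numpy as np

fold_files = [f'output{i}/results_test.json' for i in range(1, 6)]
fold_preds = []
for path in fold_files:
    with open(path) as f:
        fold_preds.append(json.load(f))

ensembled = []
for per_image in zip(*fold_preds):  # the same image across all five folds
    avg_kpts = np.mean([np.array(d['keypoints']) for d in per_image], axis=0)
    out = dict(per_image[0])               # copy image_id, category_id, etc.
    out['keypoints'] = avg_kpts.tolist()   # averaged x, y, score triples
    out['score'] = float(np.mean([d.get('score', 1.0) for d in per_image]))
    ensembled.append(out)

with open('results_ensemble.json', 'w') as f:
    json.dump(ensembled, f)
```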
[1] Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. Deep High-Resolution Representation Learning for Human Pose Estimation. CVPR 2019.