Extract grid features for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
- Install Detectron 2 following INSTALL.md. Since Detectron 2 is also being actively updated which can result in breaking behaviors, it is highly recommended to install via the following command:
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git@ffff8ac'
Commits before or after ffff8ac
might also work, but it could be risky.
- Clone this repo
git clone https://github.com/rentainhe/TRAR-Feature-Extraction.git
cd TRAR-Feature-Extraction
mkdir datasets
- Download COCO image data from official download instructions
We use coco_train_2014
, coco_val_2014
for training and evaluation, coco_test_2015
for online testing.
The file structure should look like:
datasets/coco/
├── annotations
│ ├── image_info_test2015.json
| ├── instances_train2014.json
| ├── instances_val2014.json
| ├── ......
├── train2014
│ ├── COCO_train2014_000000000009.jpg
│ ├── COCO_train2014_000000000025.jpg
| ├── ......
├── val2014
│ ├── COCO_train2014_000000000042.jpg
│ ├── COCO_train2014_000000000073.jpg
| ├── ......
├── test2015
│ ├── COCO_train2014_000000000001.jpg
│ ├── COCO_train2014_000000000014.jpg
| ├── ......
Grid feature extraction can be done by simply running once the model is trained (or you can directly download our pre-trained models, see below):
python extract_trar_grid_feature.py --config-file configs/R-50-grid.yaml --dataset <dataset> --output_dir /path/to/save/features --weight_path /path/to/pretrained/model --feature_size 8
--dataset={'coco_train_2014', 'coco_val_2014', 'coco_test_2015'}
, e.g., to set the dataset to be extracted--output_dir=str
, e.g., where to save the extracted features--weight_path=str
, e.g., where to load the pretrained model weight--feature_size=int
, default:--feature_size=8
, e.g., set feature_size to 16 to get 16*16 features
and the code will load the model weight from args.weight_path
(which one can override in command line) and start extracting features for <dataset>
and save the extracted features to args.output_dir
, we provide three options for the dataset: coco_2014_train
, coco_2014_val
and coco_2015_test
, they correspond to train
, val
and test
splits of the VQA dataset. The extracted features can be conveniently loaded in Pythia.
- For example: Extract
coco_2014_train
features usingpretrained ResNext152
Model
CUDA_VISIBLE_DEVICES=0 python extract_trar_grid_feature.py --config-file configs/X-152-grid.yaml --dataset coco_2014_train --weight_path ./weight/X-152.pth --output_dir ./data/