The code is developed with Python 3.8 on Ubuntu 16.04 and tested on 8 NVIDIA V100 GPUs; other platforms are not fully tested.
- Clone this repo.
- Setup conda environment:
conda create -n PCT python=3.8 -y
conda activate PCT
pip install -r requirements.txt
Download the 2017 train/val images and annotations from the COCO download page. The person detection results can be obtained from the HRNet repository. Your data directory should look like this:
${POSE_ROOT}
|-- data
`-- |-- coco
`-- |-- annotations
| |-- person_keypoints_train2017.json
| `-- person_keypoints_val2017.json
|-- person_detection_results
| |-- COCO_val2017_detections_AP_H_56_person.json
| |-- COCO_test-dev2017_detections_AP_H_609_person.json
`-- images
|-- train2017
| |-- 000000000009.jpg
| |-- 000000000025.jpg
| |-- 000000000030.jpg
| |-- ...
`-- val2017
|-- 000000000139.jpg
|-- 000000000285.jpg
|-- 000000000632.jpg
|-- ...
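Before moving on, you can sanity-check this layout with a short script like the following (our own snippet, run from ${POSE_ROOT}; it is not part of the repo):

from pathlib import Path

expected = [
    "data/coco/annotations/person_keypoints_train2017.json",
    "data/coco/annotations/person_keypoints_val2017.json",
    "data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json",
    "data/coco/images/train2017",
    "data/coco/images/val2017",
]
for rel in expected:
    print("ok     " if Path(rel).exists() else "MISSING", rel)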
Model Zoo
To use this codebase, we provide the following models and tools:
- SimMIM Pretrained Backbone: We provide SimMIM pre-trained swin models that you can download. Alternatively, you can use the SimMIM repository to pretrain your own models. (Note: when loading a SimMIM model, it is normal to encounter missing keys in the source state_dict, including relative_coords_table, relative_position_index, and norm3; these missing keys do not affect the results. See the loading sketch after this list.)
- Heatmap Trained Backbone: We offer swin models that are trained on the COCO dataset with heatmap supervision. If you prefer, you can also train your own swin backbone using the command:
./tools/dist_train.sh configs/hmp_[base/large/huge].py 8
- [Optional] Well-Trained Tokenizers: You can download well-trained PCT tokenizers in the zoo.
- [Optional] Well-Trained Pose Models: Our well-trained PCT pose models can be found in the zoo.
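As a minimal sketch of the tolerant loading mentioned in the SimMIM note above (load_pretrained is our helper name, and the assumption that weights may be nested under a "model" key is ours; adjust to your checkpoint format):

import torch
import torch.nn as nn

def load_pretrained(model: nn.Module, path: str) -> None:
    """Load a checkpoint non-strictly and report which keys were skipped."""
    ckpt = torch.load(path, map_location="cpu")
    state_dict = ckpt.get("model", ckpt)  # some checkpoints nest weights under "model"
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print("missing keys:", missing)       # relative_coords_table etc. are expected here
    print("unexpected keys:", unexpected)

# e.g. load_pretrained(my_swin_backbone, "weights/simmim/swin_base.pth")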
After completing the above steps, your weights directory should look like this:
${POSE_ROOT}
|-- weights
`-- |-- simmim
| `-- swin_[base/large/huge].pth
|-- heatmap
| `-- swin_[base/large/huge].pth
|-- tokenizer [Optional]
| `-- swin_[base/large/huge].pth
`-- pct [Optional]
`-- swin_[base/large/huge].pth
Training a PCT model has two stages: first the tokenizer, then the classifier. Train the tokenizer with:
./tools/dist_train.sh configs/pct_[base/large/huge]_tokenizer.py 8
After training the tokenizer, move the well-trained checkpoint from work_dirs/pct_[base/large/huge]_tokenizer/epoch_50.pth to weights/tokenizer/swin_[base/large/huge].pth and then proceed to the next stage. Alternatively, you can point the classifier config at the checkpoint directly with --cfg-options model.keypoint_head.tokenizer.ckpt=work_dirs/pct_[base/large/huge]_tokenizer/epoch_50.pth when training the classifier.
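For the base variant, for example, the move amounts to the following (a convenience snippet of ours, not a repo tool):

import os
import shutil

os.makedirs("weights/tokenizer", exist_ok=True)
shutil.copy("work_dirs/pct_base_tokenizer/epoch_50.pth",
            "weights/tokenizer/swin_base.pth")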
In the second stage, train the classifier:
./tools/dist_train.sh configs/pct_[base/large/huge]_classifier.py 8
Finally, you can test your model using the script below; use_gt_bbox=False evaluates with the downloaded person detection results rather than ground-truth boxes.
./tools/dist_test.sh configs/pct_[base/large/huge]_classifier.py work_dirs/pct_[base/large/huge]_classifier/epoch_210.pth 8 --cfg-options data.test.data_cfg.use_gt_bbox=False
Additionally, you can choose a cleaner PCT variant that removes the image guidance. The benefit of this approach is that it does not require features from a backbone trained on COCO with heatmap supervision; instead, it converts joint coordinates directly into compositional tokens, which makes various visualization and analysis tasks easier, at a slight cost in performance. (A minimal sketch of the tokenization idea follows the two commands below.)
./tools/dist_train.sh configs/pct_base_woimgguide_tokenizer.py 8
./tools/dist_train.sh configs/pct_base_woimgguide_classifier.py 8
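To make the idea concrete, here is a minimal, illustrative sketch of such a guidance-free tokenizer; it is our own toy code, not the repo implementation, and all sizes are made up. An encoder maps K joint coordinates to M token features, each feature is snapped to its nearest codebook entry (the discrete compositional tokens), and a decoder reconstructs the pose:

import torch
import torch.nn as nn

K, M, D, V = 17, 34, 64, 1024  # joints, tokens, token dim, codebook size (illustrative)

encoder = nn.Sequential(nn.Linear(K * 2, M * D), nn.GELU())
codebook = nn.Embedding(V, D)
decoder = nn.Linear(M * D, K * 2)

def tokenize(joints: torch.Tensor) -> torch.Tensor:
    """(B, K, 2) normalized joint coordinates -> (B, M) token indices."""
    feats = encoder(joints.flatten(1)).view(-1, M, D)                # (B, M, D)
    book = codebook.weight.unsqueeze(0).expand(feats.size(0), V, D)  # (B, V, D)
    return torch.cdist(feats, book).argmin(-1)                       # nearest codebook entry

def detokenize(tokens: torch.Tensor) -> torch.Tensor:
    """(B, M) token indices -> (B, K, 2) reconstructed joints."""
    return decoder(codebook(tokens).flatten(1)).view(-1, K, 2)

pose = torch.rand(2, K, 2)               # random poses, just to exercise the code
print(detokenize(tokenize(pose)).shape)  # torch.Size([2, 17, 2])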
You need to install mmdet==2.26.0 and mmcv-full==1.7.0, and then use the following command to generate some image demos.
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH python vis_tools/demo_img_with_mmdet.py \
    vis_tools/cascade_rcnn_x101_64x4d_fpn_coco.py \
    https://download.openmmlab.com/mmdetection/v2.0/cascade_rcnn/cascade_rcnn_x101_64x4d_fpn_20e_coco/cascade_rcnn_x101_64x4d_fpn_20e_coco_20200509_224357-051557b1.pth \
    configs/pct_[base/large/huge]_classifier.py \
    weights/pct/swin_[base/large/huge].pth \
    --img-root images/ --img your_image.jpg --out-img-root images/ \
    --thickness 2
Thanks to
@inproceedings{Geng23PCT,
author={Zigang Geng and Chunyu Wang and Yixuan Wei and Ze Liu and Houqiang Li and Han Hu},
title={Human Pose as Compositional Tokens},
booktitle={{CVPR}},
year={2023},
}