Skip to content

Latest commit

 

History

History
121 lines (97 loc) · 9.92 KB

POSE_README.md

File metadata and controls

121 lines (97 loc) · 9.92 KB

Quantitative Evaluations: 2D Pose Estimation

This guide outlines the process to evaluate Sapiens-Pose checkpoints on two datasets.\

  • COCO-WholeBody: 133 keypoints (17 kps body, 6 kps feet, 68 kps face, 42 kps hands).
  • COCO: 17 keypoints

📂 1. Data Preparation

  • Set $DATA_ROOT as your training data root directory.
  • Download the val2017 images and 17 kps annotations from COCO.
  • Download the 133 kps annotations from COCO-WholeBody.
  • Unzip the images and annotations as subfolders to $DATA_ROOT.
  • Additionally, download the bounding-box detection on the val2017 set from COCO_val2017_detections_AP_H_70_person.json and place it under $DATA_ROOT/person_detection_results.

The data directory structure is as follows:

  $DATA_ROOT/
  │   └── val2017
  │   │   └── 000000000139.jpg
  │   │   └── 000000000285.jpg
  │   │   └── 000000000632.jpg
  │   └── annotations
  │   │   └── person_keypoints_train2017.json
  │   │   └── person_keypoints_val2017.json
  │   │   └── coco_wholebody_train_v1.0.json
  │   │   └── coco_wholebody_val_v1.0.json
  │   └── person_detection_results
  │   │   └── COCO_val2017_detections_AP_H_70_person.json

⚙️ 2. Configuration Update

Let $DATASET be either coco-wholebody for 133 kps or coco for 17 kps.
Edit $SAPIENS_ROOT/pose/configs/sapiens_pose/$DATASET/sapiens_1b-210e_$DATASET-1024x768.py:

  1. Update val_dataloader.dataset.data_root to your $DATA_ROOT. eg. data/coco.
  2. Update val_evaluator.ann_file to also point to validation annotation file under $DATA_ROOT.
  3. Update bbox_file to point to the bounding box detection file under $DATA_ROOT.

🏋️ 3. Evaluation

The following guide is for Sapiens-1B. You can find other backbones to evaluate under pose_configs_133 and pose_configs_17.
The testing scripts are under: $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b
Make sure you have activated the sapiens python conda environment.

A. 🚀 Single-node Testing

Use $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b/node.sh.

Key variables:

  • CHECKPOINT: Absolute path to your checkpoint
  • DEVICES: GPU IDs (e.g., "0,1,2,3,4,5,6,7")
  • TEST_BATCH_SIZE_PER_GPU: Default 32
  • OUTPUT_DIR: Checkpoint and log directory
  • mode=multi-gpu: Launch multi-gpu testing with multiple workers for dataloading.
  • mode=debug: (Optional) To debug. Launched single gpu dry run, with single worker for dataloading. Supports interactive debugging with pdb/ipdb.

Launch:

cd $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b
./node.sh

B. 🌐 Multi-node Testing (Slurm)

Use $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b/slurm.sh

Additional variables:

  • CONDA_ENV: Path to conda environment
  • NUM_NODES: Number of nodes (default 4, 8 GPUs per node)

Launch:

cd $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b
./slurm.sh

📈 4. Results

Sapiens achieve state-of-the-art results for keypoint estimation on both datasets. Below we compare them with existing methods.

COCO-WholeBody - 133 Keypoints

PWC

Model Input Size Body AP Body AR Feet AP Feet AR Face AP Face AR Hand AP Hand AR Whole AP Whole AR Config Ckpt
DeepPose 384 × 288 44.4 56.8 36.8 53.7 49.3 66.3 23.5 41.0 33.5 48.4 - -
SimpleBaseline 384 × 288 66.6 74.7 63.5 76.3 73.2 81.2 53.7 64.7 57.3 67.1 - -
HRNet 384 × 288 70.1 77.3 58.6 69.2 72.7 78.3 51.6 60.4 58.6 67.4 - -
ZoomNAS 384 × 288 74.0 80.7 61.7 71.8 88.9 93.0 62.5 74.0 65.4 74.4 - -
VitPose+-L 256 × 192 75.3 - 77.1 - 63.0 - 54.2 - 60.6 - - -
VitPose+-H 256 × 192 75.9 - 77.9 - 63.6 - 54.7 - 61.2 - - -
RTMPose-x 384 × 288 71.4 78.4 69.2 81.0 88.8 92.2 59.0 68.5 65.3 73.3 - -
DWPose-m 256 × 192 68.5 76.1 63.6 77.2 82.8 88.1 52.7 63.4 60.6 69.5 - -
DWPose-l 384 × 288 72.2 78.9 70.4 81.7 88.7 92.1 62.1 71.0 66.5 74.3 - -
Sapiens-0.3B (Ours) 1024 × 768 66.4 73.4 67.3 78.4 87.1 91.2 58.1 67.1 62.0 69.4 config ckpt
Sapiens-0.6B (Ours) 1024 × 768 74.3 80.2 79.4 87.0 89.5 92.9 65.4 74.0 69.5 (+3.0) 76.3 (+2.0) config ckpt
Sapiens-1B (Ours) 1024 × 768 77.4 82.9 83.0 89.8 90.7 93.6 69.2 77.1 72.7 (+6.2) 79.2 (+4.9) config ckpt
Sapiens-2B (Ours) 1024 × 768 79.2 84.6 84.1 90.9 91.2 93.8 70.4 78.1 74.4 (+7.9) 81.0 (+6.7) config ckpt

COCO - 17 Keypoints

PWC

Model Input Size AP AP-50 AP-75 AP-M AP-L AR AR-50 AR-75 AR-M AR-L Config Ckpt
SimpleBaseline 256 × 192 73.5 - - 69.9 80.2 79.0 - - - - - -
HRNet 384 × 288 76.3 - - 72.3 83.4 81.2 - - - - - -
UDP 384 × 288 77.2 - - 73.2 84.4 82.0 - - - - - -
FastPose 256 × 192 73.3 - - - - - - - - - - -
HRFormer 256 × 192 77.2 - - 73.2 84.2 82.0 - - - - - -
VitPose-S 256 × 192 73.8 - - 70.5 80.4 79.2 - - - - - -
VitPose-B 256 × 192 75.8 - - 72.1 82.2 81.1 - - - - - -
VitPose-L 256 × 192 78.3 - - 74.5 85.4 83.5 - - - - - -
VitPose-H 256 × 192 79.1 - - 75.3 86.0 84.1 - - - - - -
VitPose++-S 256 × 192 75.8 - - 72.3 82.6 81.0 - - - - - -
VitPose++-B 256 × 192 77.0 - - 73.4 84.0 82.6 - - - - - -
VitPose++-L 256 × 192 78.6 - - 75.2 85.6 84.1 - - - - - -
VitPose++-H 256 × 192 79.4 - - 75.8 86.5 84.8 - - - - - -
Sapiens-0.3B (Ours) 1024 × 768 79.6 (+0.2) 93.0 85.7 76.0 85.6 83.6 95.6 89.0 79.9 89.1 config ckpt
Sapiens-0.6B (Ours) 1024 × 768 81.2 (+1.8) 93.8 87.3 77.6 87.2 84.9 96.0 90.4 81.3 90.3 config ckpt
Sapiens-1B (Ours) 1024 × 768 82.1 (+2.7) 94.2 88.2 78.4 88.3 85.9 96.6 91.3 82.1 91.4 config ckpt
Sapiens-2B (Ours) 1024 × 768 82.2 (+2.8) 94.1 88.1 78.5 88.4 86.0 96.6 91.2 82.2 91.5 config ckpt