architecture | pre-training | #frames | checkpoint |
---|---|---|---|
TAdaFormer-B/16 | CLIP | 16 | ckp |
TAdaFormer-L/14 | CLIP | 16 | ckp |
TAdaFormer-L/14 | CLIP | 32 | ckp |
TAdaFormer-L/14 | CLIP | 64 | ckp |
architecture | pre-training | #frames | GFLOPS | top1 | checkpoint |
---|---|---|---|---|---|
TAdaConvNeXtV2-T | IN1K | 16 | 47x3x4 | 79.6 | ckp |
TAdaConvNeXtV2-T | IN1K | 32 | 94x3x4 | 80.8 | ckp |
TAdaConvNeXtV2-S | IN1K | 16 | 91x3x4 | 80.8 | ckp |
TAdaConvNeXtV2-S | IN1K | 32 | 183x3x4 | 81.9 | ckp |
TAdaConvNeXtV2-S | IN21K | 32 | 183x3x4 | 82.9 | ckp |
TAdaConvNeXtV2-B | IN1K | 16 | 162x3x4 | 81.4 | ckp |
TAdaConvNeXtV2-B | IN1K | 32 | 324x3x4 | 82.3 | ckp |
TAdaConvNeXtV2-B | IN21K | 32 | 324x3x4 | 83.7 | ckp |
architecture | pre-training | #frames | GFLOPS | top1 | checkpoint |
---|---|---|---|---|---|
TAdaFormer-B/16 | CLIP | 16 | 153x3x4 | 84.5 | ckp |
TAdaFormer-L/14 | CLIP | 16 | 703x3x4 | 87.6 | ckp |
TAdaFormer-B/16 | CLIP+K710 | 16 | 153x3x4 | 86.6 | ckp |
TAdaFormer-L/14 | CLIP+K710 | 16 | 703x3x4 | 88.9 | ckp |
TAdaFormer-L/14 | CLIP+K710 | 32 | 1406x3x4 | 89.5 | ckp |
TAdaFormer-L/14 | CLIP+K710 | 64 | 2812x3x4 | 89.9 | ckp |
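The GFLOPS cells above are written as a product such as `47x3x4`. A minimal sketch for multiplying one out into the total inference cost per video, assuming the first factor is GFLOPs per view and the remaining factors are the number of evaluated views (clips and crops); `total_gflops` is a hypothetical helper, not part of the codebase.

```python
import math


def total_gflops(cell: str) -> int:
    """Multiply out a cost cell such as '47x3x4'.

    Assumption: the first factor is GFLOPs per view and the remaining
    factors count the views (clips and crops) used at test time.
    """
    factors = [int(f) for f in cell.split("x")]
    return math.prod(factors)


# TAdaConvNeXtV2-T, 16 frames: 47 GFLOPs per view over 3 x 4 = 12 views
print(total_gflops("47x3x4"))    # 564
# TAdaFormer-L/14, 64 frames
print(total_gflops("2812x3x4"))  # 33744
```

The same helper applies to the `x3x2` cells in the SSV1/SSV2 tables. Note that the per-view cost scales roughly linearly with the number of frames (for example 47 to 94 when going from 16 to 32 frames).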
The checkpoints in this part are provided for SSV2.
architecture | pre-training | #frames | GFLOPS | SSV1 | SSV2 | checkpoint |
---|---|---|---|---|---|---|
TAdaConvNeXtV2-T | IN1K+K400 | 16 | 47x3x2 | 54.1 | 67.2 | ckp |
TAdaConvNeXtV2-T | IN1K+K400 | 32 | 94x3x2 | 56.4 | 69.8 | ckp |
TAdaConvNeXtV2-S | IN1K+K400 | 16 | 91x3x2 | 55.6 | 68.4 | ckp |
TAdaConvNeXtV2-S | IN1K+K400 | 32 | 183x3x2 | 58.5 | 70.0 | ckp |
TAdaConvNeXtV2-S | IN21K+K400 | 32 | 183x3x2 | 59.7 | 70.6 | ckp |
TAdaConvNeXtV2-B | IN21K+K400 | 32 | 324x3x2 | 60.7 | 71.1 | ckp |
architecture | pre-training | #frames | GFLOPS | SSV1 | SSV2 | checkpoint |
---|---|---|---|---|---|---|
TAdaFormer-B/16 | CLIP | 16 | 187x3x2 | 59.2 | 70.4 | ckp |
TAdaFormer-B/16 | CLIP | 32 | 374x3x2 | 61.2 | 71.3 | ckp |
TAdaFormer-L/14 | CLIP | 16 | 858x3x2 | 62.0 | 72.4 | ckp |
TAdaFormer-L/14 | CLIP | 32 | 1716x3x2 | 63.7 | 73.6 | ckp |
architecture | depth | init | clips x crops | #frames x sampling rate | acc@1 | acc@5 | checkpoint | config |
---|---|---|---|---|---|---|---|---|
TAda2D | R50 | IN-1K | 10 x 3 | 8 x 8 | 76.7 | 92.6 | [google drive][baidu(code:p06d)] | tada2d_8x8.yaml |
TAda2D | R50 | IN-1K | 10 x 3 | 16 x 5 | 77.4 | 93.1 | [google drive][baidu(code:6k8h)] | tada2d_16x5.yaml |
ViViT Fact. Enc. | B16x2 | IN-21K | 4 x 3 | 32 x 2 | 79.4 | 94.0 | [google drive][baidu(code:1t51)] | vivit_fac_enc_b16x2.yaml |
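The `#frames x sampling rate` column describes how each input clip is drawn from the video. A minimal sketch, assuming the common convention (used, for example, by SlowFast-style samplers) that one frame is taken every `sampling_rate` source frames, so a clip covers a window of `num_frames * sampling_rate` source frames; `sample_indices` and `window_length` are hypothetical helpers, and the actual sampler in the codebase may also jitter the start or resample by FPS.

```python
def sample_indices(num_frames: int, sampling_rate: int, start: int = 0) -> list[int]:
    """Source-frame indices of one clip: every `sampling_rate`-th frame from `start`."""
    return [start + i * sampling_rate for i in range(num_frames)]


def window_length(num_frames: int, sampling_rate: int) -> int:
    """Number of source frames spanned by one clip under the assumed convention."""
    return num_frames * sampling_rate


# Settings from the table above
for n, r in [(8, 8), (16, 5), (32, 2)]:
    print(f"{n} x {r}: window of {window_length(n, r)} frames")

print(sample_indices(8, 8))  # [0, 8, 16, 24, 32, 40, 48, 56]
```

Under this reading, `8 x 8` and `32 x 2` cover the same 64-frame window, while `16 x 5` covers 80 frames.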
architecture | depth | init | clips x crops | #frames | acc@1 | acc@5 | checkpoint | config |
---|---|---|---|---|---|---|---|---|
TAda2D | R50 | IN-1K | 2 x 3 | 8 | 64.2 | 88.0 | [google drive][baidu(code:dlil)] | tada2d_8f.yaml |
TAda2D | R50 | IN-1K | 2 x 3 | 16 | 65.6 | 89.1 | [google drive][baidu(code:f857)] | tada2d_16f.yaml |
architecture | init | resolution | clips x crops | #frames x sampling rate | action acc@1 | verb acc@1 | noun acc@1 | checkpoint | config |
---|---|---|---|---|---|---|---|---|---|
ViViT Fact. Enc.-B16x2 | K700 | 320 | 4 x 3 | 32 x 2 | 46.3 | 67.4 | 58.9 | [google drive][baidu(code:rinh)] | vivit_fac_enc.yaml |
ir-CSN-R152 | K700 | 224 | 10 x 3 | 32 x 2 | 44.5 | 68.4 | 55.9 | [google drive][baidu(code:s0uj)] | csn.yaml |
feature | classification | type | IoU@0.1 | IoU@0.2 | IoU@0.3 | IoU@0.4 | IoU@0.5 | Avg | checkpoint | config |
---|---|---|---|---|---|---|---|---|---|---|
ViViT | ViViT | Verb | 22.90 | 21.93 | 20.74 | 19.08 | 16.00 | 20.13 | [google drive][baidu(code:3sud)] | vivit-os-local.yaml |
ViViT | ViViT | Noun | 28.95 | 27.38 | 25.52 | 22.67 | 18.95 | 24.69 | [google drive][baidu(code:3sud)] | vivit-os-local.yaml |
ViViT | ViViT | Action | 20.82 | 19.93 | 18.67 | 17.02 | 15.06 | 18.30 | [google drive][baidu(code:3sud)] | vivit-os-local.yaml |
TAda2D | TAda2D | Verb | 19.70 | 18.49 | 17.41 | 15.50 | 12.78 | 16.78 | [google drive][baidu(code:d01j)] | - |
TAda2D | TAda2D | Noun | 20.54 | 19.32 | 17.94 | 15.77 | 13.39 | 17.39 | [google drive][baidu(code:d01j)] | - |
TAda2D | TAda2D | Action | 15.15 | 14.32 | 13.59 | 12.18 | 10.65 | 13.18 | [google drive][baidu(code:d01j)] | - |
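The `Avg` column in the table above matches the unweighted mean of the five per-threshold scores (IoU@0.1 to IoU@0.5) for every row. A small sketch that recomputes it from the table values; `ROWS` and `average_over_thresholds` are illustrative names, and the tolerance accounts for the two-decimal rounding in the table.

```python
from statistics import mean

# (feature, type) -> (scores at IoU@0.1..0.5, reported Avg), copied from the table
ROWS = {
    ("ViViT", "Verb"): ([22.90, 21.93, 20.74, 19.08, 16.00], 20.13),
    ("ViViT", "Noun"): ([28.95, 27.38, 25.52, 22.67, 18.95], 24.69),
    ("ViViT", "Action"): ([20.82, 19.93, 18.67, 17.02, 15.06], 18.30),
    ("TAda2D", "Verb"): ([19.70, 18.49, 17.41, 15.50, 12.78], 16.78),
    ("TAda2D", "Noun"): ([20.54, 19.32, 17.94, 15.77, 13.39], 17.39),
    ("TAda2D", "Action"): ([15.15, 14.32, 13.59, 12.18, 10.65], 13.18),
}


def average_over_thresholds(scores: list[float]) -> float:
    """Unweighted mean over the IoU thresholds."""
    return mean(scores)


for (feature, kind), (scores, reported) in ROWS.items():
    avg = average_over_thresholds(scores)
    # Reported values are rounded to two decimals, so allow a small tolerance.
    assert abs(avg - reported) < 0.006, (feature, kind, avg, reported)
    print(f"{feature:6s} {kind:6s} {avg:.2f}")
```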
Note: the following models use decord 0.4.1 instead of the codebase default, decord 0.6.0.
dataset | backbone | checkpoint | config |
---|---|---|---|
HMDB51 | R-2D3D-18 | [google drive][baidu(code:ahqg)] | pt-hmdb/r2d3ds.yaml |
HMDB51 | R(2+1)D-10 | [google drive][baidu(code:1ktb)] | pt-hmdb/r2p1d.yaml |
UCF101 | R-2D3D-18 | [google drive][baidu(code:61uw)] | pt-ucf/r2d3ds.yaml |
UCF101 | R(2+1)D-10 | [google drive][baidu(code:drq2)] | pt-ucf/r2p1d.yaml |
dataset | backbone | acc@1 | acc@5 | checkpoint | config |
---|---|---|---|---|---|
HMDB51 | R-2D3D-18 | 46.93 | 74.71 | [google drive][baidu(code:2puu)] | ft-hmdb/r2d3ds.yaml |
HMDB51 | R(2+1)D-10 | 51.83 | 78.63 | [google drive][baidu(code:hgnc)] | ft-hmdb/r2p1d.yaml |
UCF101 | R-2D3D-18 | 71.75 | 89.14 | [google drive][baidu(code:ndt6)] | ft-ucf/r2d3ds.yaml |
UCF101 | R(2+1)D-10 | 82.79 | 95.78 | [google drive][baidu(code:ecsf)] | ft-ucf/r2p1d.yaml |