The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).

Efficient Self-Supervised Video Hashing with Selective State Spaces

1. Introduction

This repository contains the PyTorch implementation of our work at AAAI 2025:

Efficient Self-Supervised Video Hashing with Selective State Spaces. Jinpeng Wang, Niu Lian, Jun Li, Yuting Wang, Yan Feng, Bin Chen, Yongbing Zhang, Shu-Tao Xia.

[Figure: overview of S5VH]

We are happy to announce S5VH, the first Mamba-based video hashing model with an improved self-supervised learning paradigm. S5VH includes bidirectional Mamba layers in both the encoder and the decoder, which capture temporal relationships effectively and efficiently thanks to the data-dependent selective scanning mechanism with linear complexity. For the hash learning strategy, we transform global semantics in the feature space into semantically consistent and discriminative hash centers, followed by a center alignment loss as a global learning signal. Experiments show S5VH's efficacy and efficiency under various setups. Our study suggests the strong potential of state-space models in video hashing, which we hope can inspire further research.

In the following, we will guide you through using this repository step by step. 🤗

2. Preparation

git clone https://github.com/gimpong/AAAI25-S5VH.git
cd AAAI25-S5VH/

2.1 Requirements

  • python==3.11.8
  • numpy==1.26.4
  • pytorch==2.0.1
  • torchvision==0.15.2
  • mamba-ssm==2.0.4
  • scipy==1.5.4
  • h5py==3.1.0
  • addict==2.4.0
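
As a quick sanity check of the environment, the pinned versions above can be compared against what is installed. The following is a minimal sketch using only the standard library; the helper names (`installed_version`, `mismatches`) are hypothetical and not part of this repository, and it assumes the pip distribution for "pytorch" is named `torch`.

```python
from importlib import metadata

# Versions copied from the requirements list above.
# Assumption: the pip distribution for "pytorch" is called "torch".
PINNED = {
    "numpy": "1.26.4",
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "mamba-ssm": "2.0.4",
    "scipy": "1.5.4",
    "h5py": "3.1.0",
    "addict": "2.4.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    result = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu118" when comparing.
        if found is None or found.split("+")[0] != wanted:
            result[name] = (wanted, found)
    return result


if __name__ == "__main__":
    for name, (wanted, found) in mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

An empty output means every listed package matches its pin; the Python version itself is not checked here.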

2.2 Download the video feature datasets and organize them properly

Before running the code, make sure that everything is ready. The working directory is expected to be organized as below:

AAAI25-S5VH/
  • checkpoint/
    • activitynet/
      • S5VH_16bit
      • S5VH_32bit
      • S5VH_64bit
    • fcv/
      • S5VH_16bit
      • S5VH_32bit
      • S5VH_64bit
    • hmdb/
      • S5VH_16bit
      • S5VH_32bit
      • S5VH_64bit
    • ucf/
      • S5VH_16bit
      • S5VH_32bit
      • S5VH_64bit
  • data/
    • activitynet/
      • train_feats.h5
      • final_train_train_assit.h5
      • final_train_latent_feats.h5
      • final_train_anchors.h5
      • final_train_sim_matrix.h5
      • semantic.h5
      • hash_center_16.h5
      • hash_center_32.h5
      • hash_center_64.h5
      • test_feats.h5
      • re_label.mat
      • query_feats.h5
      • q_label.mat
    • fcv/
      • fcv_train_feats.h5
      • final_train_train_assit.h5
      • final_train_latent_feats.h5
      • final_train_anchors.h5
      • final_train_sim_matrix.h5
      • semantic.h5
      • hash_center_16.h5
      • hash_center_32.h5
      • hash_center_64.h5
      • fcv_test_feats.h5
      • fcv_test_labels.mat
    • hmdb/
      • hmdb_train_feats.h5
      • final_train_train_assit.h5
      • final_train_latent_feats.h5
      • final_train_anchors.h5
      • final_train_sim_matrix.h5
      • semantic.h5
      • hash_center_16.h5
      • hash_center_32.h5
      • hash_center_64.h5
      • hmdb_train_labels.mat
      • hmdb_test_feats.h5
      • hmdb_test_labels.mat
    • ucf/
      • ucf_train_feats.h5
      • final_train_train_assit.h5
      • final_train_latent_feats.h5
      • final_train_anchors.h5
      • final_train_sim_matrix.h5
      • semantic.h5
      • hash_center_16.h5
      • hash_center_32.h5
      • hash_center_64.h5
      • ucf_train_labels.mat
      • ucf_test_feats.h5
      • ucf_test_labels.mat
  • logs/
    • activitynet/
      • S5VH_16bit
      • S5VH_32bit
      • S5VH_64bit
    • fcv/
      • S5VH_16bit
      • S5VH_32bit
      • S5VH_64bit
    • hmdb/
      • S5VH_16bit
      • S5VH_32bit
      • S5VH_64bit
    • ucf/
      • S5VH_16bit
      • S5VH_32bit
      • S5VH_64bit
  • configs/
  • dataset/
  • inference/
  • Loss/
  • model/
  • optim/
  • utils/
  • preprocess.py
  • train.py
  • eval.py
  • requirements.txt

You may download the video features from the following Baidu Cloud links and put them into the dataset-specific folders under the data/ folder.

    | Dataset | Video Features | Hash Centers | Logs and Checkpoints |
    |---|---|---|---|
    | FCVID | Baidu disk | Baidu disk | Baidu disk |
    | ActivityNet | Baidu disk | Baidu disk | Baidu disk |
    | UCF101 | Baidu disk | Baidu disk | Baidu disk |
    | HMDB51 | Baidu disk | Baidu disk | Baidu disk |
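
Once the files are in place, the data/ layout listed above can be verified before running anything. This is a minimal sketch, not part of the repository: `missing_files` is a hypothetical helper, and the file names are copied verbatim from the directory tree. Note that the hash_center_*.h5 files may legitimately be absent until the pre-processing step in Section 2.3 has been run or the preprocessed files have been downloaded.

```python
from pathlib import Path

# Files that appear in every dataset folder in the tree above.
COMMON = [
    "final_train_train_assit.h5",
    "final_train_latent_feats.h5",
    "final_train_anchors.h5",
    "final_train_sim_matrix.h5",
    "semantic.h5",
    "hash_center_16.h5",
    "hash_center_32.h5",
    "hash_center_64.h5",
]

# Dataset-specific feature and label files, also taken from the tree above.
SPECIFIC = {
    "activitynet": ["train_feats.h5", "test_feats.h5", "re_label.mat",
                    "query_feats.h5", "q_label.mat"],
    "fcv": ["fcv_train_feats.h5", "fcv_test_feats.h5", "fcv_test_labels.mat"],
    "hmdb": ["hmdb_train_feats.h5", "hmdb_train_labels.mat",
             "hmdb_test_feats.h5", "hmdb_test_labels.mat"],
    "ucf": ["ucf_train_feats.h5", "ucf_train_labels.mat",
            "ucf_test_feats.h5", "ucf_test_labels.mat"],
}


def missing_files(root, dataset):
    """Return the expected files that are absent from <root>/data/<dataset>/."""
    folder = Path(root) / "data" / dataset
    expected = SPECIFIC[dataset] + COMMON
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # Run from the AAAI25-S5VH/ working directory.
    for dataset in SPECIFIC:
        missing = missing_files(".", dataset)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{dataset}: {status}")
```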

    2.3 Pre-processing: hash center generation

    Before model training, please make sure the hash centers have been generated. You may download our preprocessed hash center files from the Baidu Cloud links (see the table above) and put them into the video feature folders of the corresponding datasets. Otherwise, you can generate these files by running the pre-processing code. For example, on ActivityNet:

    python preprocess.py --gpu 0 --config configs/S5VH/act.py
    

    To pre-process a different dataset, replace act.py with fcv.py, ucf.py, or hmdb.py.

    2.4 Train

    The training command is as follows:

    python train.py --config configs/<MODEL_NAME>/<DATASET_NAME>.py --gpu <GPU_ID>
    

    Options:

    • <MODEL_NAME>: S5VH, LSTM, RetNet, RWKV
    • <DATASET_NAME>: act, fcv, ucf, hmdb
    • <GPU_ID>: specify the gpu id

    The logs and model checkpoints will be generated under the logs/ and checkpoint/ folders, respectively.
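
To train several model/dataset combinations in sequence, the command template above can be expanded programmatically. The sketch below is illustrative only: `train_command` and `all_commands` are hypothetical helpers, and it assumes a config file exists for every model/dataset pair listed in the options, which this README does not state explicitly.

```python
import subprocess
import sys
from itertools import product

# Values taken from the options listed above.
MODELS = ["S5VH", "LSTM", "RetNet", "RWKV"]
DATASETS = ["act", "fcv", "ucf", "hmdb"]


def train_command(model, dataset, gpu=0):
    """Build the argument list for one training run."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    return [sys.executable, "train.py",
            "--config", f"configs/{model}/{dataset}.py",
            "--gpu", str(gpu)]


def all_commands(gpu=0):
    """One command per (model, dataset) pair."""
    return [train_command(m, d, gpu) for m, d in product(MODELS, DATASETS)]


if __name__ == "__main__":
    dry_run = True  # set to False to actually launch the runs one by one
    for cmd in all_commands(gpu=0):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

With `dry_run` left at `True` the script only prints the commands, which is a cheap way to check the paths before committing GPU time.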

    2.5 Test

    We provide evaluation code for model checkpoints (if they exist). The test command is as follows:

    python eval.py --config configs/<MODEL_NAME>/<DATASET_NAME>.py --gpu <GPU_ID>
    

    3. Results

    | Dataset | Code Length | MAP@5 | MAP@20 | MAP@40 | MAP@80 | MAP@100 | Log | MAP File |
    |---|---|---|---|---|---|---|---|---|
    | ActivityNet | 16 | 0.180 | 0.097 | 0.060 | 0.034 | 0.029 | ActivityNet-16bit.log | ActivityNet-16bit.map |
    | ActivityNet | 32 | 0.250 | 0.146 | 0.087 | 0.049 | 0.040 | ActivityNet-32bit.log | ActivityNet-32bit.map |
    | ActivityNet | 64 | 0.266 | 0.152 | 0.095 | 0.053 | 0.043 | ActivityNet-64bit.log | ActivityNet-64bit.map |
    | FCVID | 16 | 0.346 | 0.246 | 0.214 | 0.184 | 0.173 | FCVID-16bit.log | FCVID-16bit.map |
    | FCVID | 32 | 0.482 | 0.329 | 0.285 | 0.246 | 0.231 | FCVID-32bit.log | FCVID-32bit.map |
    | FCVID | 64 | 0.520 | 0.369 | 0.325 | 0.284 | 0.269 | FCVID-64bit.log | FCVID-64bit.map |
    | UCF101 | 16 | 0.471 | 0.420 | 0.375 | 0.308 | 0.269 | UCF101-16bit.log | UCF101-16bit.map |
    | UCF101 | 32 | 0.534 | 0.457 | 0.406 | 0.330 | 0.291 | UCF101-32bit.log | UCF101-32bit.map |
    | UCF101 | 64 | 0.578 | 0.507 | 0.458 | 0.380 | 0.338 | UCF101-64bit.log | UCF101-64bit.map |
    | HMDB51 | 16 | 0.190 | 0.128 | 0.095 | 0.065 | 0.057 | HMDB51-16bit.log | HMDB51-16bit.map |
    | HMDB51 | 32 | 0.244 | 0.175 | 0.139 | 0.092 | 0.078 | HMDB51-32bit.log | HMDB51-32bit.map |
    | HMDB51 | 64 | 0.256 | 0.189 | 0.150 | 0.103 | 0.088 | HMDB51-64bit.log | HMDB51-64bit.map |
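
For readers unfamiliar with the MAP@K columns, the following is a rough sketch of how mean average precision at K is commonly computed for Hamming-ranking retrieval. It is not the evaluation code of this repository: the function names are hypothetical, relevance is assumed to mean "same single label as the query", and average precision is normalized by the number of relevant items found in the top K. The actual eval.py may differ on any of these points.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision_at_k(query_code, query_label, db_codes, db_labels, k):
    """AP@K for one query, ranking the database by Hamming distance."""
    # sorted() is stable, so ties keep their database order.
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order[:k], start=1):
        if db_labels[idx] == query_label:  # assumed relevance criterion
            hits += 1
            precision_sum += hits / rank
    # Assumed normalization: relevant items retrieved within the top K.
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(query_codes, query_labels,
                                db_codes, db_labels, k):
    """Mean of AP@K over all queries."""
    scores = [
        average_precision_at_k(code, label, db_codes, db_labels, k)
        for code, label in zip(query_codes, query_labels)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    db_codes = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0)]
    db_labels = [0, 1, 1, 0]
    print(average_precision_at_k((0, 0, 0, 0), 0, db_codes, db_labels, k=3))
```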

    4. References

    If you find our code useful or use the toolkit in your work, please consider citing:

    @inproceedings{Wang25_S5VH,
      author={Wang, Jinpeng and Lian, Niu and Li, Jun and Wang, Yuting and Feng, Yan and Chen, Bin and Zhang, Yongbing and Xia, Shu-Tao},
      title={Efficient Self-Supervised Video Hashing with Selective State Spaces},
      booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
      year={2025}
    }
    

    5. Acknowledgements

    This code is based on our previous work ConMH at AAAI'23. We are also grateful to other teams for open-sourcing code that inspired our work, including SSVH, BTH, MCMSH, BerVAE, DKPH, and SHC-IR.

    6. Contact

    If you have any questions, you can raise an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply to you soon.