Efficient Self-Supervised Video Hashing with Selective State Spaces


1. Introduction

This repository contains the PyTorch implementation of our work at AAAI 2025:

Efficient Self-Supervised Video Hashing with Selective State Spaces. Jinpeng Wang, Niu Lian, Jun Li, Yuting Wang, Yan Feng, Bin Chen, Yongbing Zhang, Shu-Tao Xia.

We are happy to announce S5VH, the first Mamba-based video hashing model, together with an improved self-supervised learning paradigm. S5VH uses bidirectional Mamba layers in both the encoder and the decoder; thanks to the data-dependent selective scanning mechanism, which has linear complexity, these layers capture temporal relationships both effectively and efficiently. For the hash learning strategy, we transform global semantics in the feature space into semantically consistent and discriminative hash centers, and then apply a center alignment loss as a global learning signal. Experiments demonstrate S5VH's efficacy and efficiency under various setups. Our study suggests the strong potential of state-space models in video hashing, which we hope can inspire further research.
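To give some intuition for "data-dependent selective scanning with linear complexity", here is a toy scalar recurrence. It is a simplified illustration only, not the mamba-ssm kernel or the actual S5VH layer: the parameter names (A, w_delta, w_b, w_c) are hypothetical, only the step size is input-dependent, and summing the forward and backward passes is just one assumed way of combining the two directions.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs: List[float], A: float = 1.0, w_delta: float = 1.0,
                   w_b: float = 1.0, w_c: float = 1.0) -> List[float]:
    """Toy scalar selective scan (hypothetical parameters A, w_delta, w_b, w_c).

    The step size delta_t depends on the input x_t, so each step decides how much
    of the old state to keep (a_bar) and how strongly to write x_t into the state.
    A single left-to-right pass with O(1) state costs O(T) for a length-T sequence.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)   # input-dependent step size (> 0)
        a_bar = math.exp(-delta * A)    # decay factor in (0, 1) for A > 0
        b_bar = delta * w_b             # simplified (Euler) discretization of B
        h = a_bar * h + b_bar * x       # state update
        ys.append(w_c * h)              # readout
    return ys


def bidirectional_scan(xs: List[float]) -> List[float]:
    """Scan forward and over the reversed sequence, then sum the aligned outputs."""
    forward = selective_scan(xs)
    backward = selective_scan(xs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]
```

Because the backward pass sees the future frames, each output position in `bidirectional_scan` depends on the whole clip, while the cost stays linear in the number of frames.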

In the following, we guide you through using this repository step by step. 🤗

2. Preparation

git clone https://github.com/gimpong/AAAI25-S5VH.git
cd AAAI25-S5VH/

2.1 Requirements

python==3.11.8
numpy==1.26.4
pytorch==2.0.1
torchvision==0.15.2
mamba-ssm==2.0.4
scipy==1.5.4
h5py==3.1.0
addict==2.4.0
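To compare an existing environment against these pins, a small check along the following lines may help. It is a sketch based on the standard-library `importlib.metadata`; the helper name `find_mismatches` is hypothetical, it looks packages up by their PyPI distribution names (so `pytorch` is checked as `torch`), and it ignores local build tags such as `+cu118`. The Python version itself is not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above; "pytorch" is distributed on PyPI as "torch".
PINNED = {
    "numpy": "1.26.4",
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "mamba-ssm": "2.0.4",
    "scipy": "1.5.4",
    "h5py": "3.1.0",
    "addict": "2.4.0",
}


def find_mismatches(pinned, lookup=version):
    """Return {package: (expected, found)} for every package whose installed
    version differs from its pin; found is None if the package is missing."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        # Drop local version labels such as "+cu118" before comparing.
        if found is None or found.split("+", 1)[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```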

2.2 Download the video feature datasets and organize them properly

Before running the code, make sure that everything is ready. The working directory is expected to be organized as below:

AAAI25-S5VH/
    checkpoint/
        activitynet/
            S5VH_16bit
            S5VH_32bit
            S5VH_64bit
        fcv/
            S5VH_16bit
            S5VH_32bit
            S5VH_64bit
        hmdb/
            S5VH_16bit
            S5VH_32bit
            S5VH_64bit
        ucf/
            S5VH_16bit
            S5VH_32bit
            S5VH_64bit
    data/
        activitynet/
            train_feats.h5
            final_train_train_assit.h5
            final_train_latent_feats.h5
            final_train_anchors.h5
            final_train_sim_matrix.h5
            semantic.h5
            hash_center_16.h5
            hash_center_32.h5
            hash_center_64.h5
            test_feats.h5
            re_label.mat
            query_feats.h5
            q_label.mat
        fcv/
            fcv_train_feats.h5
            final_train_train_assit.h5
            final_train_latent_feats.h5
            final_train_anchors.h5
            final_train_sim_matrix.h5
            semantic.h5
            hash_center_16.h5
            hash_center_32.h5
            hash_center_64.h5
            fcv_test_feats.h5
            fcv_test_labels.mat
        hmdb/
            hmdb_train_feats.h5
            final_train_train_assit.h5
            final_train_latent_feats.h5
            final_train_anchors.h5
            final_train_sim_matrix.h5
            semantic.h5
            hash_center_16.h5
            hash_center_32.h5
            hash_center_64.h5
            hmdb_train_labels.mat
            hmdb_test_feats.h5
            hmdb_test_labels.mat
        ucf/
            ucf_train_feats.h5
            final_train_train_assit.h5
            final_train_latent_feats.h5
            final_train_anchors.h5
            final_train_sim_matrix.h5
            semantic.h5
            hash_center_16.h5
            hash_center_32.h5
            hash_center_64.h5
            ucf_train_labels.mat
            ucf_test_feats.h5
            ucf_test_labels.mat
    logs/
        activitynet/
            S5VH_16bit
            S5VH_32bit
            S5VH_64bit
        fcv/
            S5VH_16bit
            S5VH_32bit
            S5VH_64bit
        hmdb/
            S5VH_16bit
            S5VH_32bit
            S5VH_64bit
        ucf/
            S5VH_16bit
            S5VH_32bit
            S5VH_64bit
    configs/
    dataset/
    inference/
    Loss/
    model/
    optim/
    utils/
    preprocess.py
    train.py
    eval.py
    requirements.txt

You can download the video features from the following Baidu Cloud links and put them into the corresponding dataset-specific folders under data/.

Dataset	Video Features	Hash Centers	Logs and Checkpoints
FCVID	Baidu disk	Baidu disk	Baidu disk
ActivityNet	Baidu disk	Baidu disk	Baidu disk
UCF101	Baidu disk	Baidu disk	Baidu disk
HMDB51	Baidu disk	Baidu disk	Baidu disk
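After downloading, a quick check that every file sits where the layout above expects it can save a failed run. The sketch below is hypothetical (the names `EXPECTED` and `missing_files` are not part of the repository); the file names are transcribed from the directory layout above. Note that the `hash_center_*.h5` files will be reported as missing until you either download them or run the pre-processing step.

```python
from pathlib import Path

# File names transcribed from the directory layout above. ActivityNet uses
# unprefixed feature files; the other datasets prefix them with the dataset name.
SHARED = [
    "final_train_train_assit.h5",
    "final_train_latent_feats.h5",
    "final_train_anchors.h5",
    "final_train_sim_matrix.h5",
    "semantic.h5",
    "hash_center_16.h5",
    "hash_center_32.h5",
    "hash_center_64.h5",
]
EXPECTED = {
    "activitynet": ["train_feats.h5", *SHARED, "test_feats.h5", "re_label.mat",
                    "query_feats.h5", "q_label.mat"],
    "fcv": ["fcv_train_feats.h5", *SHARED, "fcv_test_feats.h5", "fcv_test_labels.mat"],
    "hmdb": ["hmdb_train_feats.h5", *SHARED, "hmdb_train_labels.mat",
             "hmdb_test_feats.h5", "hmdb_test_labels.mat"],
    "ucf": ["ucf_train_feats.h5", *SHARED, "ucf_train_labels.mat",
            "ucf_test_feats.h5", "ucf_test_labels.mat"],
}


def missing_files(root="data"):
    """Return {dataset: [missing file names]} for every dataset with gaps."""
    report = {}
    for dataset, names in EXPECTED.items():
        folder = Path(root) / dataset
        missing = [n for n in names if not (folder / n).is_file()]
        if missing:
            report[dataset] = missing
    return report


if __name__ == "__main__":
    for dataset, names in missing_files().items():
        print(f"{dataset} is missing: {', '.join(names)}")
```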

2.3 Pre-processing: hash center generation

Before model training, make sure the hash centers have been generated. You can download our preprocessed hash center files from the Baidu Cloud links in the table above and put them into the video feature folder of the corresponding dataset. Alternatively, you can generate these files yourself by running the pre-processing code. For example, on ActivityNet:

python preprocess.py --gpu 0 --config configs/S5VH/act.py

For the other datasets, replace act.py with fcv.py, ucf.py, or hmdb.py.
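S5VH derives its hash centers from global semantics in the feature space, and preprocess.py is the reference for that procedure. For readers unfamiliar with hash centers in general, the sketch below shows a different, data-independent construction that is commonly used as a baseline: taking rows of a Sylvester Hadamard matrix and their negations. This is a swapped-in technique for illustration, not the method implemented in this repository; the function names are hypothetical. It yields at most 2 × bits centers, and any two distinct centers differ in bits/2 or bits positions, which is the kind of separation "discriminative" hash centers aim for.

```python
from typing import List, Sequence


def sylvester_hadamard(n: int) -> List[List[int]]:
    """Build an n x n Hadamard matrix (entries +1/-1) by Sylvester's
    construction; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def hadamard_centers(num_centers: int, bits: int) -> List[List[int]]:
    """Return num_centers binary (0/1) hash centers of length `bits`,
    taken from the rows of H followed by the rows of -H."""
    H = sylvester_hadamard(bits)
    rows = H + [[-v for v in row] for row in H]
    if num_centers > len(rows):
        raise ValueError("this construction provides at most 2 * bits centers")
    return [[(v + 1) // 2 for v in row] for row in rows[:num_centers]]  # map -1 -> 0, +1 -> 1


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))
```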

2.4 Train

The training command is as follows:

python train.py --config configs/<MODEL_NAME>/<DATASET_NAME>.py --gpu <GPU_ID>

Options:

<MODEL_NAME>: S5VH, LSTM, RetNet, RWKV
<DATASET_NAME>: act, fcv, ucf, hmdb
<GPU_ID>: the ID of the GPU to use

Logs and model checkpoints are written to the logs/ and checkpoint/ folders, respectively.
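To sweep several models and datasets, the documented command can be generated programmatically. The sketch below uses only the `--config` and `--gpu` options listed above; the helper names are hypothetical, and it assumes that a config file exists for every model/dataset pair (check configs/ first). It prints the commands by default and only launches them, one after another, when `dry_run=False`. How the code length (16/32/64 bits) is selected is not covered by the options above, so it is not handled here.

```python
import itertools
import shlex
import subprocess
import sys

# Option values copied from the list above.
MODELS = ["S5VH", "LSTM", "RetNet", "RWKV"]
DATASETS = ["act", "fcv", "ucf", "hmdb"]


def train_command(model, dataset, gpu):
    """Build `python train.py --config configs/<MODEL>/<DATASET>.py --gpu <ID>` as an argv list."""
    if model not in MODELS or dataset not in DATASETS:
        raise ValueError(f"unknown model/dataset: {model}/{dataset}")
    return [sys.executable, "train.py", "--config", f"configs/{model}/{dataset}.py",
            "--gpu", str(gpu)]


def run_all(models=MODELS, datasets=DATASETS, gpu=0, dry_run=True):
    """Print (and, unless dry_run, execute sequentially) one run per model/dataset pair."""
    commands = [train_command(m, d, gpu) for m, d in itertools.product(models, datasets)]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on the first failed run
    return commands
```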

2.5 Test

We provide evaluation code for trained model checkpoints (if they exist). The test command is as follows:

python eval.py --config configs/<MODEL_NAME>/<DATASET_NAME>.py --gpu <GPU_ID>
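Retrieval with binary codes generally comes down to binarizing real-valued outputs and ranking the database by Hamming distance. The sketch below shows that common pattern, assuming sign (threshold-at-zero) binarization; it is not taken from eval.py, whose exact procedure is not described here, and the function names are hypothetical. Packing bits into integers lets the distance be computed as the popcount of an XOR.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Threshold real-valued encoder outputs at zero to obtain {0, 1} bits."""
    return [1 if v > 0 else 0 for v in features]


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a bit list (most significant bit first) into one integer."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code


def hamming_int(a: int, b: int) -> int:
    """Hamming distance between two packed codes: popcount of their XOR."""
    return bin(a ^ b).count("1")


def hamming_rank(query: int, database: List[int]) -> List[int]:
    """Database indices sorted by Hamming distance to the query (ties keep database order)."""
    return sorted(range(len(database)), key=lambda i: hamming_int(query, database[i]))
```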

3. Results

Dataset	Code Length	MAP@5	MAP@20	MAP@40	MAP@80	MAP@100	Log	MAP File
ActivityNet	16	0.180	0.097	0.060	0.034	0.029	ActivityNet-16bit.log	ActivityNet-16bit.map
	32	0.250	0.146	0.087	0.049	0.040	ActivityNet-32bit.log	ActivityNet-32bit.map
	64	0.266	0.152	0.095	0.053	0.043	ActivityNet-64bit.log	ActivityNet-64bit.map
FCVID	16	0.346	0.246	0.214	0.184	0.173	FCVID-16bit.log	FCVID-16bit.map
	32	0.482	0.329	0.285	0.246	0.231	FCVID-32bit.log	FCVID-32bit.map
	64	0.520	0.369	0.325	0.284	0.269	FCVID-64bit.log	FCVID-64bit.map
UCF101	16	0.471	0.420	0.375	0.308	0.269	UCF101-16bit.log	UCF101-16bit.map
	32	0.534	0.457	0.406	0.330	0.291	UCF101-32bit.log	UCF101-32bit.map
	64	0.578	0.507	0.458	0.380	0.338	UCF101-64bit.log	UCF101-64bit.map
HMDB51	16	0.190	0.128	0.095	0.065	0.057	HMDB51-16bit.log	HMDB51-16bit.map
	32	0.244	0.175	0.139	0.092	0.078	HMDB51-32bit.log	HMDB51-32bit.map
	64	0.256	0.189	0.150	0.103	0.088	HMDB51-64bit.log	HMDB51-64bit.map
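MAP@K averages, over all queries, the precision at each rank where a relevant item appears within the top K results. The sketch below assumes that an item is relevant when it shares a label with the query and normalizes each query's AP by the number of relevant items inside the top K; other implementations divide by min(K, total relevant items) instead, so numbers from different conventions are not directly comparable. The function names are hypothetical.

```python
from typing import List


def average_precision_at_k(relevance: List[int], k: int) -> float:
    """AP@K for one query. relevance[i] is 1 if the i-th retrieved item shares
    a label with the query, else 0 (assumed convention: normalize by hits in top K)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def map_at_k(relevance_lists: List[List[int]], k: int) -> float:
    """Mean of AP@K over all queries."""
    return sum(average_precision_at_k(r, k) for r in relevance_lists) / len(relevance_lists)
```

For example, a query whose top four results are relevant, irrelevant, relevant, irrelevant gets AP@4 = (1/1 + 2/3) / 2 = 5/6.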

4. References

If you find our code useful or use the toolkit in your work, please consider citing:

@inproceedings{Wang25_S5VH,
  author={Wang, Jinpeng and Lian, Niu and Li, Jun and Wang, Yuting and Feng, Yan and Chen, Bin and Zhang, Yongbing and Xia, Shu-Tao},
  title={Efficient Self-Supervised Video Hashing with Selective State Spaces},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}

5. Acknowledgements

This code is based on our previous work ConMH (AAAI'23). We are also grateful to other teams for open-sourcing code that inspired our work, including SSVH, BTH, MCMSH, BerVAE, DKPH, and SHC-IR.

6. Contact

If you have any questions, you can raise an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply as soon as possible.
