Please download the AVSBench dataset, including the S4 and MS3 subsets, by following the instructions at https://github.com/OpenNLPLab/AVSBench.
We have extracted flows from the AVSBench S4 and MS3 subsets. Please download them via the OneDrive link.
We additionally sampled semantically similar unannotated samples to augment the MS3 data during training. Please download them via the OneDrive link.
Following previous works such as TPAVI and AVSegFormer, we use image-pretrained ResNet and PVT-v2 as image backbones and AudioSet-pretrained VGGish as the audio backbone. Please download the pretrained backbones via the OneDrive link.
We release the model weights for inference. Please download them via the OneDrive link.
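Once the downloads are in place, a quick check can catch a missing item before training or inference starts. The sketch below is only an illustration: the folder names in `EXPECTED` and the helper `find_missing` are hypothetical placeholders, not the layout this repository actually requires, so adjust them to wherever you unpacked each archive.

```python
from pathlib import Path

# Hypothetical layout: every path below is a placeholder chosen for
# illustration. Replace them with the locations you actually used.
EXPECTED = {
    "AVSBench S4 subset": "data/AVSBench/S4",
    "AVSBench MS3 subset": "data/AVSBench/MS3",
    "extracted flows": "data/flows",
    "extra unannotated MS3 samples": "data/ms3_extra",
    "pretrained backbones": "pretrained_backbones",
    "released model weights": "checkpoints",
}


def find_missing(root, expected=EXPECTED):
    """Return the labels of downloads whose path does not exist under root."""
    root = Path(root)
    return [label for label, rel in expected.items() if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    for label in missing:
        print(f"missing: {label} (expected at {EXPECTED[label]})")
    if not missing:
        print("all downloads found")
```

Run the script from the repository root. It only tests that each path exists and does not validate the contents of any download.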