Learning Audio-Visual Correlations from Variational Cross-Modal Generation

This is the code implementation for the ICASSP 2021 paper Learning Audio-Visual Correlations from Variational Cross-Modal Generation. In this work, we propose a Variational Autoencoder with Multiple encoders and a Shared decoder (MS-VAE) framework for processing data from the visual and audio modalities. We use the AVE dataset for our experiments, and thank the authors of the previous works for sharing their code and data.
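The core idea of MS-VAE is that each modality has its own encoder mapping into a shared latent space, while a single decoder is shared across the modalities. Below is a minimal PyTorch sketch of such an architecture; the class name MSVAE, the layer sizes, and the feature dimensions are illustrative assumptions, not the exact configuration used in the paper or in msvae.py.

```python
import torch
import torch.nn as nn

class MSVAE(nn.Module):
    """Minimal sketch of a Multiple-encoder, Shared-decoder VAE (MS-VAE).

    Hypothetical dimensions: one encoder per modality maps its features into
    a common latent space, and a single shared decoder reconstructs from it.
    """
    def __init__(self, audio_dim=128, visual_dim=512, hidden_dim=256, latent_dim=100):
        super().__init__()
        # Modality-specific encoders.
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        # Shared heads predicting the latent mean and log-variance.
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # A single decoder shared by both modalities.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, audio_dim + visual_dim),
        )

    def reparameterize(self, mu, logvar):
        # Standard VAE reparameterization trick.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x, modality="audio"):
        h = self.audio_enc(x) if modality == "audio" else self.visual_enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar
```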

1. We implement the project using

Python 3.6
PyTorch 1.2

2. Training and pre-trained models

Please download the audio and visual features from here and place the data files in the data folder. Note that we use the features for the CML task in our experiments.
To train the model, run msvae.py.
For the cross-modal localization task, run cml.py.
For the cross-modal retrieval task, run retrieval.py.
The pre-trained models are also available for download: audio and visual (a minimal loading sketch follows this list).
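As a rough illustration of how the downloaded features and a pre-trained checkpoint might be loaded, here is a hedged sketch built on the MSVAE class above. The file names (data/audio_feature.npy, pretrained/audio_msvae.pt) and array shapes are hypothetical placeholders and need to be adapted to the actual files from the download links.

```python
import numpy as np
import torch

# Hypothetical file names -- adapt to the actual downloaded feature files.
audio_feats = np.load("data/audio_feature.npy")    # assumed shape: (num_videos, segments, audio_dim)
visual_feats = np.load("data/visual_feature.npy")  # assumed shape: (num_videos, segments, visual_dim)

# Build the model with dimensions inferred from the features,
# then load a (hypothetically named) pre-trained checkpoint.
model = MSVAE(audio_dim=audio_feats.shape[-1], visual_dim=visual_feats.shape[-1])
state = torch.load("pretrained/audio_msvae.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()

# Encode one audio segment and reconstruct it through the shared decoder.
with torch.no_grad():
    x = torch.from_numpy(audio_feats[:1, 0]).float()
    recon, mu, logvar = model(x, modality="audio")
```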

3. Citation

Please consider citing our paper if you find it useful.

@InProceedings{zhu2021learning,
  author    = {Zhu, Ye and Wu, Yu and Latapie, Hugo and Yang, Yi and Yan, Yan},
  title     = {Learning Audio-Visual Correlations from Variational Cross-Modal Generation},
  booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year      = {2021}
}
