Learning Audio-Visual Correlations from Variational Cross-Modal Generation

This is the code implementation for the ICASSP 2021 paper Learning Audio-Visual Correlations from Variational Cross-Modal Generation. In this work, we propose a Variational Autoencoder with Multiple encoders and a Shared decoder (MS-VAE) framework for processing data from the visual and audio modalities. We use the AVE dataset for our experiments, and thank the authors of the previous works for sharing their code and data.
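The core idea of MS-VAE is that each modality has its own encoder mapping into a shared latent space, while a single decoder is shared across the modalities. Below is a minimal PyTorch sketch of such an architecture; the class name MSVAE, the layer sizes, and the feature dimensions are illustrative assumptions, not the exact configuration used in the paper or in msvae.py.

```python
import torch
import torch.nn as nn

class MSVAE(nn.Module):
    """Minimal sketch of a Multiple-encoder, Shared-decoder VAE (MS-VAE).

    Hypothetical dimensions: one encoder per modality maps its features into
    a common latent space, and a single shared decoder reconstructs from it.
    """
    def __init__(self, audio_dim=128, visual_dim=512, hidden_dim=256, latent_dim=100):
        super().__init__()
        # Modality-specific encoders.
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        # Shared heads predicting the latent mean and log-variance.
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # A single decoder shared by both modalities.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, audio_dim + visual_dim),
        )

    def reparameterize(self, mu, logvar):
        # Standard VAE reparameterization trick.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x, modality="audio"):
        h = self.audio_enc(x) if modality == "audio" else self.visual_enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar
```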

1. We implement the project using

Python 3.6
PyTorch 1.2

2. Training and pre-trained models

Please download the audio and visual features from here and place the data files in the data folder. Note that we use the features for the CML task in our experiments.
To train the model, run msvae.py.
For the cross-modal localization task, run cml.py.
For the cross-modal retrieval task, run retrieval.py.
The pre-trained models are also available for download: audio and visual (a minimal loading sketch follows this list).
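As a rough illustration of how the downloaded features and a pre-trained checkpoint might be loaded, here is a hedged sketch built on the MSVAE class above. The file names (data/audio_feature.npy, pretrained/audio_msvae.pt) and array shapes are hypothetical placeholders and need to be adapted to the actual files from the download links.

```python
import numpy as np
import torch

# Hypothetical file names -- adapt to the actual downloaded feature files.
audio_feats = np.load("data/audio_feature.npy")    # assumed shape: (num_videos, segments, audio_dim)
visual_feats = np.load("data/visual_feature.npy")  # assumed shape: (num_videos, segments, visual_dim)

# Build the model with dimensions inferred from the features,
# then load a (hypothetically named) pre-trained checkpoint.
model = MSVAE(audio_dim=audio_feats.shape[-1], visual_dim=visual_feats.shape[-1])
state = torch.load("pretrained/audio_msvae.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()

# Encode one audio segment and reconstruct it through the shared decoder.
with torch.no_grad():
    x = torch.from_numpy(audio_feats[:1, 0]).float()
    recon, mu, logvar = model(x, modality="audio")
```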

3. Citation

Please consider citing our paper if you find it useful.

@InProceedings{zhu2021learning,
  author    = {Zhu, Ye and Wu, Yu and Latapie, Hugo and Yang, Yi and Yan, Yan},
  title     = {Learning Audio-Visual Correlations from Variational Cross-Modal Generation},
  booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year      = {2021}
}
