PyTorch implementation of Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus (ACM MM'21).
See requirements in requirement.txt:
- Linux
- Python 3.6
- PyTorch 1.0+
- librosa
- json, tqdm, logging
- Put any wav files in the data directory (a quick sanity check is sketched below).
- Edit the configuration in config/config.yaml.
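Before preprocessing, it can help to confirm that every wav loads and matches the sample rate set in config/config.yaml. A minimal sketch, assuming a 24 kHz target; read the actual value from your config:

```python
import glob
import librosa

# Assumed target sample rate; use the value set in config/config.yaml.
TARGET_SR = 24000

for path in glob.glob("data/wavs/*.wav"):
    wav, sr = librosa.load(path, sr=None)  # sr=None keeps the file's native rate
    if sr != TARGET_SR:
        print(f"{path}: sample rate {sr} != {TARGET_SR}, resample before preprocessing")
```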
Use our checkpoint, or train the encoder on your own here and set enc_model_fpath in config/config.yaml. Make sure the parameters match those in encoder/params_data and encoder/params_model.
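If you train the encoder yourself, GE2E-style encoders typically expose a load/embed API. A hypothetical sketch of computing a speaker embedding; the module layout, function names, checkpoint path, and 16 kHz rate are assumptions, not guaranteed to match this repo:

```python
import librosa
from encoder import inference as encoder_inference  # assumed module layout

# Hypothetical checkpoint path; set enc_model_fpath in config/config.yaml to
# the same checkpoint you load here.
encoder_inference.load_model("encoder/saved_models/pretrained.pt")

# GE2E-style encoders are usually trained on 16 kHz audio (see encoder/params_data).
wav, _ = librosa.load("data/wavs/example.wav", sr=16000)
embedding = encoder_inference.embed_utterance(wav)  # fixed-size speaker embedding
print(embedding.shape)
```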
Extract mel-spectrogram
python preprocess.py -i data/wavs -o data/feature -c config/config.yaml
-i: your audio folder
-o: output acoustic feature folder
-c: config file
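For intuition, preprocess.py boils down to log-mel extraction per wav file. A minimal sketch; the actual sample rate, FFT size, hop length, and mel-bin count come from config/config.yaml, so the values below are placeholders:

```python
import numpy as np
import librosa

# All parameter values below are assumptions; preprocess.py reads the real
# ones from config/config.yaml.
wav, sr = librosa.load("data/wavs/example.wav", sr=24000)
mel = librosa.feature.melspectrogram(
    y=wav, sr=sr, n_fft=1024, hop_length=256, win_length=1024, n_mels=80
)
log_mel = np.log(np.clip(mel, a_min=1e-5, a_max=None))  # log compression
np.save("data/feature/example.npy", log_mel.T)          # frames x mel bins
```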
Training conditioned on mel-spectrogram
python train.py -i data/feature -o checkpoints/ --config config/config.yaml
-i: acoustic feature folder
-o: directory to save checkpoints
--config: config file
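Multi-Singer builds on the Parallel WaveGAN training setup: a generator trained with a multi-resolution STFT loss plus an adversarial loss against a waveform discriminator. A highly simplified sketch of one training step; the modules and optimizers stand in for whatever train.py builds from the config:

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, stft_loss, opt_g, opt_d, mel, wav):
    # Generator update: multi-resolution STFT loss + adversarial loss.
    fake = generator(mel)                       # mel frames -> waveform samples
    d_fake = discriminator(fake)
    loss_g = stft_loss(fake, wav) + F.mse_loss(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    # Discriminator update: push real toward 1, generated toward 0.
    d_real = discriminator(wav)
    d_fake = discriminator(fake.detach())       # detach: no gradient into G here
    loss_d = (F.mse_loss(d_real, torch.ones_like(d_real))
              + F.mse_loss(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_g.item(), loss_d.item()
```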
Inference
python inference.py -i data/feature -o outputs/ -c checkpoints/*.pkl -g config/config.yaml
-i: acoustic feature folder
-o: directory to save generated speech
-c: checkpoint file
-g: config file
For Singing Voice Synthesis:
- Use a modified FastSpeech 2 for mel-spectrogram synthesis.
- Feed the synthesized mel-spectrograms to Multi-Singer for waveform synthesis (a minimal glue sketch follows).
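The glue between the two stages is just the feature format: save the FastSpeech 2 mel predictions as .npy files in a folder and pass that folder to inference.py with -i. A minimal sketch; the (frames, n_mels) layout is an assumption and should match what preprocess.py produces:

```python
import numpy as np

def save_for_vocoder(mel_pred, out_path):
    """Save a FastSpeech 2 mel prediction in the .npy layout the vocoder reads.

    mel_pred: torch tensor, assumed shape (1, frames, n_mels); match whatever
    preprocess.py actually writes for your setup.
    """
    np.save(out_path, mel_pred.squeeze(0).cpu().numpy())

# Then point inference.py at the folder of saved features:
#   python inference.py -i data/synth_feature -o outputs/ -c checkpoints/*.pkl -g config/config.yaml
```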
Related repositories:
- GE2E
- FastSpeech 2
- Parallel WaveGAN
Cite as:
@inproceedings{huang2021multi,
title={Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus},
author={Huang, Rongjie and Chen, Feiyang and Ren, Yi and Liu, Jinglin and Cui, Chenye and Zhao, Zhou},
booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
pages={3945--3954},
year={2021}
}
Feel free to contact me at rongjiehuang@zju.edu.cn.