
CycleVAE-VC implementation with PyTorch

This repository provides an UNOFFICIAL CycleVAE-VC implementation in PyTorch.

You can combine it with your own vocoder to get high-quality converted speech!

Source of the figure: https://arxiv.org/pdf/1907.10185.pdf

The goal of this repository is to provide a VC model trained on completely non-parallel data, as well as a many-to-many conversion model.

I modified the model from @patrickltobing's implementation as follows: the original model uses an autoregressive (AR) structure for its ConvRNN network, which takes quite a long time to train, so I used a plain RNN-based model instead for faster training.
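
As a rough illustration of the difference: a non-autoregressive RNN can encode a whole utterance in a single parallel pass, while an AR network must loop frame by frame and feed each output back in, which serializes training. Below is a minimal, hypothetical encoder in this spirit; the layer sizes and feature dimensions are illustrative assumptions, not the actual network in this repository.

    import torch.nn as nn

    class RnnEncoder(nn.Module):
        """Minimal non-AR encoder sketch: conv pre-net + GRU over the full sequence."""

        def __init__(self, in_dim=50, hidden_dim=256, latent_dim=32):
            super().__init__()
            self.conv = nn.Conv1d(in_dim, hidden_dim, kernel_size=3, padding=1)
            self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
            self.to_stats = nn.Linear(hidden_dim, 2 * latent_dim)

        def forward(self, feats):  # feats: (batch, frames, in_dim)
            h = self.conv(feats.transpose(1, 2)).transpose(1, 2)
            h, _ = self.gru(h)  # one pass over all frames, no AR feedback loop
            mean, logvar = self.to_stats(h).chunk(2, dim=-1)  # VAE posterior stats
            return mean, logvar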

What's new

  • 2020/06/11 [NEW!] Support ParallelWaveGAN in the vocoder branch.
  • 2020/06/02 Support one-to-one conversion model.

Requirements

This repository was tested on Ubuntu 19.10 with an RTX 2080 Ti in the following environment.

  • Python 3.7+
  • CUDA 10.2
  • cuDNN 7+

Setup

You can set up this repository with the following commands.

$ cd tools
$ make

Please check that the venv directory has been successfully created under the tools directory.

Usage

Before training the model, be sure to place your wav files under a specific directory. The expected structure of the wav directory is:

wav
├── train
│   ├── jvs001
│   └── jvs002
└── val
    ├── jvs001
    └── jvs002
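
For reference, a file list for this layout can be built as in the sketch below; it assumes only the directory names shown above.

    from pathlib import Path

    wav_root = Path("wav")

    def list_wavs(split):
        """Map each speaker to its utterances, e.g. {"jvs001": [...], "jvs002": [...]}."""
        return {
            spk.name: sorted(spk.glob("*.wav"))
            for spk in sorted((wav_root / split).iterdir())
            if spk.is_dir()
        }

    train_files = list_wavs("train")
    val_files = list_wavs("val")
    print(list(train_files))  # ['jvs001', 'jvs002']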

Step0: path

  • This script is not designed for servers that use Slurm.

  • If you are using Slurm or you have GPUs, then you have to add environment variables in path.sh

  • To set the environment variables and activate the virtual environment, run

    . path.sh
    

Step1: set min/max f0

  • Run the following command to generate figures

    . run.sh --stage 0
    

    and the figures will be generated in the ./figure directory.

  • If you don't have a speaker config file in ./config/speaker, then do the following:

    1. Copy ./config/speaker/default.conf to ./config/speaker/<spk_name>.conf

    2. Set speaker-dependent variables there.

      The structure of the config file is:

      <minf0>
      <maxf0>
      <npow>
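
      The sketch below shows one way such a config could be consumed during F0 extraction; it assumes the WORLD analyzer via the pyworld package and a hypothetical mono wav file, which may differ from what this repository actually uses.

          import soundfile as sf
          import pyworld

          def load_speaker_conf(path):
              """Read <minf0>, <maxf0>, <npow>: one value per line, in that order."""
              with open(path) as f:
                  values = [float(line) for line in f if line.strip()]
              return values[0], values[1], values[2]

          minf0, maxf0, npow = load_speaker_conf("config/speaker/jvs001.conf")

          # Restrict the F0 search range to the speaker-dependent min/max values.
          x, fs = sf.read("wav/train/jvs001/sample.wav")  # hypothetical mono file
          f0, t = pyworld.harvest(x, fs, f0_floor=minf0, f0_ceil=maxf0)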
      

Step2: Feature extraction and model training

  • Run the following command to extract features and train the model.

    . run.sh --stage 12
    
    • Stage 1: feature extraction

    • Stage 2: training

    Flags in training stage

    • conf_path : Path to the training config file. Default: ./config/vc.conf
    • model_name : Name of the saved model. Checkpoints will be named <model_name>.<num_iter>.pt.
    • log_name : Logging directory for TensorBoard event files.
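
    For reference, the <model_name>.<num_iter>.pt naming above could come from a saving routine like this sketch; the checkpoint dictionary keys are illustrative assumptions, not necessarily what this repository writes.

        import torch

        def save_checkpoint(model, optimizer, model_name, num_iter, save_dir="exp"):
            """Save a checkpoint named <model_name>.<num_iter>.pt under save_dir."""
            path = f"{save_dir}/{model_name}.{num_iter}.pt"
            torch.save({
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "iterations": num_iter,
            }, path)
            return path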

Step3: Convert voice

  • Run the following command to convert voices.

    . run.sh --stage 3
    

    Flags in conversion stage

    • test_dir : Directory containing the source wav files.
    • exp_dir : Directory to save converted wav files.
    • checkpoint : Path to the trained model.
    • log_name : Name of the log file.
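
    At a high level, this stage restores the checkpoint and runs every wav under test_dir through the model, writing results to exp_dir. Below is a rough sketch of that outer loop only, assuming the checkpoint dictionary from the Step2 sketch and hypothetical directory names.

        import torch
        from pathlib import Path

        test_dir = Path("wav/test")      # hypothetical --test_dir value
        exp_dir = Path("exp/converted")  # hypothetical --exp_dir value
        exp_dir.mkdir(parents=True, exist_ok=True)

        checkpoint = torch.load("exp/cyclevae.200000.pt", map_location="cpu")
        print(f"model was trained for {checkpoint['iterations']} iterations")

        for wav_path in sorted(test_dir.glob("**/*.wav")):
            out_path = exp_dir / wav_path.name
            # ... extract features, run the model, and write the converted wav here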

Results

Features to be implemented in the future

  • Support gin-config

References

  • Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, and Tomoki Toda, "Non-Parallel Voice Conversion with Cyclic Variational Autoencoder," Proc. Interspeech 2019. https://arxiv.org/abs/1907.10185

Acknowledgement

The author would like to thank Patrick Lumban Tobing for his repository.

Author

Someki Masao (@Masao-Someki)

e-mail : masao.someki@gmail.com
