GitHub - IEEE-NITK/Voice-Swapper-1: A voice-changing dictaphone

Voice-Swapper

Real-time voice conversion using GANs implemented on RPi4.
Explore the dataset »

Table of Contents

About The Project
Objectives
Scope
Roadmap
Contact
Acknowledgments

About The Project

A voice-changing dictaphone

Voice-Swapper is a dictaphone that converts the user’s (source) voice to a target voice without losing any linguistic information. Voice conversion (VC) is useful in many applications, such as customizing audiobook and avatar voices, dubbing, voice modification, voice restoration after surgery, and cloning the voices of historical figures. VC models are often implemented with Generative Adversarial Networks (GANs), which give promising results by rendering the user’s spoken statements in the target’s voice. We aim to build these models from scratch and deploy them on an NVIDIA Jetson, a powerful device commonly used for AI applications. This is an inter-SIG project between Diode and CompSoc.

Use the README.md to get started.

(back to top)

Objectives

To build the generative adversarial network (GAN) models from scratch.
To implement these models on an NVIDIA Jetson.
To perform voice swapping (conversion) in real time.

(back to top)

Scope

If time permits, we aim to propose a novel model informed by a survey of model performance in the Voice Conversion Challenge 2020 (VCC2020), and to write a research paper comparing its performance with that of existing models.

Click here for the complete proposal.

(back to top)

Model Architecture

CycleGAN

An important characteristic of speech is that it has sequential and hierarchical structures, e.g., voiced or unvoiced segments and phonemes or morphemes. A recurrent neural network (RNN) can represent such structures effectively, but it is computationally demanding because it is difficult to parallelize.

Instead, we configure a CycleGAN using gated CNNs, which not only allow parallelization over sequential data but also achieve state-of-the-art results in speech modeling. In a gated CNN, gated linear units (GLUs) serve as the activation function. A GLU is a data-driven activation function, and its gating mechanism lets information propagate selectively depending on the previous layer's states.
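To make the gating idea concrete, here is a minimal sketch of a single-channel gated 1D convolution in plain Python. It is only an illustration, not the project's model code: the function names, kernel values, and input frames are all hypothetical, and a real gated CNN would use learned multi-channel kernels in a deep learning framework. The output is the elementwise product of a linear convolution and a sigmoid gate computed by a second convolution over the same input.

```python
import math


def sigmoid(x):
    """Logistic function used as the gate nonlinearity."""
    return 1.0 / (1.0 + math.exp(-x))


def conv1d(signal, kernel, bias=0.0):
    """Valid-mode 1D cross-correlation of a sequence with a kernel."""
    k = len(kernel)
    return [
        sum(signal[t + i] * kernel[i] for i in range(k)) + bias
        for t in range(len(signal) - k + 1)
    ]


def gated_conv1d(signal, w, v, b=0.0, c=0.0):
    """GLU: (signal * w + b) multiplied elementwise by sigmoid(signal * v + c)."""
    linear = conv1d(signal, w, b)  # candidate features
    gate = conv1d(signal, v, c)    # pre-activation of the gate
    return [h * sigmoid(g) for h, g in zip(linear, gate)]


# Hypothetical input frames and kernels, chosen only for illustration.
frames = [0.5, -1.0, 2.0, 0.0, 1.5]
out = gated_conv1d(frames, w=[1.0, 0.0, -1.0], v=[0.0, 10.0, 0.0])
print(out)
```

Each output step depends only on a fixed window of the input, so all steps can be computed independently. This is what makes the layer easy to parallelize, unlike an RNN, which must process the sequence step by step.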

MelGAN

We propose MelGAN-VC, a voice conversion method that relies on non-parallel speech data and can convert audio signals of arbitrary length from a source voice to a target voice. We first compute spectrograms from waveform data and then perform a domain translation using a Generative Adversarial Network (GAN) architecture. An additional siamese network helps preserve speech information in the translation process, without sacrificing the ability to flexibly model the style of the target speaker.
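As a rough illustration of the first step (waveform to spectrogram), the sketch below computes a linear-frequency magnitude spectrogram with a naive short-time Fourier transform in plain Python. It makes several assumptions not stated above: the frame length, hop size, Hann window, and synthetic test tone are arbitrary choices, and no mel scaling is applied. A practical pipeline would use an FFT-based audio library instead of this O(n²) DFT.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(wave, frame_len=64, hop=32):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(wave) - frame_len + 1, hop):
        chunk = [wave[start + i] * window[i] for i in range(frame_len)]
        mags = []
        # Naive DFT over the non-negative frequency bins of this frame.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Synthetic 1 kHz tone sampled at 8 kHz (illustrative values only).
sample_rate = 8000
tone_hz = 1000.0
wave = [math.sin(2.0 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = magnitude_spectrogram(wave)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)
```

The number of frames grows with the length of the input, so the spectrogram's time axis simply gets longer for longer recordings.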

(back to top)

Roadmap

(back to top)

Contact

Palgun N P - my0504palsore@gmail.com

Harish Gumnur - hari.8jan@gmail.com

Nikhil P Reddy - nikhil2002s@gmail.com

Project Link: https://github.com/IEEE-NITK/Voice-Swapper

(back to top)
