Alternative (Neura Voice)

An original implementation of fast, parallel speech signal generation from text, using MFCC features as the intermediate representation (2018).

Idea & Architecture

Text → (Location-based attention mechanism) → MFCC
MFCC → (parallel recurrent network) → Speech Signal

Text to Mel

This model is based on Alex Graves, "Generating Sequences With Recurrent Neural Networks" (2013).
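
As a rough illustration of the location-based attention described in that paper, the sketch below computes one step of a Gaussian-mixture soft window over the input characters. It assumes this model follows the paper's window formulation; the function name `window_step`, the argument names, and the 0-based character index are choices made for this example, not taken from the repository.

```python
import math

def window_step(alpha_hat, beta_hat, kappa_hat, kappa_prev, chars):
    """One step of a Gaussian-mixture soft window (after Graves, 2013).

    alpha_hat, beta_hat, kappa_hat: raw network outputs, one value per mixture component.
    kappa_prev: window positions from the previous step.
    chars: list of character vectors (e.g. one-hot), indexed from 0 here.
    """
    alpha = [math.exp(a) for a in alpha_hat]           # component importance
    beta = [math.exp(b) for b in beta_hat]             # inverse window width
    # Positions only ever move forward, because exp(.) is always positive.
    kappa = [kp + math.exp(k) for kp, k in zip(kappa_prev, kappa_hat)]

    # phi[u]: attention weight on character u
    phi = [
        sum(a * math.exp(-b * (k - u) ** 2) for a, b, k in zip(alpha, beta, kappa))
        for u in range(len(chars))
    ]

    # Window vector: attention-weighted sum of the character vectors
    dim = len(chars[0])
    w = [sum(phi[u] * chars[u][d] for u in range(len(chars))) for d in range(dim)]
    return w, phi, kappa

# Usage: four one-hot characters, a single mixture component
chars = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
w1, phi1, kappa1 = window_step([0.0], [0.0], [0.0], [0.0], chars)
w2, phi2, kappa2 = window_step([0.0], [0.0], [0.0], kappa1, chars)
```

Because the position update is additive and strictly positive, the window slides monotonically along the text, which is what makes the mechanism location-based rather than content-based.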

MFCC to Speech Signal

A vocoder model for parallel speech signal generation, based on WaveRNN.

    WaveRNN math::
        # I* denotes a masked input matrix: c_t feeds only the fine half of the state
        x_t = [c_{t-1}, f_{t-1}, c_t]  # input
        u_t = σ(R_u h_{t-1} + I*_u x_t + b_u)  # update gate
        r_t = σ(R_r h_{t-1} + I*_r x_t + b_r)  # reset gate
        e_t = tanh(r_t ∘ (R_e h_{t-1}) + I*_e x_t + b_e)  # recurrent unit
        h_t = u_t ∘ h_{t-1} + (1 - u_t) ∘ e_t  # next hidden state
        y_c, y_f = split(h_t)  # coarse, fine
        P(c_t) = softmax(O_2 relu(O_1 y_c))  # coarse distribution
        P(f_t) = softmax(O_4 relu(O_3 y_f))  # fine distribution
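
The equations above can be sketched as a single recurrent step in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the helper names (`init_params`, `wavernn_step`), the tiny hidden size, the random weights, and the convention that inputs are already scaled to [-1, 1] are all hypothetical. The mask zeroes the `c_t` weight for the coarse half of the state, so the coarse distribution cannot see the sample it is predicting.

```python
import math
import random

def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]

def add(*vs):
    return [sum(t) for t in zip(*vs)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)                      # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def init_params(hidden=8, classes=256, seed=0):
    """Random toy parameters (hypothetical sizes, for illustration only)."""
    rng = random.Random(seed)
    half = hidden // 2

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    def masked(rows):
        # I*: coarse-half rows get zero weight on c_t (column 2 of x_t)
        return [[0.0 if (i < half and j == 2) else w for j, w in enumerate(row)]
                for i, row in enumerate(mat(rows, 3))]

    p = {"half": half}
    for g in "ure":                 # update gate, reset gate, candidate state
        p["R" + g] = mat(hidden, hidden)
        p["I" + g] = masked(hidden)
        p["b" + g] = [0.0] * hidden
    p["O1"], p["O2"] = mat(half, half), mat(classes, half)
    p["O3"], p["O4"] = mat(half, half), mat(classes, half)
    return p

def wavernn_step(p, h_prev, c_prev, f_prev, c_cur):
    x = [c_prev, f_prev, c_cur]                                   # x_t
    u = sigmoid(add(matvec(p["Ru"], h_prev), matvec(p["Iu"], x), p["bu"]))
    r = sigmoid(add(matvec(p["Rr"], h_prev), matvec(p["Ir"], x), p["br"]))
    rh = [a * b for a, b in zip(r, matvec(p["Re"], h_prev))]      # r_t ∘ (R_e h_{t-1})
    e = [math.tanh(a) for a in add(rh, matvec(p["Ie"], x), p["be"])]
    h = [ui * hp + (1.0 - ui) * ei for ui, hp, ei in zip(u, h_prev, e)]
    yc, yf = h[:p["half"]], h[p["half"]:]                         # split(h_t)
    p_c = softmax(matvec(p["O2"], relu(matvec(p["O1"], yc))))     # P(c_t)
    p_f = softmax(matvec(p["O4"], relu(matvec(p["O3"], yf))))     # P(f_t)
    return h, p_c, p_f

# Usage: two steps that differ only in the current coarse value c_t
params = init_params()
h0 = [0.0] * 8
h_a, pc_a, pf_a = wavernn_step(params, h0, 0.1, -0.2, 0.3)
h_b, pc_b, pf_b = wavernn_step(params, h0, 0.1, -0.2, -0.9)
```

In the usage lines, the two calls share everything except `c_cur`, so the coarse halves of `h_a` and `h_b` agree while the fine halves differ. At generation time one would sample `c_t` from `P(c_t)` first and then evaluate the fine half with that sample; the masked single-step form shown here corresponds to the training-time view, where `c_t` is already known.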
