Alternative (Neura Voice)

An original implementation of fast, parallel speech signal generation from text, using MFCC as the intermediate representation (2018).

Idea & Architecture

  • Text → (Location-based attention mechanism) → MFCC
  • MFCC → (parallel recurrent network) → Speech Signal

Text to Mel

[Figure: FFTNet architecture]

This model is based on Alex Graves, "Generating Sequences With Recurrent Neural Networks".

[Figure: FFTNet architecture]
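
The location-based attention in Graves's paper can be sketched as a mixture of Gaussians that slides monotonically over the text positions. The snippet below is a minimal NumPy illustration of that window, not this repository's code: the function names (`gaussian_window`, `attention_step`), the number of mixture components, and the one-hot text encoding are all assumptions made for the example.

```python
import numpy as np

def gaussian_window(alpha, beta, kappa, text_len):
    """Soft window weights phi(t, u) over text positions u = 0..text_len-1.

    alpha, beta, kappa: arrays of shape (K,), one entry per mixture component
    (importance, inverse width, and location of each Gaussian).
    """
    u = np.arange(text_len)[None, :]                      # shape (1, U)
    phi = alpha[:, None] * np.exp(-beta[:, None] * (kappa[:, None] - u) ** 2)
    return phi.sum(axis=0)                                # shape (U,)

def attention_step(raw, kappa_prev, text_onehot):
    """One attention step.

    raw: unconstrained network outputs of shape (3K,), laid out as
         [alpha_hat, beta_hat, kappa_hat] (layout assumed for this sketch).
    kappa_prev: window locations from the previous step, shape (K,).
    text_onehot: encoded text, shape (U, vocab).
    """
    alpha_hat, beta_hat, kappa_hat = np.split(raw, 3)
    alpha = np.exp(alpha_hat)
    beta = np.exp(beta_hat)
    # exp() keeps the increment positive, so the window only moves forward.
    kappa = kappa_prev + np.exp(kappa_hat)
    phi = gaussian_window(alpha, beta, kappa, text_onehot.shape[0])
    context = phi @ text_onehot                           # shape (vocab,)
    return context, kappa, phi

# Usage: a 4-character text, K = 2 mixture components, all raw outputs zero.
context, kappa, phi = attention_step(np.zeros(6), np.zeros(2), np.eye(4))
```

Because the location `kappa` depends only on its previous value and never on the content of the text, the alignment is purely location-based and cannot move backwards.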

MFCC to Speech Signal

A vocoder model for parallel speech signal generation, based on WaveRNN.

    WaveRNN math::
        x_t = [c_{t-1}, f_{t-1}, c_t]                      # input
        u_t = σ(R_u h_{t-1} + I_u* x_t + b_u)              # update gate
        r_t = σ(R_r h_{t-1} + I_r* x_t + b_r)              # reset gate
        e_t = tanh(r_t ∘ (R_e h_{t-1}) + I_e* x_t + b_e)   # recurrent unit
        h_t = u_t ∘ h_{t-1} + (1 - u_t) ∘ e_t              # next hidden state
        y_c, y_f = split(h_t)                              # coarse, fine
        P(c_t) = softmax(O_2 relu(O_1 y_c))                # coarse distribution
        P(f_t) = softmax(O_4 relu(O_3 y_f))                # fine distribution
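
As a rough illustration, one step of the recurrence above can be written in NumPy as follows. This is a sketch under stated assumptions rather than the repository's implementation: the hidden size, number of output classes, random initialisation, and the parameter dictionary are hypothetical. It reads `I*` as the masked input matrix of the WaveRNN paper, where the current coarse value `c_t` is connected only to the fine half of the state, and it assumes the inputs are already scaled to [-1, 1].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())          # subtract the max for numerical stability
    return z / z.sum()

def init_params(hidden=8, n_classes=4, seed=0):
    """Small random parameters (sizes are assumptions for this sketch)."""
    rng = np.random.default_rng(seed)
    half = hidden // 2
    p = {}
    for g in "ure":                  # update gate, reset gate, recurrent unit
        p["R" + g] = rng.normal(0.0, 0.1, (hidden, hidden))
        I = rng.normal(0.0, 0.1, (hidden, 3))
        I[:half, 2] = 0.0            # mask: c_t must not reach the coarse half
        p["I" + g] = I
        p["b" + g] = np.zeros(hidden)
    p["O1"] = rng.normal(0.0, 0.1, (half, half))
    p["O2"] = rng.normal(0.0, 0.1, (n_classes, half))
    p["O3"] = rng.normal(0.0, 0.1, (half, half))
    p["O4"] = rng.normal(0.0, 0.1, (n_classes, half))
    return p

def wavernn_step(p, h_prev, c_prev, f_prev, c_cur):
    """Apply the equations above once and return (h_t, P(c_t), P(f_t))."""
    x = np.array([c_prev, f_prev, c_cur])                     # input
    u = sigmoid(p["Ru"] @ h_prev + p["Iu"] @ x + p["bu"])     # update gate
    r = sigmoid(p["Rr"] @ h_prev + p["Ir"] @ x + p["br"])     # reset gate
    e = np.tanh(r * (p["Re"] @ h_prev) + p["Ie"] @ x + p["be"])
    h = u * h_prev + (1.0 - u) * e                            # next hidden state
    y_c, y_f = np.split(h, 2)                                 # coarse, fine
    p_c = softmax(p["O2"] @ np.maximum(p["O1"] @ y_c, 0.0))   # coarse distribution
    p_f = softmax(p["O4"] @ np.maximum(p["O3"] @ y_f, 0.0))   # fine distribution
    return h, p_c, p_f

# Usage: one step from a constant hidden state.
params = init_params()
h, p_coarse, p_fine = wavernn_step(params, np.full(8, 0.1), 0.2, -0.3, 0.0)
```

Because of the mask, the coarse distribution does not depend on `c_cur`. A sampler can therefore draw `c_t` from `P(c_t)` first and then evaluate the fine half with the sampled value.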