Hi all - I was wondering if anyone has ideas about, or has tried, implementing a VAE with DDSP components. My current setup takes the Autoencoder architecture and modifies the encoder and the loss function to be variational; the decoder and processor group are unchanged from the original autoencoder. (Here's the gin file for my model.)
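To make the setup concrete, here's a minimal sketch of the variational change in TensorFlow/Keras (`VariationalZ`, `z_dims`, and the two `Dense` projections are my own illustrative names and choices, not anything from the ddsp library):

```python
import tensorflow as tf

class VariationalZ(tf.keras.layers.Layer):
  """Projects a deterministic embedding to (mu, log_var) and samples z."""

  def __init__(self, z_dims=16, **kwargs):
    super().__init__(**kwargs)
    self.to_mu = tf.keras.layers.Dense(z_dims)
    self.to_log_var = tf.keras.layers.Dense(z_dims)

  def call(self, z_deterministic):
    mu = self.to_mu(z_deterministic)
    log_var = self.to_log_var(z_deterministic)
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    eps = tf.random.normal(tf.shape(mu))
    z = mu + tf.exp(0.5 * log_var) * eps
    # Analytic KL(q(z|x) || N(0, I)), summed over latent dims.
    kl = 0.5 * tf.reduce_sum(
        tf.exp(log_var) + tf.square(mu) - 1.0 - log_var, axis=-1)
    return z, kl
```

The KL term then gets added to the spectral reconstruction loss with a weight (so the total objective is the usual reconstruction + KL ELBO form).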
It is able to fit NSynth, using `SpectralLoss` for the reconstruction loss. However, it doesn't seem to be able to generate novel samples: when I supply an f0/loudness, sample z from a Gaussian, and decode, I get almost exactly the same audio each time. Upon inspection, I realized that the decoder is relying almost entirely on the f0 and loudness to reconstruct the inputs, rather than on the latent embedding z (my sampling procedure is sketched below). Do you have any suggestions on getting it to use z more to encode the timbre? (Perhaps the fully unsupervised variant would help, but that seemed not to work as well in the DDSP paper?)
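For reference, my generation test looks roughly like this (a sketch, not exact code: `model`, `f0_hz`, `loudness_db`, and the shapes are placeholders for my trained model and conditioning features, and `decode()` stands in for the decoder + processor-group forward pass):

```python
import tensorflow as tf

# Placeholders: model, f0_hz, loudness_db come from a trained checkpoint
# and a held-out example; shapes below match my encoder's output.
z_dims, n_frames = 16, 1000

for _ in range(4):
  # Draw a fresh latent from the prior and hold it constant over time.
  z = tf.random.normal([1, 1, z_dims])
  z = tf.tile(z, [1, n_frames, 1])
  conditioning = {'f0_hz': f0_hz, 'loudness_db': loudness_db, 'z': z}
  audio = model.decode(conditioning, training=False)
  # Every iteration yields nearly identical audio -> the decoder ignores z.
```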