Timbre interpolation #276
teo-sanchez started this conversation in Ideas
Replies: 1 comment
Interesting experiment. The paper's interpolation experiments were done on the NSynth dataset. It sounds like your model is having trouble fitting the data of both instruments. You could try a bigger model, which might help, but my best guess is that the two datasets have very different reverb responses while you're training a single shared reverb on both, so you'd need a way to interpolate the reverb too.

One way to do that would be to label your dataset with a one-hot of which instrument it is, and load a different reverb impulse response (IR) for each instrument during training. Another would be to infer the IR on an example-by-example basis with a neural net. Both are non-trivial modifications of the code, but that's my best guess. We've had more luck with timbre interpolation on the URMP dataset, which has a similar (non-existent) reverb for all sources.
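The one-hot idea above can be sketched in a few lines (a minimal NumPy illustration only — `apply_instrument_reverb` and the toy IRs are made up for this example, not part of the ddsp API). Because the IR is selected by a weighted sum over the label, interpolating the label at decode time blends the two rooms along with the timbre:

```python
import numpy as np

def apply_instrument_reverb(audio, label, irs):
    """Hypothetical helper: pick a per-instrument impulse response
    via a (one-hot or interpolated) label and convolve it with the
    dry audio. `irs` has shape [n_instruments, ir_length]."""
    ir = np.tensordot(label, irs, axes=1)  # weighted sum over instruments
    return np.convolve(audio, ir)[: len(audio)]  # drop the wet tail

# Toy setup: 2 instruments, 4-tap IRs (purely illustrative values).
irs = np.array([[1.0, 0.5, 0.0, 0.0],   # "flute" room
                [1.0, 0.0, 0.3, 0.1]])  # "violin" room
audio = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # unit impulse as dry input

wet_flute = apply_instrument_reverb(audio, np.array([1.0, 0.0]), irs)
# Interpolating the label blends the rooms, mirroring timbre interpolation.
wet_mix = apply_instrument_reverb(audio, np.array([0.5, 0.5]), irs)
```

In the real training setup, each IR would be a trainable variable updated only on examples carrying its instrument's label, rather than a fixed array as here.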
On Mon, Nov 23, 2020 at 5:46 AM Téo Sanchez wrote:
Hello,
First, thank you for the awesome work. I'm trying to reproduce the timbre interpolation you present in the sound examples section. I have trained the auto-encoder (model/ae.gin) on flute and violin sounds (about 20 min each) for about 30,000 epochs, but I can't get a clear timbre from either instrument when I decode with the z encoded for flute and violin separately. Do you have any examples or practical advice to improve the results?
Have you conditioned z, trained for more epochs or on longer audio files, or used a specific architecture to handle multiple timbres, by any chance?
Here are the generated results: <https://transfert.u-psud.fr/we0wlhz>.
Thank you in advance for your thoughts. Best,
Téo