
Generating new sounds by vector arithmetic in the latent space of the MelGAN architecture

This script implements the training of a MelGAN used to generate new sounds from linear combinations of existing ones, producing new sonorities. The idea is presented in [1], while the training is based on the MelGAN architecture [2], whose source code is available at: https://github.com/descriptinc/melgan-neurips/blob/master/scripts/train.py

  1. M. Scarpiniti, E. Massaro, D. Comminiello and A. Uncini, "Generating new sounds by vector arithmetic in the latent space of the MelGAN architecture", in Applications of Artificial Intelligence and Neural Systems to Data Science (A. Esposito, M. Faundez-Zanuy, F. C. Morabito and E. Pasero, Eds.), ISBN: , pp. , Springer, 2023.
  2. K. Kumar, R. Kumar, T. de Boissiere, L. Gestin, W. Z. Teoh, J. Sotelo, A. de Brebisson, Y. Bengio and A. Courville, "MelGAN: generative adversarial networks for conditional waveform synthesis", in Proceedings of the 33rd International Conference on Neural Information Processing Systems (NIPS'19), Red Hook, NY, USA, pp. 14910–14921, 2019.

The folder "Examples" contains some sounds generated by the proposed approach.

Specifically, we compute a linear combination of three test files and, after mel-spectrogram extraction, use it as input to the MelGAN.
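A minimal sketch of the combination step follows. It assumes the arithmetic is applied element-wise to the mel spectrograms of the test files, all of the same shape. This README does not say whether log-mel or linear-mel values are combined, so the function simply applies the weights to whatever values it receives. `combine_mels` is a hypothetical helper, not code from this repository. Spectrograms are plain nested lists here so the example stays self-contained. Feeding the result to the trained MelGAN generator is not shown.

```python
from typing import Sequence

# Hypothetical representation: a mel spectrogram is a list of frames,
# each frame a list of mel-band values (real code would use arrays/tensors).
Mel = list[list[float]]


def combine_mels(mels: Sequence[Mel], weights: Sequence[float]) -> Mel:
    """Return sum_k weights[k] * mels[k], computed element by element.

    Weights may be negative and need not sum to one.
    """
    if not mels or len(mels) != len(weights):
        raise ValueError("need at least one spectrogram and one weight per spectrogram")
    n_frames = len(mels[0])
    n_bands = len(mels[0][0]) if n_frames else 0
    # Every spectrogram must have the same (frames, bands) shape.
    for m in mels:
        if len(m) != n_frames or any(len(frame) != n_bands for frame in m):
            raise ValueError("all mel spectrograms must have the same shape")
    return [
        [sum(w * m[t][b] for w, m in zip(weights, mels)) for b in range(n_bands)]
        for t in range(n_frames)
    ]


if __name__ == "__main__":
    # Toy 2-frame x 2-band values standing in for Mallet 2, String 1, Keyboard 1.
    mallet2 = [[1.0, 2.0], [3.0, 4.0]]
    string1 = [[0.5, 0.5], [1.0, 1.0]]
    keyboard1 = [[0.0, 1.0], [2.0, 0.0]]
    # "Mallet 2 - String 1 + Keyboard 1" corresponds to weights (1, -1, 1).
    print(combine_mels([mallet2, string1, keyboard1], [1.0, -1.0, 1.0]))
    # prints [[0.5, 2.5], [4.0, 3.0]]
```

Keeping the weights as free coefficients, rather than forcing them to be non-negative and sum to one, matches combinations such as "Mallet 2 - String 1 + Keyboard 1", which subtract a sound.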

The first example uses the linear combination "Mallet 2 - String 1 + Keyboard 1" (Example1.wav). The resulting spectrogram is shown below.

[Figure: spectrogram of Example1.wav]

The second example uses the linear combination "Keyboard 1 + 0.4 × Organ 2 - 0.4 × String 4" (Example2.wav). The resulting spectrogram is shown below.

[Figure: spectrogram of Example2.wav]

Finally, the third example uses the linear combination "0.4 × Organ 3 - 0.4 × String 1 + 0.9 × Organ 1" (Example3.wav). The resulting spectrogram is shown below.

[Figure: spectrogram of Example3.wav]
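The combinations in these examples share a compact notation. Each term is an optional coefficient followed by `x` (or `×`) and a sound name, terms are joined by `+` or `-`, and an omitted coefficient means 1. As an illustration only, the hypothetical helper below parses that notation into (weight, sound) pairs. It is not part of this repository, and it assumes every sound name is a single word followed by a number, as in the examples above.

```python
import re

# One term: optional sign, optional "<coef> x " prefix, then a sound name
# assumed to look like "Organ 2" (a word, a space, a number).
TERM = re.compile(r"\s*([+-])?\s*(?:(\d+(?:\.\d+)?)\s*[x×]\s*)?([A-Za-z]+ \d+)\s*")


def parse_combination(expr: str) -> list[tuple[float, str]]:
    """Parse e.g. "Keyboard 1 + 0.4 x Organ 2" into [(1.0, "Keyboard 1"), (0.4, "Organ 2")]."""
    terms: list[tuple[float, str]] = []
    pos = 0
    while pos < len(expr):
        m = TERM.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot parse combination near: {expr[pos:]!r}")
        sign, coef, name = m.groups()
        # Every term after the first must be introduced by '+' or '-'.
        if terms and sign is None:
            raise ValueError("missing '+' or '-' between terms")
        weight = float(coef) if coef else 1.0
        if sign == "-":
            weight = -weight
        terms.append((weight, name))
        pos = m.end()
    return terms


if __name__ == "__main__":
    print(parse_combination("0.4 x Organ 3 - 0.4 x String 1 + 0.9 x Organ 1"))
    # prints [(0.4, 'Organ 3'), (-0.4, 'String 1'), (0.9, 'Organ 1')]
```

Note that the weights of Example3.wav sum to 0.9 and include a negative coefficient, so the combinations are linear rather than convex.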