AI-generated audio samples for producers. Created by Ricky Ma.
DCGAN (Radford et al., 2016) was designed to generate images. Donahue et al. (2018) applied the same idea to audio, creating WaveGAN, which can synthesize 1-4 second clips of bird calls, speech, piano, and drum sounds. I wondered whether the same algorithm could be used for music, where one potential application is generating creative audio samples for human producers to use. Trained initially on single artists/genres, ProducerGAN was able to generate audio with similar qualities. This can be extended to samples from multiple artists and/or genres to create music previously unheard of. Although its current output is limited, with sufficient time and computing power this approach could be extended to produce full songs learned from many different artists and musical genres.
Songs were pulled randomly from YouTube, converted to .wav files, and segmented into 4-second clips. Each clip is decoded into a real-valued vector x and normalized. Latent vectors z are sampled from a uniform distribution and passed through the generator G, built with the WaveGAN architecture, to produce synthetic samples G(z). A single discriminator D scores real clips as D(x) and generated clips as D(G(z)).
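For illustration, here is a minimal preprocessing sketch. The librosa dependency, file names, and the 16384 Hz sample rate (WaveGAN's reference rate) are assumptions, not part of this repo:

```python
import numpy as np
import librosa  # assumed dependency for loading and resampling audio

SAMPLE_RATE = 16384            # assumed WaveGAN-style rate; 4 s = 65536 samples
CLIP_LEN = 4 * SAMPLE_RATE

def wav_to_clips(path):
    """Load a .wav file and split it into normalized 4-second clips."""
    audio, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True)
    n_clips = len(audio) // CLIP_LEN
    clips = audio[: n_clips * CLIP_LEN].reshape(n_clips, CLIP_LEN)
    # Scale each clip to [-1, 1] to match the generator's tanh output range.
    peaks = np.abs(clips).max(axis=1, keepdims=True) + 1e-8
    return (clips / peaks).astype(np.float32)

x = np.concatenate([wav_to_clips(p) for p in ["song1.wav", "song2.wav"]])
```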
Discriminator D and generator G are optimized with TensorFlow's Adam optimizer. On each iteration, D is updated five times, then G is updated once. Generator weights are saved locally as the GAN trains. Training does not end automatically; the number of iterations is set by the user.
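A sketch of that update schedule, assuming a WGAN-GP-style setup as in the WaveGAN paper; the loss functions, latent dimension, and hyperparameters below are placeholders, not the repo's actual values:

```python
import tensorflow as tf

Z_DIM = 100  # assumed latent dimensionality
d_opt = tf.keras.optimizers.Adam(1e-4, beta_1=0.5, beta_2=0.9)
g_opt = tf.keras.optimizers.Adam(1e-4, beta_1=0.5, beta_2=0.9)

def train_step(G, D, real_batch, d_loss_fn, g_loss_fn):
    batch = tf.shape(real_batch)[0]
    # Update the discriminator five times per generator update.
    for _ in range(5):
        z = tf.random.uniform([batch, Z_DIM], -1.0, 1.0)
        with tf.GradientTape() as tape:
            d_loss = d_loss_fn(D(real_batch), D(G(z)))
        grads = tape.gradient(d_loss, D.trainable_variables)
        d_opt.apply_gradients(zip(grads, D.trainable_variables))
    # Then update the generator once.
    z = tf.random.uniform([batch, Z_DIM], -1.0, 1.0)
    with tf.GradientTape() as tape:
        g_loss = g_loss_fn(D(G(z)))
    grads = tape.gradient(g_loss, G.trainable_variables)
    g_opt.apply_gradients(zip(grads, G.trainable_variables))
```

The outer loop simply calls train_step until the user stops it, periodically saving checkpoints with something like G.save_weights("checkpoints/generator").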
Random latent vectors z are sampled and fed into the trained generator G. Each output G(z) is converted back into a waveform for playback.
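A sketch of that inference step; the soundfile dependency, checkpoint path, and generator handle G are assumptions carried over from the sketches above:

```python
import numpy as np
import soundfile as sf  # assumed dependency for writing .wav files

SAMPLE_RATE = 16384
Z_DIM = 100

# G is the trained generator; load the weights saved during training.
G.load_weights("checkpoints/generator")

z = np.random.uniform(-1.0, 1.0, size=(1, Z_DIM)).astype(np.float32)
waveform = G(z).numpy().reshape(-1)   # generator output lies in [-1, 1]
sf.write("generated_sample.wav", waveform, SAMPLE_RATE)
```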