Project 2 Generative Audio

Abstract

The goal of this project is to produce music and speech from written text in Python using machine learning. This hands-off approach lets artists experiment with new music or find inspiration for new songs. A person writes whatever lyrics they want, and DeepVoice3 converts them into speech using a model created from a woman's voice. In this case, hopeful quotes are used as the text. Performance RNN is then used separately to produce a music piece from a combination of multiple generated musical phrases. These phrases are similar to a bass guitar audio clip on which the model was trained. GANSynth is used to interpolate a MIDI file of Frank Mills's Musicbox Dancer. Finally, the speech, generated music, and interpolated music are combined in the DaVinci Resolve video editor. This project succeeds in producing comprehensible human speech and completely new music. However, the output is not near the quality one would expect from an actual artist. A future direction would be to train an RNN model for singing rather than just speaking.

Model/Data

DeepVoice3 uses a pretrained model with the preset 20180505_deepvoice3_ljspeech.json, which is available online. The code downloads it automatically.

SGM-v2.01-Sal-Guit-Bass-V1.3.sf2

https://sites.google.com/site/soundfonts4u/
Performance RNN uses this SoundFont from Soundfonts4u, which combines guitar and bass. The .sf2 file must be placed in the /tmp/ directory so the code can access it.
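
The placement step can be scripted. The following is a minimal sketch, not the notebook's own code: `stage_file` is a hypothetical helper name, and the assumption is that the SoundFont has already been downloaded to the working directory.

```python
import shutil
from pathlib import Path


def stage_file(src, dest_dir="/tmp"):
    """Copy src into dest_dir and return the destination path.

    Hypothetical helper for illustration; the notebook may stage files differently.
    """
    src = Path(src)
    if not src.is_file():
        raise FileNotFoundError(f"missing input file: {src}")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy(src, dest)
    return dest


# Example call (assumes the file is in the current directory):
# stage_file("SGM-v2.01-Sal-Guit-Bass-V1.3.sf2", "/tmp")
```

Failing early with a clear error when the file is missing is usually easier to debug than a failure deep inside the synthesis step.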

DeepVoice3 converts the following hopeful quotes from text to speech:

"Good, better, best. Never let it rest. 'Til your good is better and your better is best."
"The most beautiful things in the world cannot be seen or even touched. They must be felt with the heart."
"The best preparation for tomorrow is doing your best."
"Every next level of your life will demand a different you."
"If your goals don't scare you. They aren't big enough."
"Don't listen to what they say."
"Be fearless in the pursuit of what sets your soul on fire."
"The greatest glory in living lies not in never falling, but in rising every time we fall."
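
Text-to-speech scripts commonly read their input as one utterance per line. Whether this notebook does so is an assumption. Under that assumption, the quotes could be prepared as below; `write_text_list` and the output file name are hypothetical, and only the first two quotes are shown.

```python
from pathlib import Path

# First two quotes from the list above; the remaining six would be added the same way.
QUOTES = [
    "Good, better, best. Never let it rest. 'Til your good is better and your better is best.",
    "The most beautiful things in the world cannot be seen or even touched. They must be felt with the heart.",
]


def write_text_list(lines, path):
    """Write one utterance per line, skipping blanks; return the number written."""
    # Collapse internal runs of whitespace so each utterance stays on one line.
    cleaned = [" ".join(line.split()) for line in lines if line.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


# Example call (hypothetical file name):
# write_text_list(QUOTES, "quotes.txt")
```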

Frank_Mills_-_Musicbox_Dancer.mid

https://www.midiworld.com/search/?q=dance
GANSynth interpolates this music piece from Midiworld. The .mid file must be placed in the /gansynth/midi/ directory so the code can access it.

Code

DeepVoice3

https://colab.research.google.com/drive/1JpWuvyPCZqGdsXuclHqKidvf2yx_NFtc
Training and generation code
Converts the hopeful quotes listed above to speech in a woman's voice

Performance RNN

https://colab.research.google.com/drive/1W6yGQP3bJ-IfvSpLgr9ELJ68jr6SBgES
Takes the SGM-v2.01-Sal-Guit-Bass-V1.3.sf2 file as input to build the RNN
Generates similar-sounding music samples, each 5 seconds long (the length can be adjusted)

GANSynth

https://colab.research.google.com/drive/1W6yGQP3bJ-IfvSpLgr9ELJ68jr6SBgES
Takes Frank_Mills_-_Musicbox_Dancer.mid as input and interpolates the music
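
This README does not describe how the interpolation is computed. As a general illustration only, generative models are often interpolated by blending two latent vectors. The sketch below shows plain linear interpolation; the function names are hypothetical, and GANSynth's actual method may differ.

```python
def lerp(a, b, t):
    """Linearly interpolate between equal-length vectors a and b for 0 <= t <= 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_path(a, b, steps):
    """Return `steps` vectors moving evenly from a to b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(a, b, i / (steps - 1)) for i in range(steps)]


# Illustrative 2-dimensional "latent vectors"; real models use many more dimensions.
path = interpolate_path([0.0, 2.0], [1.0, 0.0], 3)
```

Each vector along the path would be decoded into audio by the model, which gives a gradual change in timbre from one endpoint to the other.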

Results

The resulting speech and music can be found in this repository. The speech generated by DeepVoice3 is in the speech0.wav through speech7.wav files. The music generated by Performance RNN from the guitar-bass audio is in the music0.mp3 through music7.mp3 files. The Musicbox Dancer music interpolated by GANSynth is the musicbox-gansynth.wav file. The 8 speech outputs, 8 music outputs, and 1 interpolated song are combined in the DaVinci Resolve video editor and uploaded to YouTube for viewing. The resulting speech and music fall well below the quality expected from a composer or songwriter, but for machine-generated output they are quite impressive.

https://www.youtube.com/watch?v=48HugqVAv9o
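
The project combines the clips in DaVinci Resolve. As a scripted alternative for the WAV outputs only, the sketch below joins files end to end with Python's standard `wave` module. It assumes all inputs share the same channel count, sample width, and sample rate; `concat_wavs` is a hypothetical helper, and the MP3 files would first need converting to WAV with another tool.

```python
import wave


def concat_wavs(paths, out_path):
    """Concatenate same-format WAV files into out_path; return the total frame count."""
    params = None
    chunks = []
    for path in paths:
        with wave.open(str(path), "rb") as src:
            fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path} has a different audio format: {fmt} != {params}")
            chunks.append(src.readframes(src.getnframes()))
    if params is None:
        raise ValueError("no input files given")
    with wave.open(str(out_path), "wb") as dst:
        dst.setnchannels(params[0])
        dst.setsampwidth(params[1])
        dst.setframerate(params[2])
        for chunk in chunks:
            dst.writeframes(chunk)
    # One frame holds one sample per channel.
    frame_size = params[0] * params[1]
    return sum(len(chunk) for chunk in chunks) // frame_size


# Example call (hypothetical output name):
# concat_wavs([f"speech{i}.wav" for i in range(8)], "speech-merged.wav")
```

Rejecting mismatched formats, rather than resampling, keeps the sketch small; mixing speech over music rather than joining clips in sequence would need a different approach.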

Technical Notes

This implementation requires Google Colab, a hosted notebook service. It runs only on Colab, even though it is in Jupyter (IPython) notebook format.

Reference

Online-Convert: MIDI to MP3 Converter
Audio-Joiner: MP3 Audio Joiner
Bear Audio: MP3 to MIDI Converter
Trim Midi File: Trim MIDI File

ucsd-ml-arts/generative-audio-joseph-chang
