This is a 2-week project I undertook as the final project of my bootcamp. Although the pipeline works as intended and generates music, it needs a fair amount of rework and restructuring, along with a better dataset, to generate music as intended. See: Future Updates
For a detailed description, please see the section Detailed Presentation
For a test run at its current stage, please see How to Run
- Python 3.8 (see requirements.txt for libraries)
- TensorFlow
- Keras
This project is mostly inspired by DeepBach by Sony and BachBot by Feynman Liang. Both of these projects used Bach's 4-voice chorales to train their networks, which in turn would generate 4-voice chorales on their own.
We can use the same logic to train a network that learns Bach keyboard music (piano, harpsichord, and organ).
A great overview of several techniques used in music generation is given in the article Deep Learning Techniques for Music Generation -- A Survey by Jean-Pierre Briot, Gaëtan Hadjeres, and François-David Pachet. They also discuss the techniques used by DeepBach and BachBot.
We use music data stored in MusicXML files. The files are obtained from Kunst der Fuge and Tobi's Notenarchiv.
The music stored there is keyboard or organ music and normally has 2-3 parts containing polyphonic sequences. These polyphonic sequences in 2-3 parts must be converted into 4 monophonic voices if we are to follow the recipe set out by DeepBach and BachBot.
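As an illustration only (not the project's actual code), splitting notated voices into separate monophonic parts with music21 might look like the following sketch; the file name is a placeholder:

```python
# Minimal sketch, assuming music21 is installed and the score stores
# its polyphony as notated voices inside each staff.
from music21 import converter

score = converter.parse("example.xml")  # placeholder file name

# Explode the notated voices of each staff into separate parts, so
# that each resulting part is (ideally) a monophonic line.
exploded = score.voicesToParts()
for i, part in enumerate(exploded.parts):
    print(i, len(part.flatten().notes))
```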
We perform data augmentation by transposing all the music available to us into different keys. At its current stage, we ended up with 33 suitable scores and their transpositions into 12 keys.
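A hedged sketch of this augmentation step with music21 (the file name and the choice of semitone shifts are illustrative, not the project's exact procedure):

```python
# Sketch only: transpose a parsed score into all 12 chromatic keys.
from music21 import converter

score = converter.parse("example.xml")  # placeholder file name
augmented = [score.transpose(semitones) for semitones in range(12)]
```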
We chose music that is separable into at most 5 voices and has a time signature of 4/4.
Image From: Deep Learning Techniques for Music Generation -- A Survey
We choose to encode the data in the same way as DeepBach.
We use the Music21 library to read the MusicXML files into a Music21 stream object, split the music into 4 voices (with a resolution of up to 1/32nd notes), and encode the data. Each encoded piece has 7 components: 4 monophonic voices, 1 musical key, and 2 for the start and stop sequences.
The encoded data is converted to numerical data, then categorized, and finally one-hot encoded to be fed into the network.
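For illustration (the shapes and class counts below are placeholders, not the project's values), the final one-hot step could look like this:

```python
# Sketch only: one-hot encode integer-coded timesteps with Keras.
import numpy as np
from tensorflow.keras.utils import to_categorical

# Hypothetical piece: 128 timesteps, 7 components each
# (4 voices, 1 key, 2 start/stop markers), integer-categorized.
encoded = np.random.randint(0, 50, size=(128, 7))
one_hot = to_categorical(encoded, num_classes=50)
print(one_hot.shape)  # (128, 7, 50)
```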
This portion of the work is handled by the MusicHandler() and NeuralNetIOHandler() classes in data_utils.py.
The neural network consists of an input layer, 3 LSTM layers of size 256 (512 in the diagram), and a Dense output layer.
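A minimal Keras sketch of this architecture, assuming TensorFlow 2.x; the input and output dimensions are placeholders, not the project's actual values:

```python
# Sketch only: 3 stacked LSTM layers of size 256 and a Dense output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

model = Sequential([
    Input(shape=(32, 200)),            # (timesteps, one-hot width), placeholder
    LSTM(256, return_sequences=True),
    LSTM(256, return_sequences=True),
    LSTM(256),
    Dense(200, activation="softmax"),  # output layer, placeholder size
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```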
The data folder contains 3 example .xml files; only this data will be processed. Processed data will be stored as .pickle files in the data folder, and the network will run on this data.
Install the requirements: `pip install -r requirements.txt`
Note: please install the GPU version of TensorFlow.
Process the data: `python generate_nn_data.py`
Run the network for training: `python run_onehot_model.py`
- Properly 4-voice encoded music acquired from www.kunstderfuge.com will be used for data preparation; manually handling and splitting music into voices causes a lot of issues, a major one being an overinflated feature space.
- The interpretation and implementation of rests will be revised. An overabundance of rests causes the network to learn to place rests everywhere. Note encoding could be done in the style of BachBot; this would double the note feature space but reduce the emphasis on rests, as illustrated below.
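Purely as an illustration (the tuple layout here is hypothetical, not BachBot's exact format), such an encoding marks each frame with a pitch and a held-over flag instead of filling gaps with explicit rest symbols:

```python
# Hypothetical frame encoding: (midi_pitch, is_held) per frame.
# Each pitch can appear as either struck (0) or held (1), which is
# why the note feature space roughly doubles.
frames = [(60, 0), (60, 1), (60, 1), (62, 0)]  # C4 struck, held twice, then D4
```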
- A resolution down to 1/32nd notes also causes an abundance of rests.
- The one-hot encoding scheme could be replaced by a numerical encoding scheme; see the sketch after this item.
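One possible reading of this (an assumption on my part, not a committed design) is to keep the integer tokens and learn an embedding in place of one-hot input vectors:

```python
# Sketch only: integer tokens plus a learned embedding layer instead
# of one-hot inputs. Vocabulary and embedding sizes are placeholders.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=50, output_dim=16),  # 50 token classes -> 16-dim vectors
    LSTM(256),
    Dense(50, activation="softmax"),
])
```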
- The issue of the output space is a complicated matter. I am not sure the separated output space works as intended. The separated outputs could instead be reduced to a single multi-one-hot encoded vector, or the voice outputs could form one such vector and the metadata parameters another (a hedged sketch of the current separated-heads variant follows below).
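For reference, a sketch of what the separated-heads variant might look like with the Keras functional API; all layer sizes and head names here are placeholders, not the project's values:

```python
# Sketch only: one softmax head per voice plus a metadata head,
# branching from a shared LSTM stack.
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import LSTM, Dense

inp = Input(shape=(32, 200))  # placeholder input shape
h = LSTM(256)(inp)
voice_heads = [Dense(50, activation="softmax", name=f"voice_{i}")(h) for i in range(4)]
meta_head = Dense(14, activation="softmax", name="key")(h)  # placeholder metadata head
model = Model(inp, voice_heads + [meta_head])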
- Optimal hyperparameters should be scanned for, particularly the number of LSTM layers and the LSTM layer sizes.
- Although the forward flow of the data is sufficiently organized into class structures, reverse-encoding the output of the model is another matter. Most of the functions in other_utils.py should be moved into proper classes.
- The music-generating script that uses the trained model resides in a Jupyter notebook; it should be turned into a proper script.
- Function names should better describe, and align with, the type of data they take as input and the data they output.
- Complete type annotations would be very useful, as illustrated below.
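As a purely illustrative example (this helper and its signature are hypothetical, not taken from data_utils.py), using typing constructs compatible with Python 3.8:

```python
# Hypothetical helper showing the intended annotation style.
from typing import List
import numpy as np

def encode_voice(notes: List[str], vocabulary: List[str]) -> np.ndarray:
    """Map note-name tokens to integer indices, then one-hot encode them."""
    indices = [vocabulary.index(n) for n in notes]
    return np.eye(len(vocabulary))[indices]
```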