Authors: Conghao (Tom) Shen, Violet Yao, Yixin Liu
This project presents a deep learning approach to generate monophonic melodies based on input beats, allowing even amateurs to create their own music compositions. Three effective methods - LSTM with Full Attention, LSTM with Local Attention, and Transformer with Relative Position Representation - are proposed for this novel task, providing great variation, harmony, and structure in the generated music. This project allows anyone to compose their own music by tapping their keyboards or "recoloring" beat sequences from existing works.
To get started, clone this repository and install the required packages:
git clone https://github.com/tsunrise/everybody-compose.git
cd everybody-compose
pip install -r requirements.txt
You may encounter dependency issues involving protobuf during training. If so, try reinstalling tensorboard by running:
pip install --upgrade tensorboard
This issue is caused by conflicting requirements of note_seq and tensorboard.
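If the upgrade does not resolve the conflict, it can help to check which versions are actually installed. The following is a minimal sketch using only the Python standard library. The helper name installed_versions is hypothetical, and the distribution names passed to it (in particular note-seq) are assumptions about how pip lists these packages.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them if pip reports different names.
    report = installed_versions(["protobuf", "tensorboard", "note-seq"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the reported protobuf version against the ranges required by note_seq and tensorboard should show which of the two is unsatisfied.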
We have also provided a Colab Notebook for your reference.
The preprocessed dataset will be downloaded automatically before training. To train a model, run the train.py script with the -m or --model_name argument followed by a string specifying the name of the model to use. The available model names are:
- lstm_attn: LSTM with Local Attention
- vanilla_rnn: Decoder Only Vanilla RNN
- attention_rnn: LSTM with Full Attention
- transformer: Transformer RPR
You can also use the -nf or --n_files argument followed by an integer to specify the number of files to use for training (the default value of -1 means that all available files will be used).
To specify the number of epochs to train the model for, use the -n or --n_epochs argument followed by an integer. The default value is 100.
To specify the device to use for training, use the -d or --device argument followed by a string. The default value is cuda if a CUDA-enabled GPU is available, or cpu if not.
To specify how often snapshots of the trained model are saved, use the -s or --snapshots_freq argument followed by an integer giving the number of epochs between saved snapshots. The default value is 200. The snapshots will be saved in the .project_data/snapshots directory.
To specify a checkpoint to load the model from, use the -c or --checkpoint argument followed by a string specifying the path to the checkpoint file. The default value is None, which means that no checkpoint will be loaded.
Here are some examples of how to use these arguments:
# Train the LSTM with Local Attention model using all available files, for 100 epochs, on the default device, saving snapshots every 200 epochs, and not using a checkpoint
python train.py -m lstm_attn
# Train the LSTM with Local Attention model using 10 files, for 1000 epochs, on the CPU, saving snapshots every 100 epochs, and starting from the checkpoint at ./.project_data/snapshots/my_checkpoint.pth
python train.py -m lstm_attn -nf 10 -n 1000 -d cpu -s 100 -c ./.project_data/snapshots/my_checkpoint.pth
# Train the Transformer RPR model using all available files, for 500 epochs, on the default device, saving snapshots every 50 epochs, and not using a checkpoint
python train.py -m transformer -n 500 -s 50
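The training options above can be summarized as an argparse parser. This is only a sketch reconstructed from the descriptions in this README, not the code in train.py: the function name build_train_parser is hypothetical, making -m required is an assumption, and the device default is left as None on the assumption that the script resolves it to cuda or cpu at run time.

```python
import argparse

# Model names as listed in this README.
MODEL_NAMES = ["lstm_attn", "vanilla_rnn", "attention_rnn", "transformer"]


def build_train_parser():
    """Build a parser mirroring the documented training flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Train a melody generation model")
    # Assumption: the model name is mandatory; the README does not state a default.
    parser.add_argument("-m", "--model_name", type=str, required=True, choices=MODEL_NAMES)
    parser.add_argument("-nf", "--n_files", type=int, default=-1,
                        help="number of files to train on; -1 means all files")
    parser.add_argument("-n", "--n_epochs", type=int, default=100)
    # Assumption: None stands for "cuda if available, otherwise cpu".
    parser.add_argument("-d", "--device", type=str, default=None)
    parser.add_argument("-s", "--snapshots_freq", type=int, default=200,
                        help="number of epochs between saved snapshots")
    parser.add_argument("-c", "--checkpoint", type=str, default=None)
    return parser


if __name__ == "__main__":
    # Parse the third example above without running any training.
    args = build_train_parser().parse_args(["-m", "transformer", "-n", "500", "-s", "50"])
    print(args)
```

Parsing an example command this way is a quick check of which values fall back to their defaults.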
To generate a predicted note sequence and save it as a MIDI file, run the predict_stream.py script with the -m or --model_name argument followed by a string specifying the name of the model to use. The available model names are:
- lstm_attn: LSTM with Local Attention
- vanilla_rnn: Decoder Only Vanilla RNN
- attention_rnn: LSTM with Full Attention
- transformer: Transformer RPR
Use the -c or --checkpoint_path argument followed by a string specifying the path to the checkpoint file to use for the model.
The generated MIDI file will be saved using the filename specified by the -o or --midi_filename argument (the default value is output.mid).
To specify the device to use for generating the predicted sequence, use the -d or --device argument followed by a string. The default value is cuda if a CUDA-enabled GPU is available, or cpu if not.
To specify the source of the input beats, use the -s or --source argument followed by a string. The default value is interactive, which means that the user will be prompted to input the beats using the keyboard. Other possible values are:
- A file path, e.g. beat_sequence.npy, to load the recorded beats from a file. Recorded beats can be generated using the create_beats.py script.
- dataset, to use a random sample from the dataset as the beats.
To specify the profile to use for generating the predicted sequence, use the -t or --profile argument followed by a string. The available values are beta, which uses stochastic search, beam, which uses hybrid beam search, and default, which uses the settings specified in the config.toml file. The default value is default. The heuristic parameters for the beta and beam profiles can be customized in the config.toml file by adjusting the corresponding sections, [sampling.beta] and [sampling.beam].
Here are some examples of how to use these arguments:
# Generate a predicted sequence using the LSTM with Local Attention model, from beats entered by the user on the keyboard, using the checkpoint at ./.project_data/snapshots/my_checkpoint.pth, on the default device, and using the beta profile with default settings
python predict_stream.py -m lstm_attn -c ./.project_data/snapshots/my_checkpoint.pth -t beta
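To compare profiles on the same recorded beats, the command line can be assembled programmatically. The sketch below only builds the argument list from the flags documented above and does not run the model. The helper name build_predict_command and the output filenames beta.mid and beam.mid are hypothetical.

```python
import shlex
import sys


def build_predict_command(model_name, checkpoint_path, midi_filename="output.mid",
                          source="interactive", profile="default", device=None):
    """Assemble an argv list for predict_stream.py using the documented flags."""
    cmd = [
        sys.executable, "predict_stream.py",
        "-m", model_name,
        "-c", checkpoint_path,
        "-o", midi_filename,
        "-s", source,
        "-t", profile,
    ]
    # Omit -d so the script falls back to its own default device.
    if device is not None:
        cmd += ["-d", device]
    return cmd


if __name__ == "__main__":
    checkpoint = "./.project_data/snapshots/my_checkpoint.pth"
    for profile in ("beta", "beam"):
        cmd = build_predict_command("lstm_attn", checkpoint,
                                    midi_filename=f"{profile}.mid",
                                    source="beat_sequence.npy",
                                    profile=profile)
        # Print the command instead of running it; pass cmd to subprocess.run to execute.
        print(shlex.join(cmd))
```

Each printed command can be run from the repository root, or the list can be passed to subprocess.run(cmd, check=True).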