This code is published as part of my master's thesis for the Sound and Music Computing (SMC) program at UPF Barcelona. The manuscript is also made available here for more information.
- MedleyDB (can be downloaded from http://medleydb.weebly.com/downloads.html)
For the deep learning part:
- GPU
- keras 1.2.2
- librosa
- numpy
- pandas
- sklearn
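The Python dependencies can typically be installed with pip (the command below is an assumption; only the keras version is pinned by this list, and keras 1.2.2 additionally needs a Theano or TensorFlow backend):
pip install keras==1.2.2 librosa numpy pandas scikit-learn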
Organization of the repository:
The source code is organized as follows -
./settings.py contains all the paths to the folders used by the experiments; these paths need to be set before running anything (a hypothetical sketch is given below). The data_prep folder contains the code required to preprocess the MedleyDB dataset. The models folder contains separate folders for the deep-learning and traditional machine-learning code.
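A minimal sketch of what ./settings.py might look like; the variable names and paths below are placeholders, so match them to whatever the rest of the code actually imports:

```python
# settings.py -- hypothetical example; adjust names and paths to your setup
MEDLEYDB_PATH = '/data/MedleyDB/'              # root of the MedleyDB download
DATA_PATH     = '/data/experiments/mat/'       # windowed .mat files written by data_prep.py
WAV_PATH      = '/data/experiments/wav/'       # wav files written by wav_generator.py
FEATURE_PATH  = '/data/experiments/features/'  # features written by feature_extractor.py
```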
Sequence of execution, along with a brief description of each file:
The source code for data preprocessing borrows heavily from Li et al. 2015. Please refer to their paper for more details on data preprocessing, including the steps to create the train/test split.
- ./data_prep/data_prep.py
This file contains all the scripts necessary for preparing the data.
Usage:
python data_prep.py -c window_configuration
The window_configuration has the form _{window_size}_h{percentage_hop} and must be one for which datasets have already been extracted; for instance, _5s_h50 denotes 5-second windows with a 50% hop.
For example,
python data_prep.py -c _5s_h50
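As a rough illustration of the window/hop naming (this is a sketch of the idea only, not the actual implementation in data_prep.py; the function name is hypothetical):

```python
import numpy as np

def window_starts(n_samples, sr, window_s=5.0, hop_percent=50):
    """Start indices of fixed-length windows with a hop given as a percentage of the window."""
    win = int(window_s * sr)               # e.g. 5 s  -> 220500 samples at 44.1 kHz
    hop = int(win * hop_percent / 100.0)   # e.g. 50 % -> 2.5 s hop
    return np.arange(0, n_samples - win + 1, hop), win

starts, win = window_starts(n_samples=44100 * 30, sr=44100)   # a 30 s track
windows = [(s, s + win) for s in starts]                      # (start, end) sample indices
```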
- ./data_prep/gen_split.py
This file splits the group of .mat files generated by data_prep.py into 5 sets, each containing 20% of the samples. Four of these sets are used as the training set and the remaining one as the test set. In this work we did not perform cross-validation, but since there are 5 such splits it could be done. A rough sketch of the splitting idea follows the usage example below.
Usage:
python gen_split.py -c window_configuration
The window_configuration has the form _{window_size}_h{percentage_hop} and must be one for which datasets have already been extracted.
For example,
python gen_split.py -c _5s_h50
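A hypothetical sketch of the 5-way split (the actual script operates on the .mat files produced by data_prep.py; the names here are illustrative):

```python
import numpy as np

def five_way_split(sample_ids, seed=42):
    """Shuffle the samples and split them into 5 folds of roughly 20% each."""
    rng = np.random.RandomState(seed)
    ids = np.array(list(sample_ids))
    rng.shuffle(ids)
    return np.array_split(ids, 5)

folds = five_way_split(range(1000))
train_ids = np.concatenate(folds[:4])   # folds 0-3 -> training set
test_ids = folds[4]                     # fold 4    -> test set
```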
- ./data_prep/wav_generator.py
This code generates the wav files for each of the samples that were split using gen_split.py (a sketch is given after the usage example below). Usage:
python wav_generator.py -c window_configuration
The window_configuration has the form _{window_size}_h{percentage_hop} and must be one for which datasets have already been extracted.
For example,
python wav_generator.py -c _5s_h50
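A minimal sketch of writing one windowed sample to disk, assuming librosa and the soundfile package are available; the file names and the 44.1 kHz sample rate are placeholders:

```python
import librosa
import soundfile as sf

def write_window(mix_path, start_s, dur_s, out_path, sr=44100):
    """Cut one window out of a mix and save it as a wav file (illustration only)."""
    y, sr = librosa.load(mix_path, sr=sr, offset=start_s, duration=dur_s)
    sf.write(out_path, y, sr)

write_window('SomeArtist_SomeTrack_MIX.wav', start_s=10.0, dur_s=5.0,
             out_path='sample_0001.wav')
```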
- ./data_prep/audio_transformer.py
This code splits an audio file into a harmonic component and a residual component using librosa and stores them separately (a sketch follows the usage example below).
Usage:
python audio_transformer.py -c window_configuration
The window_configuration has the form _{window_size}_h{percentage_hop} and must be one for which datasets have already been extracted.
For example,
python audio_transformer.py -c _5s_h50
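One way to do this with librosa is via harmonic-percussive source separation, keeping the harmonic part and treating everything left over as the residual; this is a hedged sketch, not necessarily the exact procedure in audio_transformer.py:

```python
import librosa
import soundfile as sf

def harmonic_residual(in_path, harm_path, res_path):
    """Split a wav into a harmonic component and the residual (illustration only)."""
    y, sr = librosa.load(in_path, sr=None)
    y_harm = librosa.effects.harmonic(y)   # harmonic component via HPSS
    y_res = y - y_harm                     # residual = original minus harmonic
    sf.write(harm_path, y_harm, sr)
    sf.write(res_path, y_res, sr)

harmonic_residual('sample_0001.wav', 'sample_0001_harmonic.wav', 'sample_0001_residual.wav')
```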
For the traditional machine learning part:
1) ./models/traditional/feature_extractor.py
This code uses Essentia's MusicExtractor to extract temporal, spectral, and cepstral features from the wav files and persists them (a sketch follows the usage example below).
Usage:
python feature_extractor.py -c window_configuration -t dataset_type
The window_configuration has the form _{window_size}_h{percentage_hop} and must be one for which datasets have already been extracted. The dataset_type is one of {original, harmonic, residual}. For example,
python feature_extractor.py -c _5s_h50 -t original
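A minimal sketch of extracting features for one clip with Essentia's MusicExtractor (the input file name and the selected statistics are assumptions):

```python
import essentia.standard as es

# MusicExtractor computes temporal, spectral, and cepstral descriptors in one pass
features, frame_features = es.MusicExtractor(lowlevelStats=['mean', 'stdev'])('sample_0001.wav')

print(sorted(features.descriptorNames())[:5])   # inspect the available descriptor names
print(features['lowlevel.mfcc.mean'])           # e.g. the mean MFCCs for the clip
```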
2) ./models/traditional/train_accumulator.py
This code aggregates datasets {0,1,2,3} and their respective labels into the training set (a sketch follows the usage example below).
Usage:
python train_accumulator.py -c window_configuration -t dataset_type
The window_configuration has the form _{window_size}_h{percentage_hop} and must be one for which datasets have already been extracted. The dataset_type is one of {original, harmonic, residual}. For example,
python train_accumulator.py -c _5s_h50 -t original
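In outline, the accumulation is just a concatenation of the per-split feature and label matrices; the file names and the .npy format below are hypothetical and only show the idea (test_accumulator.py does the same for split 4):

```python
import numpy as np

# hypothetical layout: one feature matrix and one label matrix per split
X_train = np.vstack([np.load('features_%d.npy' % i) for i in range(4)])  # splits 0-3
y_train = np.vstack([np.load('labels_%d.npy' % i) for i in range(4)])

X_test = np.load('features_4.npy')   # split 4 -> test set
y_test = np.load('labels_4.npy')
```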
3) ./models/traditional/test_accumulator.py
This code aggregates dataset_4 and its respective labels into the test set.
Usage:
python test_accumulator.py -c window_configuration -t dataset_type
The window_configuration has the form _{window_size}_h{percentage_hop} and must be one for which datasets have already been extracted. The dataset_type is one of {original, harmonic, residual}. For example,
python test_accumulator.py -c _5s_h50 -t original
4) ./models/traditional/regressor.py
This code fits a regressor on the training dataset and then predicts the instrument annotations for the test set (a sketch follows the usage example below).
Usage:
python regressor.py -c window_configuration -t dataset_type
The window_configuration has the form _{window_size}_h{percentage_hop} and must be one for which datasets have already been extracted. The dataset_type is one of {original, harmonic, residual}. For example,
python regressor.py -c _5s_h50 -t original
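A hedged sketch of the fit/predict step with scikit-learn; the choice of RandomForestRegressor and the evaluation metric are illustrative, not necessarily what regressor.py uses:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# X_train, y_train, X_test, y_test as assembled by the accumulator scripts
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)      # multi-output regression over per-instrument activations
y_pred = model.predict(X_test)
print(mean_squared_error(y_test, y_pred))
```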
For the deep learning models, refer to ./models/deep-learning/README.md.