The purpose of this repository is to give an overview of the methods used in our project and to make it possible to reproduce the experiments and results presented in the report.
Some scripts may assume the following file structure (you might have to create missing directories):
- Datasets/ : Directory containing all training, test, and preprocessed data (and the original data)
- Datasets/fma_medium/ : Directory containing all the original data from the FMA dataset
- Datasets/fma_metadata/ : Directory containing all the metadata of the FMA dataset
- Datasets/preprocess_mfcc/ : Directory containing 3 subfolders with the 30s, 10s, and 3s cuts after pre-processing and folder preparation
- Models/ : Directory containing 3 subdirectories with all the models you train (we include some pre-trained models from our experiments for reference)
- Models/30sec/ : Directory containing all the models you train with 30sec inputs
- Models/10sec/ : Directory containing all the models you train with 10sec inputs
- Models/3sec/ : Directory containing all the models you train with 3sec inputs
- Figures/ : Directory containing the confusion matrices and training histories (loss and accuracy) saved as .png files during evaluation
- Results/ : Directory containing .txt files with the accuracy of each previously evaluated model
- ManualFeatures/ : Directory containing a small subproject for dimensionality reduction on the Librosa features
- preprocessing_melspect.py : Script running the preprocessing pipeline
- training.py : Script for training any of the models presented in the report
- evaluate.py : Script for evaluating any model and saving the results as .png and .txt files in the corresponding directories
Datasets/, Models/, Figures/, and Results/ are empty directories at the beginning. However, to avoid any issues when reproducing our experiments, we added some previously obtained figures and results so that the expected file structure of the project is in place.
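If some of these directories are missing after cloning, they can all be created at once; the snippet below is only a convenience sketch, equivalent to the individual mkdir commands used later in this README:

import os

# Create the directories the scripts expect (only missing ones are created).
for d in ['Datasets/preprocess_mfcc',
          'Models/30sec', 'Models/10sec', 'Models/3sec',
          'Figures', 'Results']:
    os.makedirs(d, exist_ok=True)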
[All the (preprocessed) datasets used for our experiments were too large to be uploaded to Polybox. Therefore, to run the experiments, you will first have to download the original datasets and run the preprocessing script.]
Download the FMA dataset, and the metadata:
- fma_medium.zip: 25,000 tracks of 30s, 16 unbalanced genres (22GiB)
- fma_metadata.zip
Move them to the Datasets/ directory and to the ManualFeatures/data/ directory:
unzip fma_medium.zip
mv fma_medium Datasets/
unzip fma_metadata.zip
cp -r fma_metadata/ ManualFeatures/data/
mv fma_metadata Datasets/
The Datasets/fma_metadata/ directory should contain the following files:
- tracks.csv: per track metadata such as ID, title, artist, genres, tags and play counts, for all 106,574 tracks.
- genres.csv: all 163 genres with name and parent (used to infer the genre hierarchy and top-level genres).
- features.csv: common features extracted with librosa.
- echonest.csv: audio features provided by Echonest (now Spotify) for a subset of 13,129 tracks.
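For a quick sanity check, the metadata can be inspected with pandas. The sketch below is not part of our scripts and assumes the standard FMA column layout (tracks.csv has a two-level column header with columns such as ('set', 'subset') and ('track', 'genre_top')):

import pandas as pd

# Paths follow the file structure described above.
tracks = pd.read_csv('Datasets/fma_metadata/tracks.csv', index_col=0, header=[0, 1])
genres = pd.read_csv('Datasets/fma_metadata/genres.csv', index_col=0)

# Tracks belonging to the medium subset (the small subset is contained in it)
# and the distribution of their top-level genres.
medium = tracks[tracks[('set', 'subset')].isin(['small', 'medium'])]
print(medium[('track', 'genre_top')].value_counts())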
The Datasets/fma_medium/ directory should contain the following:
- 156 folders: each containing tracks in .mp3 format
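The folders follow the usual FMA naming convention: each track ID is zero-padded to six digits and the first three digits give the folder name. The small helper below illustrates the mapping (a hypothetical helper, not part of our scripts):

import os

def get_audio_path(audio_dir, track_id):
    # Map an FMA track ID to its .mp3 path,
    # e.g. 2 -> Datasets/fma_medium/000/000002.mp3
    tid = '{:06d}'.format(track_id)
    return os.path.join(audio_dir, tid[:3], tid + '.mp3')

print(get_audio_path('Datasets/fma_medium', 2))  # Datasets/fma_medium/000/000002.mp3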
Guidelines for running our experiments are presented here. We assume that the Git repository has been cloned, that the correct file structure has been set up (i.e., missing directories created according to the description above), and that the datasets have been downloaded and placed in the Datasets/ directory.
Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install the dependencies (make sure the virtual environment is activated):
pip install -r requirements.txt
Before running the preprocessing, ensure that Datasets/ contains the following directories:
- fma_medium/
- fma_metadata/
- preprocess_mfcc/
You may need to create the last directory yourself with the following commands:
cd Datasets/
mkdir preprocess_mfcc
cd ..
You are now ready to run the preprocessing script, which builds the 30sec, 10sec, and 3sec datasets with the directory structure needed for the rest of the project. To do so, run the following command:
python3 preprocessing_melspect.py
Note that we set up preprocessing_melspect.py to reproduce the exact experiments we performed during the project. However, if you want to try different cuts, modify the hyperparameters used for mel-spectrogram generation, and more, you can change the global variables at the top of the preprocessing file. Disclaimer: we only ensured that the code is reliable for 10s and 3s cuts; using other cut lengths might lead to errors in the process.
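For illustration, the core of such a preprocessing step is a cut-and-mel-spectrogram computation of the kind sketched below; the parameter values (sampling rate, number of mel bands, hop length, cut length) are only placeholders here, the actual ones being the global variables in preprocessing_melspect.py:

import librosa
import numpy as np

# Illustrative values only -- the real ones are the globals in preprocessing_melspect.py.
SR = 22050        # sampling rate
CUT_SEC = 10      # cut length in seconds (30, 10 or 3 in our experiments)
N_MELS = 128      # number of mel bands
HOP_LENGTH = 512  # STFT hop length

def melspectrogram_cuts(mp3_path):
    # Load the track, split it into non-overlapping CUT_SEC-second cuts,
    # and return one log-mel-spectrogram per cut.
    y, sr = librosa.load(mp3_path, sr=SR)
    samples_per_cut = CUT_SEC * sr
    cuts = []
    for start in range(0, len(y) - samples_per_cut + 1, samples_per_cut):
        segment = y[start:start + samples_per_cut]
        mel = librosa.feature.melspectrogram(y=segment, sr=sr,
                                             n_mels=N_MELS, hop_length=HOP_LENGTH)
        cuts.append(librosa.power_to_db(mel, ref=np.max))
    return cuts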
Before training the models, you need to ensure that you have the Models/30sec/, Models/10sec/, and Models/3sec/ directories in your structure. You can run the following commands to create them if needed:
mkdir Models
cd Models/
mkdir 30sec
mkdir 10sec
mkdir 3sec
cd ..
Using training.py you can train any of the models we used during our experiments. To make the process easier, we built the script so that it takes different arguments, allowing you to train different model architectures:
- "-30sec", "-10sec" or "-3sec" : choose whether to train the model with 30, 10 or 3 second samples (mandatory)
- "-4c" or "-3c" : choose the number of convolution blocks in the model (mandatory)
- "-l1" or "-l2" : choose the regularization loss you want to use, L1 or L2 (optional)
- "-lrs" : if you want to use the learning rate scheduler (optional)
- "-gru2" : if you want to add a second consecutive GRU layer (optional)
- "-ep20" : if you want to run only 20 epochs instead of 50 epochs (optional)
When the training is done, the model and the history of the training process are stored in the Models/ directory.
Example: train a model on 30sec samples with 4 convolution blocks, L2 regularization, a learning rate scheduler, 2 consecutive GRU layers, and only 20 epochs:
python3 training.py -30sec -4c -l2 -lrs -gru2 -ep20
If you want to run the model from the CRNN for Music Classification paper by Keunwoo Choi et al., you need to use the specific argument:
- "-papermodel"
This argument can be combined ONLY with the size of the sample you want to use.
Example: train the model from the paper with 30sec samples:
python3 training.py -30sec -papermodel
Again, note that the script is set up to reproduce our exact experiments. If you want to try other cut lengths or architectures, you might need to modify the script according to your needs. Also, you might want to use different batch sizes or numbers of epochs (we fixed them at 32 and 50/20, as we obtained the best results with this configuration). To do so, you can modify the global variables at the beginning of the script.
Once a model has been trained, you can evaluate it. We offer a script that:
- Evaluates a given model and saves the results in a .txt file
- Allows you to use our voting system (divide and conquer) on models trained with 10s and 3s samples
- Saves the confusion matrix and the training history (loss and accuracy) in .png format
The script takes the following arguments:
- "model_name" : the name of the model you want to evaluate, i.e. the name of the model directory saved after the training step (mandatory)
- "-30sec", "-10sec" or "-3sec" : the sample size used to train the model you want to evaluate (mandatory)
- "-voting" : if you want to apply the voting method (divide and conquer); only possible for models trained with 10s or 3s samples (optional; a conceptual sketch is shown after the example below)
Example: evaluate the model trained with 4 convolution blocks for 20 epochs with 10sec samples, using the voting (divide and conquer) method:
python3 evaluate.py "4conv_20epochs" -10sec -voting
Warning: since the figures are displayed as they are generated, you might have to close them for the process to continue.
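For reference, the voting (divide and conquer) method conceptually aggregates the per-cut predictions of a track into a single label by majority vote. The snippet below is only a minimal sketch of this idea, not the exact implementation in evaluate.py:

import numpy as np

def vote_track_label(cut_probabilities):
    # cut_probabilities: array of shape (n_cuts, n_classes) with the model's
    # class probabilities for each cut of one track.
    cut_votes = np.argmax(cut_probabilities, axis=1)  # one predicted class per cut
    return np.bincount(cut_votes).argmax()            # majority class over all cuts

# Example: 3 cuts of one track, 4 genres -> the track is assigned class 2.
probs = np.array([[0.1, 0.2, 0.6, 0.1],
                  [0.3, 0.1, 0.5, 0.1],
                  [0.4, 0.3, 0.2, 0.1]])
print(vote_track_label(probs))  # 2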
For our setup in the feature-based approach, we used the PyTorchLightning-Hydra Template. The entire structure can be found within the ManualFeatures/ directory and can be run directly from there. For exact instructions on how to run the project and adjust the configs, we refer to the Template documentation (link above).
You can find the experiment runs under ManualFeatures/runs/, where:
- wandb_export_wd_search.csv contains the runs we used to select appropriate hyperparameters
- wandb_export_ae.csv contains our experiment runs with our auto-encoder architecture
If you want to know more about our preprocessing methods, model architectures, results, and more, please refer to our report. You can also have a look at our code (available in this repository) for a better understanding of the different processes that were executed.
@authors Auguste, Marc and Lukas