This is EnCodecMAE, an audio feature extractor pretrained with masked language modelling to predict discrete targets generated by EnCodec, a neural audio codec. For more details about the architecture and pretraining procedure, read the paper.
- 2024/5/23 Updated paper on arXiv. New models with better performance across all downstream tasks are available for feature extraction. Code for the older version is here.
- 2024/2/29 New code to go from EnCodecMAE features back to the waveform domain, with pretrained generative audio models from this paper.
- 2024/2/14 Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio was accepted at the ICASSP 2024 XAI Workshop.
- 2023/10/23 Prompting for audio generation.
Try our example Colab notebook, or install the library locally:
1) Clone the EnCodecMAE library:
git clone https://github.com/habla-liaa/encodecmae.git
cd encodecmae
pip install -e .
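If the install worked, a quick sanity check (just an import test; it only assumes the package exposes the load_model function used in the next step) is:

```bash
python -c "from encodecmae import load_model; print('encodecmae is installed')"
```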
2) Extract features from an audio file:
from encodecmae import load_model
model = load_model('mel256-ec-base_st', device='cuda:0')
features = model.extract_features_from_file('gsc/bed/00176480_nohash_0.wav')
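As a usage sketch, you can loop over a folder of files and save one feature array per file. The folder name and the .npy output convention below are mine, not part of the library, and I'm assuming the returned features behave like a single tensor/array of frame-level embeddings:

```python
from pathlib import Path

import numpy as np
import torch
from encodecmae import load_model

model = load_model('mel256-ec-base_st', device='cuda:0')

# 'my_audio_folder' is a placeholder; point it at any directory of wav files.
for wav_path in sorted(Path('my_audio_folder').glob('*.wav')):
    feats = model.extract_features_from_file(str(wav_path))
    if isinstance(feats, torch.Tensor):
        feats = feats.detach().cpu().numpy()  # move to CPU before saving
    np.save(wav_path.with_suffix('.npy'), feats)  # one embedding array per file
```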
To pretrain your own models:
1) Install Docker and docker-compose on your system. You'll also need to install the NVIDIA Container Toolkit to access GPUs from inside a Docker container.
First, docker-compose.yml has to be modified: in the volumes section, change the paths to match your system. You'll need a folder called datasets with the following subfolders:
- audioset_24k/unbalanced_train
- fma_large_24k
- librilight_med_24k
All the audio files need to be converted to a 24kHz sampling rate.
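One way to do that conversion is a small ffmpeg loop like the following sketch (the source and destination paths, and the .wav extension, are placeholders for whatever your raw data looks like):

```bash
SRC=/path/to/raw_audio                 # original files
DST=/path/to/datasets/fma_large_24k    # one of the dataset subfolders above

# Resample every wav under SRC to 24 kHz, mirroring the folder structure under DST.
find "$SRC" -type f -name '*.wav' | while read -r f; do
  out="$DST/${f#"$SRC"/}"
  mkdir -p "$(dirname "$out")"
  ffmpeg -loglevel error -y -i "$f" -ar 24000 "$out"
done
```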
You may also need to modify the device_ids if you have a different number of GPUs.
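For orientation, those two edits touch a compose fragment shaped roughly like this. It's an illustrative sketch only: the service name, mount paths, and GPU indices are assumptions, so edit the docker-compose.yml shipped in the repo rather than copying this verbatim.

```yaml
services:
  encodecmae:                                   # service name is an assumption
    volumes:
      - /data/my_datasets:/workspace/datasets   # left side: your local datasets folder
      - /home/me/encodecmae:/workspace/encodecmae
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0', '1']            # list the GPUs to expose in the container
```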
Then, run:
chmod +x start_docker.sh
./start_docker.sh
This will build the encodecmae image, start a container using docker compose, and attach to it. Once inside the container, run:
cd workspace/encodecmae
pip install -e .
chmod +x scripts/run_pretraining.sh
scripts/run_pretraining.sh
The training script uses my own library for executing pipelines configured with gin: ginpipe. By modifying the config files (the ones with a .gin extension), you can control aspects of the training and of the model configuration. I plan to explain my approach to ML pipelines, and how to use gin and ginpipe, in a future blog article. Stay tuned!
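To give a flavour of the syntax, a gin config file is a list of parameter bindings of the form configurable.parameter = value. The names below are made up for illustration only; the real ones live in the repo's .gin files:

```
# Hypothetical bindings, not the repo's actual parameter names:
train_model.max_steps = 500000
train_model.batch_size = 64
TransformerEncoder.num_layers = 10
```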