GitHub - habla-liaa/encodecmae: Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'

EnCodecMAE: Leveraging neural codecs for universal audio representation learning

This is EnCodecMAE, an audio feature extractor pretrained with masked language modelling to predict discrete targets generated by EnCodec, a neural audio codec. For more details about the architecture and pretraining procedure, read the paper.

Updates:

2024/5/23 Updated paper on arXiv. New models with better performance across all downstream tasks are available for feature extraction. Code for the older version is here
2024/2/29 New code to go from encodecmae to the waveform domain, with pretrained generative audio models from this paper.
2024/2/14 Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio was accepted to ICASSP 2024 XAI Workshop.
2023/10/23 Prompting for audio generation.

Usage

Feature extraction using pretrained models

Try our example Colab notebook, or follow these steps:

1) Clone the EnCodecMAE library:

git clone https://github.com/habla-liaa/encodecmae.git

2) Install it:

cd encodecmae
pip install -e .

3) Extract embeddings in Python:

from encodecmae import load_model

model = load_model('mel256-ec-base_st', device='cuda:0')
features = model.extract_features_from_file('gsc/bed/00176480_nohash_0.wav')
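To extract embeddings for a whole folder rather than a single file, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of the library: `extract_directory` and `extract_with_encodecmae` are hypothetical helper names, and the only EnCodecMAE calls used are `load_model` and `extract_features_from_file` as shown above. The type and shape of the returned features are not assumed here; they are stored as returned.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def extract_directory(root: str, extract_fn: Callable[[str], Any],
                      pattern: str = "*.wav") -> Dict[str, Any]:
    """Apply extract_fn to every file under root that matches pattern.

    Returns a dict mapping each file path (relative to root) to its features.
    """
    root_path = Path(root)
    features = {}
    # sorted() gives a deterministic processing order across runs.
    for wav_path in sorted(root_path.rglob(pattern)):
        key = wav_path.relative_to(root_path).as_posix()
        features[key] = extract_fn(str(wav_path))
    return features


def extract_with_encodecmae(root: str, device: str = "cuda:0") -> Dict[str, Any]:
    """Hypothetical wrapper: load the model once and reuse it for all files."""
    # Imported lazily so extract_directory works without encodecmae installed.
    from encodecmae import load_model

    model = load_model("mel256-ec-base_st", device=device)
    return extract_directory(root, model.extract_features_from_file)
```

Calling `extract_with_encodecmae('gsc')` would then return one entry per `.wav` file found under `gsc`, keyed by relative path such as `bed/00176480_nohash_0.wav`.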

Pretrain your models

1) Install Docker and docker-compose on your system. You'll also need to install the NVIDIA Container Toolkit to access GPUs from a Docker container.

2) Execute the start_docker.sh script

First, docker-compose.yml has to be modified. In the volumes section, change the paths to the ones on your system. You'll need a folder called datasets with the following subfolders:

audioset_24k/unbalanced_train
fma_large_24k
librilight_med_24k

All the audio files need to be converted to a 24kHz sampling rate.
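Before starting a long pretraining run, it may be worth checking that the datasets folder matches this layout and that the files really are at 24 kHz. The sketch below is one possible check, under the assumption that the audio is stored as uncompressed PCM WAV (the only format Python's standard `wave` module reads); `missing_subfolders` and `files_with_wrong_rate` are hypothetical helper names, not part of this repository. The conversion itself is not covered here and would typically be done with a tool such as ffmpeg or sox.

```python
import wave
from pathlib import Path
from typing import List

# Subfolders listed in this README.
EXPECTED_SUBFOLDERS = [
    "audioset_24k/unbalanced_train",
    "fma_large_24k",
    "librilight_med_24k",
]
TARGET_RATE = 24000  # Hz


def missing_subfolders(datasets_root: str) -> List[str]:
    """Return the expected subfolders that do not exist under datasets_root."""
    root = Path(datasets_root)
    return [sub for sub in EXPECTED_SUBFOLDERS if not (root / sub).is_dir()]


def files_with_wrong_rate(datasets_root: str,
                          target_rate: int = TARGET_RATE) -> List[str]:
    """Return WAV files under datasets_root whose rate differs from target_rate."""
    wrong = []
    for path in sorted(Path(datasets_root).rglob("*.wav")):
        # Assumption: files are PCM WAV; wave.open raises wave.Error otherwise.
        with wave.open(str(path), "rb") as wav_file:
            if wav_file.getframerate() != target_rate:
                wrong.append(str(path))
    return wrong
```

Both functions return empty lists when the folder is ready, so a quick check is `assert not missing_subfolders(root) and not files_with_wrong_rate(root)`, where `root` is the path to your datasets folder.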

You may also need to modify device_ids if you have a different number of GPUs.

Then, run:

chmod +x start_docker.sh
./start_docker.sh

This will build the encodecmae image, start a container using docker compose, and attach to it.

3) Install the encodecmae package inside the container

cd workspace/encodecmae
pip install -e .

4) Run the training script

chmod +x scripts/run_pretraining.sh
scripts/run_pretraining.sh

The training script uses my own library for executing pipelines configured with gin: ginpipe. By modifying the config files (with .gin extension), you can control aspects of the training and the model configuration. I plan to explain my approach to ML pipelines, and how to use gin and ginpipe in a future blog article. Stay tuned!

