This is the codebase for the project OpenMU: Your Swiss Army Knife for Music Understanding.
A Dockerfile is provided and should work out of the box.
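As a rough sketch of the intended workflow (the image tag `openmu` and the mount path are illustrative choices, not names fixed by this repo), building and entering the container could look like:

```bash
# Build the image from the provided Dockerfile (tag name is illustrative).
docker build -t openmu .

# Run interactively with GPU access (requires the NVIDIA Container Toolkit);
# mounting the current directory keeps data and checkpoints visible inside.
docker run --gpus all -it -v "$(pwd)":/workspace openmu /bin/bash
```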
For OpenMU-Bench, the benchmark we created for music understanding tasks, please download it from here.
Please also download the checkpoints for inference.
OpenMU contains two training stages:
- Stage 1 training: OpenMU is trained to output captions conditioned on an input music clip;
- Stage 2 training: instruction following, where OpenMU follows instructions in the music domain.
To launch training, please check out and use stage1.sh and stage2.sh, respectively.
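Assuming the scripts take no required positional arguments (their exact flags and environment variables are defined inside the scripts themselves), launching the two stages could be as simple as:

```bash
# Stage 1: caption pretraining on music clips.
bash stage1.sh

# Stage 2: instruction tuning in the music domain.
bash stage2.sh
```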
Please refer to run_inference.sh for running inference with the provided checkpoints.
We use lyrics understanding (model_lyrics_grid.py) as an example in the scripts; replace it with other scripts (e.g., model_musicqacaption.py) for other splits of OpenMU-Bench (e.g., MusicQA captioning).
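As an illustrative sketch of what that swap looks like (the `--checkpoint` flag and path below are hypothetical; consult run_inference.sh for the actual invocation), the change amounts to pointing the launcher at a different model script:

```bash
# Lyrics understanding split (the example used in the repo's scripts).
python model_lyrics_grid.py --checkpoint /path/to/openmu_checkpoint

# MusicQA captioning split: swap in the corresponding script.
python model_musicqacaption.py --checkpoint /path/to/openmu_checkpoint
```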
If you find OpenMU or OpenMU-Bench useful, please cite:

@article{zhao2024openmu,
title={OpenMU: Your Swiss Army Knife for Music Understanding},
author={Zhao, Mengjie and Zhong, Zhi and Mao, Zhuoyuan and Yang, Shiqi and Liao, Wei-Hsiang and Takahashi, Shusuke and Wakaki, Hiromi and Mitsufuji, Yuki},
journal={arXiv preprint arXiv:2410.15573},
year={2024}
}