This repo contains a curated list of academic papers, datasets and other resources on multimodal machine learning (MML) research applied to music. By Ilaria Manco (i.manco@qmul.ac.uk), Centre for Digital Music, QMUL.
This list is not meant to be exhaustive: MML for music is a varied and growing field, tackling a wide range of tasks, from music information retrieval to generation, through many different methods. Since this research area is not yet well established, conventions and definitions are still evolving, and this list aims to provide a point of reference for its ongoing development.
- Academic Papers
- Datasets
- Workshops, Tutorials & Talks
- Other Projects
- Statistics & Visualisations
- How to Contribute
- Other Resources
- Multimodal music information processing and retrieval: Survey and future challenges (F. Simonetta et al., 2019)
- Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies (M. Müller et al., 2019)
Summary of papers on multimodal machine learning for music, including the review papers highlighted above.
Year | Paper Title | Code |
---|---|---|
2020 | TräumerAI: Dreaming Music with StyleGAN | GitHub |
2019 | Learning Affective Correspondence between Music and Image | |
2018 | The Sound of Pixels | GitHub |
2018 | Image generation associated with music data | |
Year | Paper Title | Code |
---|---|---|
2020 | Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | GitHub |
2017 | A deep multimodal approach for cold-start music recommendation | GitHub |
Dataset | Description | Modalities | Size |
---|---|---|---|
MARD | Multimodal album reviews dataset | Text, Metadata, Audio descriptors | 65,566 albums and 263,525 reviews |
URMP | Multi-instrument musical pieces of recorded performances | MIDI, Audio, Video | 44 pieces (12.5GB) |
IMAC | Affective correspondences between images and music | Images, Audio | 85,000 images and 3,812 songs |
EmoMV | Affective music-video correspondence | Audio, Video | 5,986 pairs |
- Song Describer: a Platform for Collecting Textual Descriptions of Music Recordings - [link] | [paper] | [code]
- 47 papers referenced. See the details in multimodal_ml_music.bib; a sketch of how to recompute these statistics from the .bib file follows this list. Number of articles per year:
- If you are applying multimodal ML to music, around 150 other researchers are working in your field.
- 13 tasks investigated. See the list of tasks. Tasks pie chart:
- Only 16 articles (34%) provide their source code. A list by Yann Bayle offers very useful resources on reproducibility for MIR and ML.
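For reference, here is a minimal sketch of how statistics like the paper count and the number of articles per year could be recomputed from the bibliography. It assumes `multimodal_ml_music.bib` sits at the repository root and that each entry carries a `year` field; the regex-based parsing is an illustration, not the script actually used to generate the numbers above.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed location of the bibliography (repository root).
BIB_PATH = Path("multimodal_ml_music.bib")

def count_papers_per_year(bib_text: str) -> Counter:
    """Count BibTeX entries per year with a simple regex.

    Illustrative sketch only: it assumes each entry has a
    `year = {2020}` (or `year = 2020` / `year = "2020"`) field
    and does not handle every corner of the BibTeX grammar.
    """
    years = re.findall(
        r'^\s*year\s*=\s*[{"]?(\d{4})',
        bib_text,
        flags=re.IGNORECASE | re.MULTILINE,
    )
    return Counter(years)

if __name__ == "__main__":
    counts = count_papers_per_year(BIB_PATH.read_text(encoding="utf-8"))
    print(f"{sum(counts.values())} papers referenced")
    for year, n in sorted(counts.items()):
        print(f"{year}: {n}")
```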
Contributions are welcome! Please refer to the contributing.md file.
You are free to copy, modify, and distribute Multimodal Machine Learning for Music (MML4Music) with attribution under the terms of the MIT license. See the LICENSE file for details. This project is heavily based on Deep Learning for Music by Yann Bayle and builds on other projects; please refer to them for the appropriate license information.
If you use the information contained in this repository, please let us know!