
Multimodal Machine Learning for Music (MML4Music)

This repo contains a curated list of academic papers, datasets and other resources on multimodal machine learning (MML) research applied to music. By Ilaria Manco (i.manco@qmul.ac.uk), Centre for Digital Music, QMUL.

This list is not meant to be exhaustive: MML for music is a varied and growing field, tackling a wide range of tasks, from music information retrieval to generation, through many different methods. Since this research area is not yet well established, its conventions and definitions are not set in stone, and this list aims to provide a point of reference for its ongoing development.

Table of Contents

  • Papers
  • Datasets
  • Workshops, Tutorials & Talks
  • Other Projects
  • Statistics & Visualisations
  • How To Contribute
  • License

Papers

Survey Papers

Journal and Conference Papers

Summary of papers on multimodal machine learning for music, including the review papers highlighted above.

Audio-Text

| Year | Paper Title | Code |
|------|-------------|------|
| 2022 | Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model | |
| 2022 | Conversational Music Retrieval with Synthetic Data | |
| 2022 | Contrastive audio-language learning for music | GitHub |
| 2022 | Learning music audio representations via weak language supervision | GitHub |
| 2022 | MuLan: A joint embedding of music audio and natural language | |
| 2022 | RECAP: Retrieval Augmented Music Captioner | |
| 2022 | Data-Efficient Playlist Captioning With Musical and Linguistic Knowledge | GitHub |
| 2022 | CLAP: Learning audio concepts from natural language supervision | GitHub |
| 2022 | Toward Universal Text-to-Music Retrieval | GitHub |
| 2021 | MusCaps: Generating Captions for Music Audio | GitHub |
| 2021 | Music Playlist Title Generation: A Machine-Translation Approach | GitHub |
| 2020 | MusicBERT - learning multi-modal representations for music and text | |
| 2020 | Music autotagging as captioning | |
| 2019 | Deep cross-modal correlation learning for audio and lyrics in music retrieval | |
| 2018 | Music mood detection based on audio and lyrics with deep neural net | |
| 2016 | Exploring customer reviews for music genre classification and evolutionary studies | |
| 2016 | Towards Music Captioning: Generating Music Playlist Descriptions | |
| 2008 | Multimodal Music Mood Classification using Audio and Lyrics | |

Audio-Image

| Year | Paper Title | Code |
|------|-------------|------|
| 2020 | TräumerAI: Dreaming music with StyleGAN | GitHub |
| 2019 | Learning Affective Correspondence between Music and Image | |
| 2018 | The Sound of Pixels | GitHub |
| 2018 | Image generation associated with music data | |

Audio-Video

| Year | Paper Title | Code |
|------|-------------|------|
| 2022 | It's Time for Artistic Correspondence in Music and Video | |
| 2019 | Audio-visual embedding for cross-modal music video retrieval through supervised deep CCA | |
| 2019 | Query by Video: Cross-Modal Music Retrieval | |
| 2018 | CBVMR: content-based video-music retrieval using soft intra-modal structure constraint | GitHub |

Audio-User

| Year | Paper Title | Code |
|------|-------------|------|
| 2020 | Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | GitHub |
| 2017 | A deep multimodal approach for cold-start music recommendation | GitHub |

Other

| Year | Paper Title | Code |
|------|-------------|------|
| 2021 | Multimodal metric learning for tag-based music retrieval | GitHub |
| 2021 | Enriched music representations with multiple cross-modal contrastive learning | GitHub |
| 2020 | Large-Scale Weakly-Supervised Content Embeddings for Music Recommendation and Tagging | |
| 2020 | Music gesture for visual sound separation | |
| 2020 | Foley music: Learning to generate music from videos | |
| 2020 | Musical word embedding: Bridging the gap between listening contexts and music | |
| 2019 | Query-by-Blending: a Music Exploration System Blending Latent Vector Representations of Lyric Word, Song Audio, and Artist | |
| 2019 | Multimodal music information processing and retrieval: Survey and future challenges | |
| 2019 | Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies | |
| 2019 | Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications | |
| 2018 | Multimodal Deep Learning for Music Genre Classification | GitHub |
| 2018 | JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features | GitHub |
| 2017 | Learning neural audio embeddings for grounding semantics in auditory perception | |
| 2017 | Music emotion recognition via end-to-end multimodal neural networks | |
| 2013 | Cross-modal Sound Mapping Using Deep Learning | |
| 2013 | Music emotion recognition: From content- to context-based models | |
| 2011 | MusiClef: A benchmark activity in multimodal music information retrieval | |
| 2011 | The need for music information retrieval with user-centered and multimodal strategies | |
| 2009 | Combining audio content and social context for semantic music discovery | |

Datasets

| Dataset | Description | Modalities | Size |
|---------|-------------|------------|------|
| MARD | Multimodal album reviews dataset | Text, metadata, audio descriptors | 65,566 albums and 263,525 reviews |
| URMP | Multi-instrument musical pieces of recorded performances | MIDI, audio, video | 44 pieces (12.5 GB) |
| IMAC | Affective correspondences between images and music | Images, audio | 85,000 images and 3,812 songs |
| EmoMV | Affective music-video correspondence | Audio, video | 5,986 pairs |

Workshops, Tutorials & Talks

Other Projects

  • Song Describer: a Platform for Collecting Textual Descriptions of Music Recordings - [link] | [paper] | [code]

Statistics & Visualisations

How To Contribute

Contributions are welcome! Please refer to the contributing.md file.

License

You are free to copy, modify, and distribute Multimodal Machine Learning for Music (MML4Music) with attribution under the terms of the MIT license; see the LICENSE file for details. This project is heavily based on Deep Learning for Music by Yann Bayle and builds on other projects, which you may refer to for the appropriate license information.

If you use the information contained in this repository, please let us know!