Commit

Update documentation
Unknown committed Nov 10, 2024
1 parent 0a8f95a commit a0378a9
Showing 55 changed files with 1,723 additions and 2,281 deletions.
Binary file not shown.
Binary file added _images/benefit.png
Binary file added _images/cls_problem.png
Binary file added _images/flow.png
File renamed without changes
Binary file added _images/generation.png
Binary file added _images/history.png
Binary file added _images/loss_function.png
Binary file removed _images/overview.png
Binary file not shown.
Binary file added _images/qbd.png
Binary file added _images/retrieval_example.png
1 change: 1 addition & 0 deletions _sources/conclusion/beyondaudio.md
@@ -0,0 +1 @@
# Beyond Audio Modality
18 changes: 8 additions & 10 deletions _sources/conclusion/intro.md
@@ -1,20 +1,18 @@
# Conclusion

Congratulations! You finished the book, executed every code we typed, and read every line we wrote!
Congratulations! You've completed the book, working through all the code examples and content we've prepared!

In the first chapter, The Basics, we defined music classification and introduced its applications. We then looked into input representations with a special focus on biological plausibility. We also looked into music classification datasets with a special focus on the secrets of how to use some popular datasets correctly. In the evaluation section, we showed the concepts of important metrics such as precision and recall as well as code demo to compute them. After finishing this chapter, we hope you’re ready to start working on your music classification model.
In Chapter 2, we provided a comprehensive overview of language models, examining their key components from tokenizers to training methodologies and conditioning methods. We also investigated the challenges that arise when using language modeling as a framework and explored how these challenges are currently being addressed in NLP and multimodal domains.

In the second chapter, Supervised Learning, we reviewed popular architectures - their definitions, pros, and cons. We also demonstrated data augmentation methods for music audio - the code, spectrograms, and audio signals you can play. At the end of the chapter, we showed a full example of data preparation, model training, and evaluation on Pytorch. After this chapter, you can implement a majority of music classification models that were introduced during the deep learning era.
In Chapter 3, we introduced Music Description as a novel MIR task. We discussed how the abstractness and specificity of music description, combined with the flexibility of language, create unique advantages for music and language models. This chapter traced the evolution of methodologies from classification models to encoder-decoder architectures and audio LLMs, demonstrating how the field has leveraged music description in increasingly sophisticated ways.

In the third chapter, Semi-Supervised Learning, we covered transfer learning and semi-supervised learning – approaches that became popular, recently, due to annotation cost. Both are strategies one can consider when there is only a small number of labeled items. These approaches can be useful in many real-world situations where you only have, for example, less than a thousand labeled items.
In Chapter 4, we focused on traditional Music Retrieval approaches and how audio-text joint embedding helps overcome their limitations. We explored the advantages and disadvantages of multimodal metric learning using triplet and contrastive losses, and examined how advances in text encoders have enhanced joint embedding capabilities. The chapter concluded by analyzing the current limitations of joint embedding models and exploring the possibilities of conversational music retrieval.

In the fourth chapter, Self-Supervised Learning, an even more radical approach. The goal of self-supervised learning is to learn useful representations without any labels. To achieve the goal, researchers assume some structural/internal patterns purely within input and design loss functions to predict the patterns. We covered a wide range of self-supervised learning methods introduced in music, speech, and computer vision. The lesson of this chapter liberates you from the worry of getting annotations.
In Chapter 5, we reviewed two prominent text-to-music generation methods: discrete token-based language models and diffusion-based generative models operating in continuous space. We also conducted an in-depth discussion about the importance of evaluation and current challenges in evaluation methodologies.

In the fifth chapter, Towards Real-world Applications, we introduce you to what people care about in industry. After finishing this chapter, you can understand the procedures and tasks researchers and engineers in industry spend time on.
We're delighted that you've studied these topics with us. Have you achieved your learning goals? Were your questions answered? We hope we've succeeded in our aims: making these complex topics more accessible to newcomers, providing practical solutions for data challenges, and bridging the gap between academic research and practical applications. Please don't hesitate to reach out if you have any questions or feedback.

We’re delighted that you have studied music classification with us. Did you achieve your goal while reading it? Are your questions solved now? We hope we also achieved our goals - lowering the barrier of music classification to the newcomers, providing methods to cope with data issues, and narrowing the gap between academia and industry. Please feel free to reach out to us if you have any questions or feedback.
As a sweet dessert, we've prepared two exciting future directions in the following pages. Don't miss these delightful treats!

Best wishes,

Minz, Janne, and Keunwoo.

SeungHeon, Ilaria, Zachary, JongWook, Ke
2 changes: 1 addition & 1 deletion _sources/intro.md
@@ -7,7 +7,7 @@ Welcome to the online supplement for the tutorial on "**Connecting Music Audio a
Language serves as an efficient interface for communication between humans as well as between humans and machines. With the recent advancements in deep learning-based pretrained language models the understanding, search, and creation of music are now capable of catering to user preferences with diversity and precision. This tutorial is motivated by the rapid advancements in machine learning techniques, particularly in the domain of language models, and their burgeoning applications in the field of Music Information Retrieval (MIR). The remarkable capability of language models to understand and generate human-like text has paved the way for innovative methodologies in music description, retrieval, and generation, heralding a new era in how we interact with music through technology.


```{figure} ./img/main.png
```{figure} ./img/front.png
---
name: overview
---
2 changes: 1 addition & 1 deletion _sources/introduction/background.md
@@ -1,6 +1,6 @@
# Background

```{figure} ../img/history.png
```{figure} ./img/qbd.png
---
name: history
---
2 changes: 1 addition & 1 deletion _sources/introduction/overview.md
@@ -2,7 +2,7 @@

This tutorial will present the changes in music understanding, retrieval, and generation technologies following the development of language models.

```{figure} ./img/overview.png
```{figure} ./img/flow.png
---
name: scope
---
4 changes: 2 additions & 2 deletions _sources/retrieval/intro.md
@@ -1,6 +1,6 @@
# Introduction

```{figure} ./img/cal_retrieval.png
```{figure} ./img/retrieval_example.png
---
name: cal_retrieval
---
@@ -23,7 +23,7 @@ name: cls_methods
Early retrieval methods were based on classification models. If music is annotated with relevant attributes through initial tagging tasks, during the retrieval stage, music can be searched either through filtering-based boolean search or by using the output logits from classification.


```{figure} ./img/cls_problems.png
```{figure} ./img/cls_problem.png
---
name: cls_problems
---
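The context line in this hunk describes two classification-based retrieval modes: boolean filtering on predicted tags and ranking by classifier output logits. As a rough illustration only, the sketch below shows both in plain Python. The track names, tags, logit values, and the function names `boolean_search` and `rank_by_logit` are hypothetical and do not come from the tutorial's code.

```python
# Hypothetical catalog: per-track tag logits from a tagging classifier.
# Track names, tags, and scores are invented for illustration.
TRACK_LOGITS = {
    "track_a": {"rock": 2.1, "calm": -1.3, "piano": -0.4},
    "track_b": {"rock": -0.8, "calm": 1.7, "piano": 2.5},
    "track_c": {"rock": 0.3, "calm": 0.9, "piano": -2.0},
}


def boolean_search(query_tags, threshold=0.0):
    """Filtering-based search: keep tracks whose logit exceeds
    `threshold` for every tag in the query (logical AND)."""
    return sorted(
        track
        for track, logits in TRACK_LOGITS.items()
        if all(logits.get(tag, float("-inf")) > threshold for tag in query_tags)
    )


def rank_by_logit(tag):
    """Logit-based search: order all tracks by descending logit for one tag."""
    return sorted(
        TRACK_LOGITS,
        key=lambda track: TRACK_LOGITS[track].get(tag, float("-inf")),
        reverse=True,
    )


print(boolean_search(["calm", "piano"]))
print(rank_by_logit("calm"))
```

Both modes only work for tags the classifier was trained on: a query tag missing from the vocabulary matches nothing in the boolean search and leaves the ranking uninformative.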
4 changes: 2 additions & 2 deletions _sources/retrieval/joint_embedding.md
@@ -20,7 +20,7 @@ Let $x_{a}$ represent a musical audio sample and $x_{t}$ denote its paired text

The most common metric learning loss functions used to train joint embedding models are triplet loss and contrastive loss.

```{figure} ./img/loss_functions.png
```{figure} ./img/loss_function.png
---
name: loss functions
---
@@ -42,7 +42,7 @@ where $\tau$ is a learnable parameter.

## What is the Benefit of Joint Embedding?

```{figure} ./img/joint_embedding_benefit.png
```{figure} ./img/benefit.png
---
name: joint embedding benefit
---
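The hunks above name triplet loss and contrastive loss as the common objectives for joint embedding models and mention a temperature $\tau$. The exact formulas are not visible in this diff, so the sketch below uses the standard textbook forms as an assumption: a margin-based hinge on dot-product similarity, and a symmetric InfoNCE loss over a batch of paired audio and text embeddings. All function names, the margin value, and the fixed `tau` (learnable in the tutorial, a plain float here) are illustrative choices.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the positive pair's similarity should exceed
    the negative pair's similarity by at least `margin`."""
    return max(0.0, margin - dot(anchor, positive) + dot(anchor, negative))


def contrastive_loss(audio_embs, text_embs, tau=0.07):
    """Symmetric InfoNCE over a batch; index i in both lists is a matched pair.
    `tau` is a fixed temperature here (an assumption for this sketch)."""
    n = len(audio_embs)
    sims = [[dot(a, t) / tau for t in text_embs] for a in audio_embs]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i], computed with a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / n

    # Transpose the similarity matrix for the text-to-audio direction.
    cols = [[sims[i][j] for i in range(n)] for j in range(n)]
    return 0.5 * (cross_entropy(sims) + cross_entropy(cols))


audio = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
text_matched = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
text_swapped = [text_matched[1], text_matched[0]]

print(contrastive_loss(audio, text_matched) < contrastive_loss(audio, text_swapped))
```

The triplet loss compares one negative per anchor, while the contrastive loss treats every other item in the batch as a negative, which is why its value depends on the batch contents.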
8 changes: 5 additions & 3 deletions bibliography.html
@@ -60,7 +60,7 @@
<script>DOCUMENTATION_OPTIONS.pagename = 'bibliography';</script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="prev" title="Conclusion" href="conclusion/intro.html" />
<link rel="prev" title="Beyond Text-Based Interactions" href="conclusion/beyondtext.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
</head>
@@ -220,6 +220,8 @@
<p aria-level="2" class="caption" role="heading"><span class="caption-text">Chapter 6. Conclusion</span></p>
<ul class="nav bd-sidenav">
<li class="toctree-l1"><a class="reference internal" href="conclusion/intro.html">Conclusion</a></li>
<li class="toctree-l1"><a class="reference internal" href="conclusion/beyondaudio.html">Beyond Audio Modality</a></li>
<li class="toctree-l1"><a class="reference internal" href="conclusion/beyondtext.html">Beyond Text-Based Interactions</a></li>
</ul>
<p aria-level="2" class="caption" role="heading"><span class="caption-text">References</span></p>
<ul class="current nav bd-sidenav">
@@ -976,12 +978,12 @@ <h1>Bibliography<a class="headerlink" href="#bibliography" title="Link to this h

<div class="prev-next-area">
<a class="left-prev"
href="conclusion/intro.html"
href="conclusion/beyondtext.html"
title="previous page">
<i class="fa-solid fa-angle-left"></i>
<div class="prev-next-info">
<p class="prev-next-subtitle">previous</p>
<p class="prev-next-title">Conclusion</p>
<p class="prev-next-title">Beyond Text-Based Interactions</p>
</div>
</a>
</div>