Commit

Update documentation
Unknown committed Nov 10, 2024
1 parent a0378a9 commit 80262ae
Showing 17 changed files with 97 additions and 247 deletions.
Binary file removed _images/history.png
Binary file not shown.
Binary file added _images/prompt_product.png
Binary file added _images/representation.png
4 changes: 2 additions & 2 deletions _sources/introduction/advantange.ipynb
@@ -91,7 +91,7 @@
"Although it's not within the scope of this tutorial (which focuses on annotation, retrieval, and generation tasks), music-language models can become powerful representation learners by using high-level semantic information as supervision. Models trained on noisy but scalable music-text pairs can perform well on downstream tasks. For example, in the vision domain, models like CLIP (Contrastive Language-Image Pre-training) {cite}`radford2021learning` and CoCa (Contrastive Captioners are Image-Text Foundation Models) {cite}`yu2022coca` actually report excellent performance on multiple downstream tasks. In the music domain, the MuLaP {cite}`manco2022learning`, TTMR {cite}`doh2023toward`, and MuLan {cite}`huang2022mulan` papers demonstrate that Music-Language models can be powerful representation learners.\n",
"\n",
"\n",
"```{figure} ../img/representation.png\n",
"```{figure} ./img/representation.png\n",
"---\n",
"name: representation\n",
"---\n",
@@ -118,7 +118,7 @@
"Language serves as an effective interface for AI models, (i.e., ChatGPT and Stable Diffusion). Because it leverages natural, intuitive communication methods. Language allows users to express complex queries, requests, or ideas in a flexible and contextually rich way without needing specialized knowledge. In terms of responses, language can also enable the system to generate human-like intentions or answers, which can positively impact user satisfaction and usability.\n",
"\n",
"\n",
"```{figure} ../img/prompt_product.png\n",
"```{figure} ./img/prompt_product.png\n",
"---\n",
"name: prompt_product\n",
"---\n",
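
Note on the two path changes above: a minimal sketch of how each figure reference resolves, not a description of how the book's build tool works internally. It assumes the notebook sits at `introduction/advantange.ipynb` relative to the book root, that the first line of each changed pair is the old version, and that figure paths are resolved relative to the notebook's own directory. `resolve_from` and `NOTEBOOK` are hypothetical names used only for illustration.

```python
import posixpath


def resolve_from(doc_path: str, figure_ref: str) -> str:
    """Resolve a relative figure reference against the directory of the document containing it."""
    doc_dir = posixpath.dirname(doc_path)
    return posixpath.normpath(posixpath.join(doc_dir, figure_ref))


# Assumption: location of the notebook relative to the book root.
NOTEBOOK = "introduction/advantange.ipynb"

# First line of the changed pair (assumed to be the old reference).
old_target = resolve_from(NOTEBOOK, "../img/representation.png")
# Second line of the changed pair (assumed to be the new reference).
new_target = resolve_from(NOTEBOOK, "./img/representation.png")

print(old_target)  # img/representation.png
print(new_target)  # introduction/img/representation.png
```

Under these assumptions the old reference points one level above the notebook's directory, while the new one points at an `img/` folder next to the notebook.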
2 changes: 1 addition & 1 deletion description/code.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion description/datasets.html
@@ -604,7 +604,7 @@ <h3>The Song Describer Dataset (SDD)<a class="headerlink" href="#the-song-descri
from .autonotebook import tqdm as notebook_tqdm
</pre></div>
</div>
<div class="output stderr highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Error while downloading from https://cdn-lfs-us-1.hf.co/repos/39/dc/39dcd02e33711e100ff7e22339b0bbb4c7e99bccadaaadbe5c1acae576294a48/64322527ba09a4ddf04c46dea2b275f1e68db5a3bf48ed31b5f90ae51d9b13d2?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27train-00002-of-00008.parquet%3B+filename%3D%22train-00002-of-00008.parquet%22%3B&amp;Expires=1731512960&amp;Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTczMTUxMjk2MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zLzM5L2RjLzM5ZGNkMDJlMzM3MTFlMTAwZmY3ZTIyMzM5YjBiYmI0YzdlOTliY2NhZGFhYWRiZTVjMWFjYWU1NzYyOTRhNDgvNjQzMjI1MjdiYTA5YTRkZGYwNGM0NmRlYTJiMjc1ZjFlNjhkYjVhM2JmNDhlZDMxYjVmOTBhZTUxZDliMTNkMj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoifV19&amp;Signature=cQha8Q2Z0hRs6I%7EL418JFwgZGCEGPg9iD-jinARQZONdwWJUgnShzHOwRwYg6ePVLPZqx66IUI0xld8gewT3VgK4iqJHzRaJbRCkNojVHh2JFnB4dyArU-c%7EhS-fpbdGSp3DW1Pg8TCALnMHtGatYKKIwkb15rs1WyXowZNH2Ee%7EsTZ-o09FDitg5nJEc4V7OgHNgZYW7-2UtwTHH%7EjdA10PCsAzh3pXDUEta4vG3MSB1sLckzw5kIg4U2rC1iRl5sajJo6lkHZavcctzvQqHTaE5q0URlwO09175azmDU4su8MSNru5FY1D1rLV1C5skwkSN74RfIJZGKeOrwnBYQ__&amp;Key-Pair-Id=K24J24Z295AEI9: HTTPSConnectionPool(host=&#39;cdn-lfs-us-1.hf.co&#39;, port=443): Read timed out.
<div class="output stderr highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Error while downloading from https://cdn-lfs-us-1.hf.co/repos/39/dc/39dcd02e33711e100ff7e22339b0bbb4c7e99bccadaaadbe5c1acae576294a48/64322527ba09a4ddf04c46dea2b275f1e68db5a3bf48ed31b5f90ae51d9b13d2?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27train-00002-of-00008.parquet%3B+filename%3D%22train-00002-of-00008.parquet%22%3B&amp;Expires=1731514795&amp;Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTczMTUxNDc5NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zLzM5L2RjLzM5ZGNkMDJlMzM3MTFlMTAwZmY3ZTIyMzM5YjBiYmI0YzdlOTliY2NhZGFhYWRiZTVjMWFjYWU1NzYyOTRhNDgvNjQzMjI1MjdiYTA5YTRkZGYwNGM0NmRlYTJiMjc1ZjFlNjhkYjVhM2JmNDhlZDMxYjVmOTBhZTUxZDliMTNkMj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoifV19&amp;Signature=CYoJnanVNYZ9XVvg-BySEgXUKFvwvYk6hI%7ERWoCdeV4vqdpJr4MLi3iG%7EkBLER5ekGYNp8ZsUvlf6QusKUYw21Aze-188Pbp5RU4DgGJtt8hJ%7Es6DHE4H8Fm7JSUPFws86nfWygUWD5LRWQJSs2x8Xu8bFBi%7EjMikwB0y8yrmkUuXThQDZhXaoEHe6vaSMMn5Z7Yy5MEnpBKyaW1hqh3NUEoKsZkb8E75Cgi5p62016wme3JQSpE9b3GIc4jQDPOdqFTLmQZw1nMW7jT%7EKISG8KcRuipyjYFCNtfRU0LSR3OcBbyJh8bzoIDxEx7FhxVmuF83thmaFxrLEGIzJxVRw__&amp;Key-Pair-Id=K24J24Z295AEI9: HTTPSConnectionPool(host=&#39;cdn-lfs-us-1.hf.co&#39;, port=443): Read timed out.
Trying to resume download...
</pre></div>
</div>
2 changes: 1 addition & 1 deletion description/models.html
@@ -32,7 +32,7 @@
<link rel="stylesheet" type="text/css" href="../_static/styles/sphinx-book-theme.css?v=a3416100" />
<link rel="stylesheet" type="text/css" href="../_static/togglebutton.css?v=13237357" />
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css?v=be8a1c11" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-thebe.css?v=4fa983c6" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-design.min.css?v=95c83b7e" />

2 changes: 1 addition & 1 deletion description/tasks.html
@@ -32,7 +32,7 @@
<link rel="stylesheet" type="text/css" href="../_static/styles/sphinx-book-theme.css?v=a3416100" />
<link rel="stylesheet" type="text/css" href="../_static/togglebutton.css?v=13237357" />
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css?v=be8a1c11" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-thebe.css?v=4fa983c6" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-design.min.css?v=95c83b7e" />

312 changes: 81 additions & 231 deletions generation/code.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion genindex.html
@@ -31,7 +31,7 @@
<link rel="stylesheet" type="text/css" href="_static/styles/sphinx-book-theme.css?v=a3416100" />
<link rel="stylesheet" type="text/css" href="_static/togglebutton.css?v=13237357" />
<link rel="stylesheet" type="text/css" href="_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css?v=be8a1c11" />
<link rel="stylesheet" type="text/css" href="_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css" />
<link rel="stylesheet" type="text/css" href="_static/sphinx-thebe.css?v=4fa983c6" />
<link rel="stylesheet" type="text/css" href="_static/sphinx-design.min.css?v=95c83b7e" />

2 changes: 1 addition & 1 deletion intro.html
@@ -32,7 +32,7 @@
<link rel="stylesheet" type="text/css" href="_static/styles/sphinx-book-theme.css?v=a3416100" />
<link rel="stylesheet" type="text/css" href="_static/togglebutton.css?v=13237357" />
<link rel="stylesheet" type="text/css" href="_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css?v=be8a1c11" />
<link rel="stylesheet" type="text/css" href="_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css" />
<link rel="stylesheet" type="text/css" href="_static/sphinx-thebe.css?v=4fa983c6" />
<link rel="stylesheet" type="text/css" href="_static/sphinx-design.min.css?v=95c83b7e" />

6 changes: 3 additions & 3 deletions introduction/advantange.html
@@ -32,7 +32,7 @@
<link rel="stylesheet" type="text/css" href="../_static/styles/sphinx-book-theme.css?v=a3416100" />
<link rel="stylesheet" type="text/css" href="../_static/togglebutton.css?v=13237357" />
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css?v=be8a1c11" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-thebe.css?v=4fa983c6" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-design.min.css?v=95c83b7e" />

@@ -498,7 +498,7 @@ <h2>1. Natural Langauge is (almost) universal label (y), task (z) encoder.<a cla
<h2>2. Natural Langauge is (weak but scalable) supervision for representation learning<a class="headerlink" href="#natural-langauge-is-weak-but-scalable-supervision-for-representation-learning" title="Link to this heading">#</a></h2>
<p>Although it’s not within the scope of this tutorial (which focuses on annotation, retrieval, and generation tasks), music-language models can become powerful representation learners by using high-level semantic information as supervision. Models trained on noisy but scalable music-text pairs can perform well on downstream tasks. For example, in the vision domain, models like CLIP (Contrastive Language-Image Pre-training) <span id="id1">[<a class="reference internal" href="../retrieval/joint_embedding.html#id28" title="Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, and others. Learning transferable visual models from natural language supervision. In International conference on machine learning, 8748–8763. PMLR, 2021.">RKH+21</a>]</span> and CoCa (Contrastive Captioners are Image-Text Foundation Models) <span id="id2">[<a class="reference internal" href="../bibliography.html#id23" title="Jiahui Yu, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, and Yonghui Wu. Coca: contrastive captioners are image-text foundation models. arXiv preprint arXiv:2205.01917, 2022.">YWV+22</a>]</span> actually report excellent performance on multiple downstream tasks. In the music domain, the MuLaP <span id="id3">[<a class="reference internal" href="../bibliography.html#id8" title="Ilaria Manco, Emmanouil Benetos, Elio Quinton, and György Fazekas. Learning music audio representations via weak language supervision. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 456–460. IEEE, 2022.">MBQF22b</a>]</span>, TTMR <span id="id4">[<a class="reference internal" href="../retrieval/models.html#id31" title="SeungHeon Doh, Minz Won, Keunwoo Choi, and Juhan Nam. Toward universal text-to-music retrieval. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. 
IEEE, 2023.">DWCN23</a>]</span>, and MuLan <span id="id5">[<a class="reference internal" href="../retrieval/models.html#id36" title="Qingqing Huang, Aren Jansen, Joonseok Lee, Ravi Ganti, Judith Yue Li, and Daniel PW Ellis. Mulan: a joint embedding of music audio and natural language. arXiv preprint arXiv:2208.12415, 2022.">HJL+22</a>]</span> papers demonstrate that Music-Language models can be powerful representation learners.</p>
<figure class="align-default" id="representation">
<img alt="img/representation.png" src="img/representation.png" />
<img alt="../_images/representation.png" src="../_images/representation.png" />
</figure>
<p>Additionally, many people try “dancing about architecture”. In reality, a lot of music metadata, review, social tags, high-level to low-level attribute descriptions, lyrics, etc. remain as natural language data. We will cover more details about this in the dataset section. Additionally, we can use labels inferred from other pretrained MIR models as pseudo labels <span id="id6">[<a class="reference internal" href="../description/tasks.html#id36" title="Josh Gardner, Simon Durand, Daniel Stoller, and Rachel M Bittner. Llark: a multimodal foundation model for music. arXiv preprint arXiv:2310.07160, 2023.">GDSB23</a>]</span>. This can be seen as a weaker but still scalable form of supervision compared to self-supervision.</p>
<div class="admonition note">
@@ -510,7 +510,7 @@ <h2>2. Natural Langauge is (weak but scalable) supervision for representation le
<h2>3. Natural Langauge is Human Friendly interface.<a class="headerlink" href="#natural-langauge-is-human-friendly-interface" title="Link to this heading">#</a></h2>
<p>Language serves as an effective interface for AI models, (i.e., ChatGPT and Stable Diffusion). Because it leverages natural, intuitive communication methods. Language allows users to express complex queries, requests, or ideas in a flexible and contextually rich way without needing specialized knowledge. In terms of responses, language can also enable the system to generate human-like intentions or answers, which can positively impact user satisfaction and usability.</p>
<figure class="align-default" id="prompt-product">
<img alt="img/prompt_product.png" src="img/prompt_product.png" />
<img alt="../_images/prompt_product.png" src="../_images/prompt_product.png" />
</figure>
</section>
</section>
2 changes: 1 addition & 1 deletion introduction/background.html
@@ -32,7 +32,7 @@
<link rel="stylesheet" type="text/css" href="../_static/styles/sphinx-book-theme.css?v=a3416100" />
<link rel="stylesheet" type="text/css" href="../_static/togglebutton.css?v=13237357" />
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css?v=be8a1c11" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-thebe.css?v=4fa983c6" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-design.min.css?v=95c83b7e" />

2 changes: 1 addition & 1 deletion introduction/overview.html
@@ -32,7 +32,7 @@
<link rel="stylesheet" type="text/css" href="../_static/styles/sphinx-book-theme.css?v=a3416100" />
<link rel="stylesheet" type="text/css" href="../_static/togglebutton.css?v=13237357" />
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css?v=be8a1c11" />
<link rel="stylesheet" type="text/css" href="../_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-thebe.css?v=4fa983c6" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-design.min.css?v=95c83b7e" />

2 changes: 1 addition & 1 deletion reports/generation/code.err.log
@@ -186,7 +186,7 @@ File ~/anaconda3/envs/p310/lib/python3.10/site-packages/huggingface_hub/u
 431 + "Access to this resource is disabled."
 432 )

GatedRepoError: 401 Client Error. (Request ID: Root=1-6730d649-76a2420f0c65d5c147671bde;f63dbb9a-d18a-4530-9c2f-bd26a6e8aff8)
GatedRepoError: 401 Client Error. (Request ID: Root=1-6730ddae-71fa96f30578290a3bb66eb3;b5ca2979-b888-4c99-9e08-9b0ecf35cc0f)

Cannot access gated repo for url https://huggingface.co/stabilityai/stable-audio-open-1.0/resolve/main/model_config.json.
Access to model stabilityai/stable-audio-open-1.0 is restricted. You must have access to it and be authenticated to access it. Please log in.
2 changes: 1 addition & 1 deletion search.html
@@ -30,7 +30,7 @@
<link rel="stylesheet" type="text/css" href="_static/styles/sphinx-book-theme.css?v=a3416100" />
<link rel="stylesheet" type="text/css" href="_static/togglebutton.css?v=13237357" />
<link rel="stylesheet" type="text/css" href="_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css?v=be8a1c11" />
<link rel="stylesheet" type="text/css" href="_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css" />
<link rel="stylesheet" type="text/css" href="_static/sphinx-thebe.css?v=4fa983c6" />
<link rel="stylesheet" type="text/css" href="_static/sphinx-design.min.css?v=95c83b7e" />

2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
