Fine-tuning model #1
-
Hello, I am currently trying to fine-tune this pre-trained model on a database of NES songs (8-bit music from various video games), and I want to know whether it makes sense for you to do this by taking the already trained model and running it through the same training loop used for the original training. I'm a beginner in this field, so sorry if my question sounds simple, but I'd like to know whether it is possible to modify the model this way so it can generate 8-bit-style songs. :)
-
@aibba19 Hello and thank you for your question. From my experience, fine-tuning music models does not work well because music is a little bit different from text or images; it is best to train from scratch on a genre-specific dataset. However, you are welcome to experiment and see if it will work. Thank you.
-
@aibba19 You are welcome! What I can also suggest to help improve results is to add a style token to each note's encoding. This should help the model generate music closer to the style you want. Since you are doing VG music, it should be relatively easy: you can separate compositions by game name, for example.

RE: Yoda vs Mini Muse... Yoda uses 2 tokens per MIDI note, which is the most efficient encoding I could think of. This greatly saves on training time and reduces dataset size, but it comes with trade-offs: a large dictionary and reduced precision for times and durations. Mini Muse, on the other hand, uses 4 tokens per note, which generally gives better results, but it has its own drawbacks: a larger dataset, longer training time, and relatively short generated output at 1024 seq_len. A rough sketch of the two layouts is below.

I was basically trying out different types of encoding for multi-instrumental stuff to see which works best. Both implementations showed good results on multi-instrumental music, and each has the advantages and disadvantages stated above, so which encoding to choose really depends on your task/goals.
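Just to illustrate the trade-off (these are NOT Yoda's or Mini Muse's actual token layouts; the bit packing and value ranges here are invented for the example):

```python
# Made-up illustration of 2-tokens-per-note vs 4-tokens-per-note encodings.

def encode_note_2tok(delta_time, duration, pitch, velocity):
    """Pack time+duration into one token and velocity+pitch into another.
    Coarser quantization -> shorter sequences, but a bigger dictionary
    and less precise times/durations."""
    time_tok = (min(delta_time, 15) << 4) | min(duration, 15)    # 0..255
    note_tok = 256 + ((min(velocity, 7) << 7) | (pitch & 0x7F))  # 256..1279
    return [time_tok, note_tok]

def encode_note_4tok(delta_time, duration, pitch, velocity):
    """One token per attribute: finer resolution and better results,
    but twice the sequence length per note."""
    return [
        min(delta_time, 127),        # 0..127   delta time
        128 + min(duration, 127),    # 128..255 duration
        256 + (pitch & 0x7F),        # 256..383 MIDI pitch
        384 + min(velocity, 127),    # 384..511 velocity
    ]

print(encode_note_2tok(4, 8, 60, 5))   # [72, 956]          -> 2 tokens per note
print(encode_note_4tok(4, 8, 60, 90))  # [4, 136, 316, 474] -> 4 tokens per note
```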
-
PS. The thing with fine-tuning music is that to get good results you need a huge model trained on all possible music, including the music you are fine-tuning on. This is why it is more efficient and easier to train from scratch on a specific genre/dataset to get the best results. And of course style tokens help too, but they also have certain drawbacks and disadvantages...
-
@asigalov61 Thank you so much, I appreciate the clarity of your answer. I do not quite understand how I could add the style token; do you mean adding a field to each note's melody-chords encoding representing the "style" of that note?

For context: I'm using the database at https://github.com/chrisdonahue/nesmdb, and for now I'm training your model from scratch (actually I'm using Yoda) on this data, with all the model parameters halved with respect to your implementation because of my computational limitations.

My final idea is to prime this trained model with a piece from a song of a particular video-game genre, previously classified by an LSTM classifier I wrote, and see if the model can complete it, both in a creative way and keeping continuity with the genre. To check the "creativity", I thought I'd compare the generated track with those used for training and check that we stay below a certain similarity threshold (I could open a whole discussion on this topic alone, but I'm leaving that for now); a rough sketch of the check I have in mind is below. For the genre continuity, I'd classify the generated track with the LSTM and look at the result.

I explained all of this because I think the idea of adding a style token may help with my project, but for now I'm implementing it without one, and I would like to know what you think of the entire process. Sorry for the length of the message and for my not-so-great English.
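Something like this is what I mean by the similarity check (the n-gram size and the overlap threshold are arbitrary placeholders I still need to tune, not tested values):

```python
# Sketch of an n-gram-overlap "creativity" check: flag the generated token
# sequence as too similar if the fraction of its n-grams that also appear
# anywhere in the training set exceeds a threshold.

def ngrams(tokens, n=8):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty_ok(generated, training_songs, n=8, max_overlap=0.35):
    train_grams = set()
    for song in training_songs:          # each song is a list of tokens
        train_grams |= ngrams(song, n)
    gen_grams = ngrams(generated, n)
    overlap = len(gen_grams & train_grams) / max(len(gen_grams), 1)
    return overlap <= max_overlap
```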
-
@aibba19 No worries... Your English is good enough :) Your process seems fine; I think it's a good way to go. Do not be afraid to experiment, because you never know what will work for your particular task/objective.

Yes, basically, what I was proposing is to prepend each MIDI note with a particular "style token". For example, all notes for Mario Bros and similar compositions would be prepended with 1, all notes for Contra and similar compositions with 2, all notes for Zelda and similar compositions with 3, etc. This would give you note-level generation control/conditioning based on the desired style/composition, roughly like the sketch below.

I recommended note-level styling because composition-level styling does not produce very good/coherent results: the model usually forgets very quickly what it is asked to play, so it is best to condition at the note level to get the best results. Hope this makes sense...
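Here is a minimal sketch of the idea (the STYLE_IDS mapping, the STYLE_OFFSET, and the per-note token groups are hypothetical; adapt them to whatever note encoding you actually use):

```python
# Minimal sketch of note-level style conditioning: prepend a style token
# to every note's token group so the model sees the style at every step.

STYLE_IDS = {"mario": 1, "contra": 2, "zelda": 3}
STYLE_OFFSET = 2048  # keep style tokens outside the normal note-token range

def add_style_tokens(note_token_groups, style):
    """Prepend the chosen style token to every note's token group."""
    style_tok = STYLE_OFFSET + STYLE_IDS[style]
    seq = []
    for note_tokens in note_token_groups:  # e.g. [[12, 300], [4, 287], ...]
        seq.append(style_tok)
        seq.extend(note_tokens)
    return seq

# Example: two 2-token notes from a Zelda-style tune -> style token 2051
# interleaved before each note.
print(add_style_tokens([[12, 300], [4, 287]], "zelda"))
# [2051, 12, 300, 2051, 4, 287]
```

At generation time, priming with (and re-injecting) the desired style token before each note is what steers the output toward that style.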