Fine-tuning model #1
-
Hello, I am currently trying to fine-tune this pre-trained model on a database of NES songs (8-bit music from various video games), and I want to know whether it makes sense for you to do this by taking the already trained model and running it through the same training loop used for the original training. I'm a beginner in this field, so sorry if my question sounds simple, but I'd like to know whether it is possible to modify the model this way so it can generate 8-bit-style songs. :)
-
@aibba19 Hello and thank you for your question. From my experience, fine-tuning music models does not work well because music is a little bit different from text or images; it is best to train from scratch on a genre-specific dataset. However, you are welcome to experiment and see if it will work. Thank you.
-
@aibba19 You are welcome! What I can also suggest to help improve results is to add a style token to each note's encoding. This should help the model generate music closer to the style you want. Since you are doing VG music, it should be relatively easy: you can separate compositions by game name, for example.

RE: Yoda vs Mini Muse... Yoda uses 2 tokens per MIDI note, which is the most efficient encoding I could think of. This greatly saves on training time and reduces dataset size, but it comes with trade-offs: a large dictionary and reduced precision for times and durations. Mini Muse, on the other hand, uses 4 tokens per note, which generally gives better results, but it has its own drawbacks: a larger dataset, longer training time, and relatively short generated output at 1024 seq_len. A rough sketch of the two layouts is below.

I was basically trying out different types of encoding for multi-instrumental stuff to see which works best. Both implementations showed good results on multi-instrumental music, and each has the advantages and disadvantages stated above, so which encoding to choose really depends on your task/goals.
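Just to illustrate the trade-off (these are NOT Yoda's or Mini Muse's actual token layouts; the bit packing and value ranges here are invented for the example):

```python
# Made-up illustration of 2-tokens-per-note vs 4-tokens-per-note encodings.

def encode_note_2tok(delta_time, duration, pitch, velocity):
    """Pack time+duration into one token and velocity+pitch into another.
    Coarser quantization -> shorter sequences, but a bigger dictionary
    and less precise times/durations."""
    time_tok = (min(delta_time, 15) << 4) | min(duration, 15)    # 0..255
    note_tok = 256 + ((min(velocity, 7) << 7) | (pitch & 0x7F))  # 256..1279
    return [time_tok, note_tok]

def encode_note_4tok(delta_time, duration, pitch, velocity):
    """One token per attribute: finer resolution and better results,
    but twice the sequence length per note."""
    return [
        min(delta_time, 127),        # 0..127   delta time
        128 + min(duration, 127),    # 128..255 duration
        256 + (pitch & 0x7F),        # 256..383 MIDI pitch
        384 + min(velocity, 127),    # 384..511 velocity
    ]

print(encode_note_2tok(4, 8, 60, 5))   # [72, 956]          -> 2 tokens per note
print(encode_note_4tok(4, 8, 60, 90))  # [4, 136, 316, 474] -> 4 tokens per note
```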
-
PS. The thing with fine-tuning music is that to get good results you need a huge model trained on all possible music, including the music you are fine-tuning on. This is why it is more efficient and easier to train from scratch on a specific genre/dataset to get the best results. And of course style tokens help too, but they also have certain drawbacks and disadvantages...
-
@asigalov61 Thank you so much, I appreciate the clarity of your answer. I do not quite understand how I could add the style token; do you mean adding a field to each note's melody-chords encoding representing the "style" of that note?

For context: I'm using the database at https://github.com/chrisdonahue/nesmdb, and for now I'm training your model from scratch (actually I'm using Yoda) on this data, with all the model parameters halved with respect to your implementation because of my computational limitations.

My final idea is to prime this trained model with a piece from a song of a particular video-game genre, previously classified by an LSTM classifier I wrote, and see if the model can complete it, both in a creative way and keeping continuity with the genre. To check the "creativity", I thought I'd compare the generated track with those used for training and check that we stay below a certain similarity threshold (I could open a whole discussion on this topic alone, but I'm leaving that for now); a rough sketch of the check I have in mind is below. For the genre continuity, I'd classify the generated track with the LSTM and look at the result.

I explained all of this because I think the idea of adding a style token may help with my project, but for now I'm implementing it without one, and I would like to know what you think of the entire process. Sorry for the length of the message and for my not-so-great English.
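Something like this is what I mean by the similarity check (the n-gram size and the overlap threshold are arbitrary placeholders I still need to tune, not tested values):

```python
# Sketch of an n-gram-overlap "creativity" check: flag the generated token
# sequence as too similar if the fraction of its n-grams that also appear
# anywhere in the training set exceeds a threshold.

def ngrams(tokens, n=8):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty_ok(generated, training_songs, n=8, max_overlap=0.35):
    train_grams = set()
    for song in training_songs:          # each song is a list of tokens
        train_grams |= ngrams(song, n)
    gen_grams = ngrams(generated, n)
    overlap = len(gen_grams & train_grams) / max(len(gen_grams), 1)
    return overlap <= max_overlap
```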
-
@aibba19 No worries... Your English is good enough :) Your process seems fine; I think it's a good way to go. Do not be afraid to experiment, because you never know what will work for your particular task/objective.

Yes, basically, what I was proposing is to prepend each MIDI note with a particular "style token". For example, all notes for Mario Bros and similar compositions would be prepended with 1, all notes for Contra and similar compositions with 2, all notes for Zelda and similar compositions with 3, etc. This would give you note-level generation control/conditioning based on the desired style/composition, roughly like the sketch below.

I recommended note-level styling because composition-level styling does not produce very good/coherent results: the model usually forgets very quickly what it is asked to play, so it is best to condition at the note level to get the best results. Hope this makes sense...
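Here is a minimal sketch of the idea (the STYLE_IDS mapping, the STYLE_OFFSET, and the per-note token groups are hypothetical; adapt them to whatever note encoding you actually use):

```python
# Minimal sketch of note-level style conditioning: prepend a style token
# to every note's token group so the model sees the style at every step.

STYLE_IDS = {"mario": 1, "contra": 2, "zelda": 3}
STYLE_OFFSET = 2048  # keep style tokens outside the normal note-token range

def add_style_tokens(note_token_groups, style):
    """Prepend the chosen style token to every note's token group."""
    style_tok = STYLE_OFFSET + STYLE_IDS[style]
    seq = []
    for note_tokens in note_token_groups:  # e.g. [[12, 300], [4, 287], ...]
        seq.append(style_tok)
        seq.extend(note_tokens)
    return seq

# Example: two 2-token notes from a Zelda-style tune -> style token 2051
# interleaved before each note.
print(add_style_tokens([[12, 300], [4, 287]], "zelda"))
# [2051, 12, 300, 2051, 4, 287]
```

At generation time, priming with (and re-injecting) the desired style token before each note is what steers the output toward that style.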