MoEs in src/ and proper load balancing losses #8
base: main
Conversation
…cing loss and adapt logging
…tion includes moes
… logs bc loss forwarded through layers
@TJ-Solergibert suggested that we could make the MoE models a fully new model file (just like llama.py) with its own config, in order to keep things clean and separate; this also wouldn't break the current configs/conversion etc. The downside is that we would add a lot of copied code, since the model definition is essentially the same apart from which MLP definition to use and the extended forward that carries the aux_losses. Any thoughts on this?
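Not the PR's actual implementation, just a minimal sketch of the pattern under discussion, assuming a top-k router and a Switch-style load-balancing loss; all names here (MoEMLP, aux_loss, shapes) are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEMLP(nn.Module):
    """Top-k routed MLP that also returns a load-balancing aux loss."""

    def __init__(self, hidden_size: int, intermediate_size: int,
                 num_experts: int, top_k: int = 2):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: [batch, seq, hidden]
        router_logits = self.router(hidden_states)              # [b, s, E]
        router_probs = F.softmax(router_logits, dim=-1)
        top_p, top_idx = router_probs.topk(self.top_k, dim=-1)  # [b, s, k]
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)         # renormalize over the k chosen experts

        # Dense combination over experts (inefficient but simple, for illustration only).
        output = torch.zeros_like(hidden_states)
        for e, expert in enumerate(self.experts):
            weight = (top_p * (top_idx == e)).sum(dim=-1, keepdim=True)  # [b, s, 1]
            output = output + weight * expert(hidden_states)

        # Switch-Transformer-style load-balancing loss:
        # fraction of assignments per expert * mean router probability per expert.
        assignment = F.one_hot(top_idx, num_classes=self.num_experts).float()  # [b, s, k, E]
        tokens_frac = assignment.mean(dim=(0, 1, 2))             # [E]
        probs_mean = router_probs.mean(dim=(0, 1))               # [E]
        aux_loss = self.num_experts * torch.sum(tokens_frac * probs_mean)

        return output, aux_loss
```

Under this sketch, each decoder layer would return its hidden states together with the aux loss, the model would sum the per-layer losses, and the trainer would scale that sum and add it to the LM loss before logging, which is roughly what the commit messages above hint at.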
Hi! Yes, I suggest keeping the model separate, with its own model definition and config. This is a common practice in HuggingFace. For the …
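As a rough illustration of the "separate model + config" option (the class and field names below are made up, not the repo's actual ones), the MoE config could simply extend the dense one, so existing configs and the conversion path stay untouched:

```python
from dataclasses import dataclass


@dataclass
class LlamaConfig:
    """Stand-in for the existing dense model config."""
    hidden_size: int = 4096
    intermediate_size: int = 11008
    num_hidden_layers: int = 32


@dataclass
class LlaMoEConfig(LlamaConfig):
    """MoE-specific fields live only in the new config/model file."""
    num_experts: int = 8
    num_experts_per_token: int = 2
    aux_loss_coef: float = 0.01
```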
Hey @AleHD, I believe you discussed it with @TJ-Solergibert; is this ready to merge?
Copied from PR huggingface#192, issue huggingface#159.
For Swiss AI: we need to create a conversion script for MoEs. It probably makes sense to add it in another PR, @ischlag?