
Move MoE Implementation into src/, add Load Balancing Losses #192

Open

haeggee wants to merge 16 commits into main
Conversation

@haeggee commented Jun 6, 2024

WIP #159

Hey everyone :)

As mentioned in the issue, I think it's worth moving the MoE code into src/, so I'm working on it here. MoEs have become a common/standard architecture choice for LLMs. With this PR, it becomes possible to run MoEs via the pip-installed nanotron -- no need to be inside the examples/ folder. Also, auxiliary losses are essential for training MoEs at scale (stability, load balancing & maximizing utilization!).
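For context, here is a minimal sketch of the kind of load-balancing auxiliary loss meant above (in the style of the Switch Transformer), assuming top-k routing over raw router logits; the function name and signature are illustrative, not the exact code in this PR:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int, top_k: int = 1) -> torch.Tensor:
    # router_logits: (num_tokens, num_experts) raw gate scores.
    # Penalize the dot product between the fraction of tokens dispatched to
    # each expert and the mean router probability per expert; this is
    # minimized when the assignment is uniform across experts.
    probs = F.softmax(router_logits, dim=-1)                   # (tokens, experts)
    _, selected = torch.topk(probs, k=top_k, dim=-1)           # (tokens, top_k)
    expert_mask = F.one_hot(selected, num_experts).sum(dim=1)  # (tokens, experts)
    tokens_per_expert = expert_mask.float().mean(dim=0)        # fraction routed to each expert
    mean_probs = probs.mean(dim=0)                             # mean gate prob per expert
    return num_experts * torch.sum(tokens_per_expert * mean_probs)
```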

Points I've tried to make sure of:

  • Not much more code in src/ (the moe.py file in models, a few if statements inside the main model, and the forwarding of the additional aux-loss value)
  • With the config setup, a standard dense model is simply one with num_experts=1; for num_experts > 1, the dropless MoE is used
  • Works for all parallelisms without errors
  • With the latest commit, pipeline parallelism also has correct logging: the load-balancing losses are computed inside the MLPs, passed forward, and summed cumulatively (see the sketch after this list)
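To illustrate the cumulative summing, a rough sketch of the pattern (hypothetical names, not the literal PR code): each block receives the running aux loss alongside the hidden states and adds its own MLP's contribution, so the last pipeline stage ends up holding the total.

```python
import torch
import torch.nn as nn

class DecoderBlockWithAuxLoss(nn.Module):
    """Hypothetical wrapper: forwards hidden states together with a running
    load-balancing loss so the value survives pipeline-parallel boundaries."""

    def __init__(self, mlp: nn.Module):
        super().__init__()
        self.mlp = mlp  # an MoE MLP that returns (output, aux_loss)

    def forward(self, hidden_states: torch.Tensor, aux_loss: torch.Tensor):
        hidden_states, block_aux_loss = self.mlp(hidden_states)
        # Each block adds its own contribution; the tensor crossing the
        # pipeline boundary is the cumulative sum so far.
        return hidden_states, aux_loss + block_aux_loss
```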

Still TODO:

  • z-loss (sketched below for reference)
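By z-loss I mean (presumably) the router z-loss from the ST-MoE paper, which penalizes large router logits to keep the gating softmax numerically stable; a minimal sketch, assuming raw router logits:

```python
import torch

def router_z_loss(router_logits: torch.Tensor) -> torch.Tensor:
    # Mean of logsumexp(logits)^2 over tokens: discourages the router
    # logits from growing large, which stabilizes the gating softmax.
    z = torch.logsumexp(router_logits, dim=-1)  # (num_tokens,)
    return (z ** 2).mean()
```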

Questions where input is needed:

  • Is there a more elegant way to forward the aux losses?
  • Inside the pipeline engine, it's currently hardcoded to check for "load_balancing_loss" for the backward pass (just like it was before with "loss"). Would you want to make this more generic, i.e., loop over all keys in the dictionary to store them (see the sketch after this list)? Note that the two losses are kept separate (not added up beforehand) for logging.
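To make the second question concrete, a rough sketch of the generic variant I have in mind (hypothetical names; the real pipeline engine differs): instead of special-casing "loss" and "load_balancing_loss", pick up every loss-like key the model returns.

```python
import torch

def collect_losses(output: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    # Generic alternative to hardcoding "loss" / "load_balancing_loss":
    # gather every loss-like entry. The tensors stay attached to the graph
    # so they can still be used for backward, and they remain separate so
    # each one can be logged individually.
    return {key: value for key, value in output.items() if key.endswith("loss")}
```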

Please feel free to give feedback or request other changes! I want to make sure it's clean and correct.
