MoEs are here! 🎉

@NouamaneTazi released this 16 Feb 18:33

How to use nanotron's MoEs

To use nanotron's 3D-parallel implementation of MoEs, simply add a dMoE block to your modeling code as follows:

        self.block_sparse_moe = dMoE(
            config,
            expert_parallel_group=parallel_context.expert_pg,
            tp_pg=parallel_context.tp_pg,
            parallel_config=parallel_config,
        )

See the full example in examples/moe/llamoe.py.
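
For context, here is a minimal sketch of how such a block could sit inside a decoder layer. The layer structure, the hidden_states signature, and the import path are illustrative assumptions, not the library's actual API; the real integration is in examples/moe/llamoe.py.

import torch
from torch import nn

# Hypothetical import path for illustration; in practice dMoE comes from the MoE example code.
from moe import dMoE


class MoEDecoderLayer(nn.Module):
    """Sketch of a decoder layer whose feed-forward block is a sparse MoE (attention elided)."""

    def __init__(self, config, parallel_config, parallel_context):
        super().__init__()
        self.post_attention_layernorm = nn.LayerNorm(config.hidden_size)
        # Experts are sharded across the expert-parallel group, and each expert's
        # weights across the tensor-parallel group.
        self.block_sparse_moe = dMoE(
            config,
            expert_parallel_group=parallel_context.expert_pg,
            tp_pg=parallel_context.tp_pg,
            parallel_config=parallel_config,
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Assumes dMoE maps [batch, seq, hidden] -> [batch, seq, hidden].
        residual = hidden_states
        hidden_states = self.post_attention_layernorm(hidden_states)
        return residual + self.block_sparse_moe(hidden_states)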
You can control the expert parallelism degree by setting parallelism.expert_parallel_size in your config; the expert weight parallelism degree follows the tensor parallel degree.
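
As a rough illustration, the relevant part of a training config could look like the excerpt below. The dp/pp/tp values are placeholders and the exact set of keys may differ between versions; see the examples/moe directory for a complete config.

parallelism:
  dp: 2                      # data parallel degree (placeholder value)
  pp: 1                      # pipeline parallel degree (placeholder value)
  tp: 2                      # tensor parallel degree; expert weights are sharded with this degree
  expert_parallel_size: 2    # expert parallelism degree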

What's Changed

New Contributors

Full Changelog: v0.1...v0.2