HyperMoE: Paying Attention to Unselected Experts in Mixture of Experts via Dynamic Transfer
CoRR (2024)
Abstract
The Mixture of Experts (MoE) for language models has been proven effective in
augmenting the capacity of models by dynamically routing each input token to a
specific subset of experts for processing. Despite this success, most existing
methods face a trade-off between sparsity and the availability of expert
knowledge: enhancing performance by drawing on more expert knowledge often
comes at the cost of reduced sparsity during expert selection. To
mitigate this contradiction, we propose HyperMoE, a novel MoE framework built
upon Hypernetworks. This framework integrates the computational processes of
MoE with the concept of knowledge transfer in multi-task learning. Modules
generated from the information of the unselected experts serve as
supplementary signals, allowing the knowledge of unselected experts
to be used while maintaining selection sparsity. Our comprehensive empirical
evaluations across multiple datasets and backbones establish that HyperMoE
significantly outperforms existing MoE methods under identical conditions
concerning the number of experts.
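To make the idea concrete, below is a minimal PyTorch sketch of a top-k MoE layer whose output is augmented by a small bottleneck module generated from the embeddings of the unselected experts. The abstract does not specify an implementation; all names here (HyperMoESketch, expert_emb, hypernet, d_bottleneck) and the design details are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperMoESketch(nn.Module):
    """Illustrative sketch (not the paper's implementation): a top-k MoE layer
    whose output is supplemented by a bottleneck adapter generated from the
    embeddings of the UNSELECTED experts."""

    def __init__(self, d_model, n_experts, k=2, d_bottleneck=32):
        super().__init__()
        self.k = k
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.router = nn.Linear(d_model, n_experts)
        # Learnable embedding per expert; unselected ones feed the hypernetwork.
        self.expert_emb = nn.Parameter(torch.randn(n_experts, d_model))
        # Hypernetwork: maps a summary of unselected-expert embeddings to the
        # weights of a small bottleneck adapter (down- and up-projection).
        self.hypernet = nn.Linear(d_model, d_model * d_bottleneck * 2)
        self.d_model, self.d_bottleneck = d_model, d_bottleneck

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        topk_val, topk_idx = logits.topk(self.k, dim=-1)
        gates = F.softmax(topk_val, dim=-1)

        # Sparse expert computation: only the top-k experts run per token.
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot, None] * expert(x[mask])

        # Summarize the embeddings of the unselected experts for each token.
        unsel_mask = torch.ones(x.size(0), len(self.experts), device=x.device)
        unsel_mask.scatter_(1, topk_idx, 0.0)
        unsel_summary = (unsel_mask @ self.expert_emb) / unsel_mask.sum(-1, keepdim=True)

        # Generate a per-token bottleneck adapter from that summary and apply
        # it as a supplementary output, keeping expert selection itself sparse.
        w = self.hypernet(unsel_summary)
        w_down, w_up = w.split(self.d_model * self.d_bottleneck, dim=-1)
        w_down = w_down.view(-1, self.d_model, self.d_bottleneck)
        w_up = w_up.view(-1, self.d_bottleneck, self.d_model)
        supplement = torch.bmm(
            F.relu(torch.bmm(x.unsqueeze(1), w_down)), w_up
        ).squeeze(1)
        return out + supplement
```

In this sketch, the bottleneck (down- then up-projection) keeps the number of hypernetwork-generated parameters per token small, so the supplementary path adds little compute while still conditioning on the unselected experts.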