On the Benefits of Learning to Route in Mixture-of-Experts Models.
EMNLP 2023
Keywords
mixture-of-experts, transformer, router, efficiency, conditional compute, sparsely activated models, theory