On the Benefits of Learning to Route in Mixture-of-Experts Models.

EMNLP 2023 (2023)

Cited 18 | Views 41
Keywords
mixture-of-experts, transformer, router, efficiency, conditional compute, sparsely activated models, theory