Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
ICLR 2017. arXiv:1701.06538.
The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges.
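The conditional-computation idea the abstract describes can be illustrated with a minimal sketch of a sparsely-gated mixture-of-experts layer: a gating network scores all experts, but only the top-k experts are actually evaluated for a given input. The function and variable names below (`moe_layer`, `gate_w`, `experts`) are illustrative, and the experts are stand-in linear maps rather than the feed-forward networks used in the paper.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x, gate_w, experts, k=2):
    """Sparsely-gated MoE sketch: route input x to the top-k experts
    chosen by a linear gating network; only those experts run."""
    logits = gate_w @ x                 # one gating score per expert
    topk = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    weights = softmax(logits[topk])     # renormalize gate weights over the selection
    # conditional computation: evaluate only the selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
n_experts, d = 4, 8
gate_w = rng.normal(size=(n_experts, d))
# toy experts: small linear maps (the paper uses feed-forward networks)
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: M @ x for M in expert_mats]

x = rng.normal(size=d)
y = moe_layer(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With k fixed and the number of experts grown, parameter count scales with the expert pool while per-example compute stays roughly constant, which is the capacity/computation decoupling the abstract refers to.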