Are Sixteen Heads Really Better than One?
Advances in Neural Information Processing Systems 32 (NIPS 2019), pp. 14014-14024, 2019.
Attention is a powerful and ubiquitous mechanism for allowing neural models to focus on particular salient pieces of information by taking their weighted average when making predictions. In particular, multi-headed attention is a driving force behind many recent state-of-the-art natural language processing (NLP) models such as Transformer-...
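The "weighted average" description of attention can be illustrated with a minimal sketch. This is not the paper's code: the function names `softmax`, `attend`, and `multi_head_attend` are hypothetical, the scoring rule is assumed to be a plain dot product, and the learned projection matrices that a real Transformer applies per head are left out for brevity.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    # Score each key by its dot product with the query (assumed scoring rule).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


def multi_head_attend(queries, keys_per_head, values_per_head):
    # Each head attends independently; head outputs are concatenated.
    # Learned per-head projections are omitted in this sketch.
    combined = []
    for q, ks, vs in zip(queries, keys_per_head, values_per_head):
        _, head_out = attend(q, ks, vs)
        combined.extend(head_out)
    return combined
```

In this sketch, each head produces its own set of weights over the same kind of inputs, so removing a head simply drops its slice of the concatenated output.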