Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
ACL (1), pp. 2978-2988, 2019.
Transformer networks have the potential to learn longer-term dependencies, but in language modeling they are limited by a fixed-length context. As a solution, we propose a novel neural architecture, Transformer-XL, that enables the Transformer to learn dependencies beyond a fixed length without disrupting temporal coherence.