BPE-Dropout: Simple and Effective Subword Regularization
ACL, pp. 1882-1892, 2020.
We introduce BPE-dropout, a simple and effective subword regularization method that operates within the standard Byte Pair Encoding (BPE) framework.
Subword segmentation is widely used to address the open vocabulary problem in machine translation. The dominant approach to subword segmentation is Byte Pair Encoding (BPE), which keeps the most frequent words intact while splitting the rare ones into multiple tokens. While multiple segmentations are possible even with the same vocabula...
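The abstract is cut off above, so the following is only a minimal sketch of the idea as it is commonly described: segmentation starts from characters and repeatedly applies the highest-priority learned merge, and the dropout variant randomly discards each candidate merge with probability p at every step. The function name `bpe_dropout_segment`, the toy merge table, and the one-merge-at-a-time loop are illustrative assumptions, not the authors' implementation.

```python
import random


def bpe_dropout_segment(word, merges, p=0.0, rng=None):
    """Segment `word` using an ordered merge table (hypothetical helper).

    Each candidate merge is dropped with probability `p` at every step.
    p = 0.0 gives deterministic BPE; p = 1.0 leaves the word as characters.
    """
    rng = rng or random.Random()
    # Earlier merges in the table have higher priority (lower rank).
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while True:
        # Adjacent pairs that are in the merge table and survive dropout.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in ranks and rng.random() >= p
        ]
        if not candidates:
            break
        # Apply the highest-priority surviving merge (leftmost on ties).
        _, i = min(candidates)
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


# Toy merge table, assumed for illustration only.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]

deterministic = bpe_dropout_segment("lower", merges, p=0.0)
characters = bpe_dropout_segment("lower", merges, p=1.0)
sampled = bpe_dropout_segment("lower", merges, p=0.5, rng=random.Random(0))
```

With p between 0 and 1, repeated calls on the same word can return different segmentations, all of which concatenate back to the original word. That is how a single vocabulary and merge table can yield multiple segmentations.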