ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ICLR, 2020.

Highlight:
We find that BERT is slightly harmed by the pre-train/fine-tune mismatch from [MASK] tokens, as the Replace MLM variant (which corrupts inputs with sampled tokens instead of [MASK]) slightly outperforms BERT.

Abstract:

Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not. Because this task is defined over all input tokens rather than just the small masked-out subset, it is more compute-efficient than MLM, and the resulting contextual representations substantially outperform those learned by BERT given the same model size, data, and compute.
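
To make the replaced token detection objective concrete, here is a minimal PyTorch sketch of one training step: a small generator is trained with an MLM loss and proposes replacements for masked positions, and a discriminator is trained to classify every token of the corrupted input as original or replaced. The toy encoder, vocabulary size, 15% masking rate, and all hyperparameter values except the discriminator loss weight (λ = 50, as in the paper) are illustrative assumptions, not the paper's actual architecture or training setup.

```python
# Minimal sketch of ELECTRA-style replaced token detection (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, HIDDEN, MASK_ID, MASK_RATE = 1000, 64, 0, 0.15  # toy values (assumed)

class TinyEncoder(nn.Module):
    """Toy Transformer encoder standing in for the generator/discriminator."""
    def __init__(self, out_dim):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(HIDDEN, out_dim)

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))

generator = TinyEncoder(out_dim=VOCAB_SIZE)   # small MLM proposing replacements
discriminator = TinyEncoder(out_dim=1)        # per-token original-vs-replaced classifier

def electra_step(input_ids):
    # 1) Mask a random subset of positions (MLM-style corruption for the generator).
    mask = torch.rand(input_ids.shape) < MASK_RATE
    masked_ids = input_ids.masked_fill(mask, MASK_ID)

    # 2) Generator: standard MLM loss on the masked positions only.
    gen_logits = generator(masked_ids)
    gen_loss = F.cross_entropy(gen_logits[mask], input_ids[mask])

    # 3) Sample plausible replacements from the generator to build the corrupted input.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits[mask]).sample()
    corrupted_ids = input_ids.clone()
    corrupted_ids[mask] = sampled

    # 4) Discriminator: binary "was this token replaced?" loss over ALL positions.
    #    Sampled tokens that happen to equal the original count as "original".
    is_replaced = (corrupted_ids != input_ids).float()
    disc_logits = discriminator(corrupted_ids).squeeze(-1)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    # Combined objective; the discriminator term is weighted more heavily (λ = 50).
    return gen_loss + 50.0 * disc_loss

loss = electra_step(torch.randint(1, VOCAB_SIZE, (8, 128)))
loss.backward()
```

The sketch shows why the task is sample-efficient: the discriminator receives a learning signal from every input token, whereas the MLM loss only covers the roughly 15% of positions that were masked out.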