SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
CoRR (2024)
Abstract
Pre-training large language models is known to be extremely resource
intensive and oftentimes inefficient, under-utilizing the information
encapsulated in the training text sequences. In this paper, we present SpacTor,
a new training procedure consisting of (1) a hybrid objective combining span
corruption (SC) and replaced token detection (RTD), and (2) a two-stage
curriculum that optimizes the hybrid objective over the initial τ
iterations, then transitions to standard SC loss. We show empirically that the
effectiveness of the hybrid objective is tied to the two-stage pre-training
schedule, and provide extensive analysis on why this is the case. In our
experiments with encoder-decoder architectures (T5) on a variety of NLP tasks,
SpacTor-T5 yields the same downstream performance as standard SC pre-training,
while enabling a 50% reduction in pre-training iterations and 40% reduction in
total FLOPs. Alternatively, given the same amount of computing budget, we find
that SpacTor results in significantly improved downstream benchmark
performance.
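The two ingredients named in the abstract (span corruption targets, replaced-token-detection labels) and the two-stage curriculum can be illustrated with the minimal sketch below. It is not the authors' implementation. The function names, the `<extra_id_N>` sentinel strings (borrowed from the T5 vocabulary convention), and the `rtd_weight` mixing coefficient are assumptions made for illustration. The generator that produces replacement tokens and the models that compute the per-step SC and RTD losses are omitted. Only the control flow is shown: the hybrid SC + RTD loss for the first τ iterations, then plain SC.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) token indices


def span_corrupt(tokens: Sequence[str], spans: Sequence[Span]) -> Tuple[List[str], List[str]]:
    """T5-style span corruption (illustrative). Each span in the input is replaced by
    a sentinel; the target lists each sentinel followed by the tokens it hid.
    Spans are assumed sorted and non-overlapping."""
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"  # hypothetical sentinel naming, T5-vocabulary style
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel, as in the T5 paper's example
    return inputs, targets


def rtd_labels(original: Sequence[str], corrupted: Sequence[str]) -> List[int]:
    """Replaced token detection targets: 1 where a token differs from the original, else 0."""
    if len(original) != len(corrupted):
        raise ValueError("RTD compares sequences position by position")
    return [int(o != c) for o, c in zip(original, corrupted)]


def spactor_loss(step: int, sc_loss: float, rtd_loss: float,
                 tau: int, rtd_weight: float = 1.0) -> float:
    """Two-stage curriculum: hybrid SC + RTD loss while step < tau, plain SC afterwards.
    rtd_weight is a hypothetical mixing coefficient, not a value from the paper."""
    if step < tau:
        return sc_loss + rtd_weight * rtd_loss
    return sc_loss


if __name__ == "__main__":
    tokens = "the quick brown fox jumps over the lazy dog".split()
    inp, tgt = span_corrupt(tokens, [(1, 3), (6, 7)])
    print(inp)  # ['the', '<extra_id_0>', 'fox', 'jumps', 'over', '<extra_id_1>', 'lazy', 'dog']
    print(tgt)  # ['<extra_id_0>', 'quick', 'brown', '<extra_id_1>', 'the', '<extra_id_2>']

    # Pretend a generator replaced "quick" with "slow" at position 1.
    replaced = ["the", "slow"] + tokens[2:]
    print(rtd_labels(tokens, replaced))  # [0, 1, 0, 0, 0, 0, 0, 0, 0]

    # Stage 1 (step < tau) adds the RTD term; stage 2 uses SC only.
    print(spactor_loss(step=100, sc_loss=2.0, rtd_loss=0.5, tau=1000))   # 2.5
    print(spactor_loss(step=1500, sc_loss=2.0, rtd_loss=0.5, tau=1000))  # 2.0
```

In this sketch, the switch at τ is a hard cutoff in the loss function. The abstract's point that the hybrid objective's benefit is tied to this two-stage schedule corresponds to the `step < tau` branch being active only early in training.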