Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
CoRR(2024)
摘要
In this work, we leverage the intrinsic segmentation of language sequences
and design a new positional encoding method called Bilevel Positional Encoding
(BiPE). For each position, our BiPE blends an intra-segment encoding and an
inter-segment encoding. The intra-segment encoding identifies the locations
within a segment and helps the model capture the semantic information therein
via absolute positional encoding. The inter-segment encoding specifies the
segment index, models the relationships between segments, and aims to improve
extrapolation capabilities via relative positional encoding. Theoretical
analysis shows this disentanglement of positional information makes learning
more effective. The empirical results also show that our BiPE has superior
length extrapolation capabilities across a wide range of tasks in diverse text
modalities.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要