How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention

COLING (2020)

Abstract
Recent research on the multi-head attention mechanism, especially that in pre-trained models such as BERT, has shown us heuristics and clues in analyzing various aspects of the mechanism. As most of the research focus on probing tasks or hidden states, previous works have found some primitive patterns of attention head behavior by heuristic analytical methods, but a more systematic analysis specific on the attention patterns still remains primitive. In this work, we clearly cluster the attention heatmaps into significantly different patterns through unsupervised clustering on top of a set of proposed features, which corroborates with previous observations. We further study their corresponding functions through analytical study. In addition, our proposed features can be used to explain and calibrate different attention heads in Transformer models.
Keywords
bert,attention,clustering,distance-based