Large-scale benchmark yields no evidence that language model surprisal explains syntactic disambiguation difficulty

Journal of Memory and Language (2024)

Abstract
Prediction has been proposed as an overarching principle that explains human information processing in language and beyond. To what degree can processing difficulty in syntactically complex sentences, one of the major concerns of psycholinguistics, be explained by predictability, as estimated using computational language models and operationalized as surprisal (negative log probability)? A precise, quantitative test of this question requires a much larger-scale data collection effort than has been undertaken in the past. We present the Syntactic Ambiguity Processing Benchmark, a dataset of self-paced reading times from 2000 participants, who read a diverse set of complex English sentences. This dataset makes it possible to measure the processing difficulty associated with individual syntactic constructions, and even individual sentences, precisely enough to rigorously test the predictions of computational models of language comprehension. By estimating the function that relates surprisal to reading times from filler items included in the experiment, we find that the predictions of language models with two different architectures diverge sharply from the empirical reading time data: they dramatically underpredict processing difficulty, fail to predict relative difficulty among different syntactically ambiguous constructions, and only partially explain item-wise variability. These findings suggest that next-word prediction is most likely insufficient on its own to explain human syntactic processing.
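For readers unfamiliar with the operationalization, the sketch below shows how per-word surprisal can be computed from an off-the-shelf autoregressive language model. It is a minimal illustration, assuming GPT-2 and the HuggingFace transformers library; these are stand-ins, not necessarily the two model architectures evaluated in the paper, and the paper additionally fits a linking function from surprisal to reading times on filler items, which is not shown here.

# Minimal sketch: per-token surprisal (negative log probability, in bits)
# from an autoregressive language model. GPT-2 via HuggingFace
# transformers is an illustrative assumption, not the paper's setup.
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence: str) -> list[tuple[str, float]]:
    """Return (token, surprisal in bits) for each token after the first."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
    # Log probability of each actual token given its preceding context.
    log_probs = torch.log_softmax(logits, dim=-1)
    out = []
    for pos in range(1, ids.size(1)):
        token_id = ids[0, pos]
        logp = log_probs[0, pos - 1, token_id].item()
        out.append((tokenizer.decode(token_id), -logp / math.log(2)))
    return out

# A classic garden-path sentence: surprisal should spike at the
# disambiguating word "fell" if the model tracks the ambiguity.
for tok, s in token_surprisals("The horse raced past the barn fell."):
    print(f"{tok!r}: {s:.2f} bits")

Under the surprisal account tested in the paper, reading times at each word should be an increasing function of values like these; the benchmark's finding is that the empirical slowdowns at disambiguation points are far larger than such surprisals predict.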
Keywords
Sentence processing, Prediction, Surprisal, Language models