Reliable measures of syntactic and lexical complexity: The case of Iris Murdoch

Stefan Evert, Sebastian Wankerl, Elmar Nöth, Friedrich-Alexander-Universität

Semantic Scholar (2017)

Abstract
Quantitative measures of the syntactic and lexical complexity of natural language text, such as type-token ratio (TTR), Yule’s K (Yule, 1944) or Yngve depth (Yngve, 1960), play a central role in stylometric analysis. They have been used to investigate stylometric differences between writers and settle questions of disputed authorship (Stamatatos, 2009), to explore the characteristics of translated texts (Volansky, Ordan, & Wintner, 2015), to identify determinants of style in scientific writing (Bergsma, Post, & Yarowsky, 2012), to study diachronic changes in grammar (Bentz, Kiela, Hill, & Buttery, 2014), to assess the readability and difficulty level of a text (Graesser, McNamara, Louwerse, & Cai, 2004; Collins-Thompson, 2014), and as a feature in the multivariate analysis of linguistic variation (Biber, 1988; Diwersy, Evert, & Neumann, 2014). In particular, several recent studies (Garrard, Maloney, Hodges, & Patterson, 2005; Pakhomov, Chacon, Wicklund, & Gundel, 2011; Le, Lancashire, Hirst, & Jokel, 2011) attempt to detect early symptoms of dementia in the last novels written by the British author Iris Murdoch, who was diagnosed with Alzheimer’s disease in 1997. These studies focus primarily on quantitative complexity measures, based on the assumption that incipient dementia reduces either the lexical or the syntactic complexity of a patient’s writing. The results were inconclusive: while the first two studies observed a promising decline of complexity in Murdoch’s last novel, Jackson’s Dilemma, published in 1995,¹ Le et al. (2011) analyzed a larger sample of Murdoch’s writings and found that most of the quantitative measures did not show any clear effects. In particular, they rejected the hypothesis of a decline in syntactic complexity. Like most work in stylometry, all three studies fail to take the sampling distributions of complexity measures into account.
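For readers unfamiliar with the three measures named above, the following is a minimal sketch of their textbook definitions, not the authors' implementation. TTR is the number of distinct types divided by the number of tokens N; Yule's K is 10^4 · (Σ_i i²·V_i − N) / N², where V_i is the number of types occurring exactly i times; Yngve depth numbers the branches under each node from right to left starting at 0 and sums those numbers along the path from the root to each word. Whitespace tokenization and the unlabeled nested-list tree format are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter
from typing import Union

# Assumed tree format: a leaf is a word (str); an inner node is a list of children.
Tree = Union[str, list]


def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


def yules_k(tokens: list[str]) -> float:
    """Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2, where V_i is the number
    of types that occur exactly i times and N is the number of tokens."""
    n = len(tokens)
    freq_of_freqs = Counter(Counter(tokens).values())  # maps i -> V_i
    m2 = sum(i * i * v for i, v in freq_of_freqs.items())
    return 1e4 * (m2 - n) / (n * n)


def yngve_depths(tree: Tree, depth: int = 0) -> list[int]:
    """Yngve depth of every leaf, left to right. Children of a node are
    numbered right to left from 0; a word's depth is the sum of these
    numbers on the path from the root down to that word."""
    if isinstance(tree, str):
        return [depth]
    depths = []
    last = len(tree) - 1
    for i, child in enumerate(tree):
        depths.extend(yngve_depths(child, depth + (last - i)))
    return depths


tokens = "the cat sat on the mat".split()
print(round(type_token_ratio(tokens), 3))  # 0.833
print(round(yules_k(tokens), 1))           # 555.6

# Hand-built, unlabeled parse of the same sentence (illustrative only).
tree = [["the", "cat"], ["sat", ["on", ["the", "mat"]]]]
depths = yngve_depths(tree)
print(depths)                     # [2, 1, 1, 1, 1, 0]
print(sum(depths) / len(depths))  # 1.0 (mean Yngve depth)
```

Note that TTR and Yule's K need only a token list, whereas Yngve depth requires a syntactic parse; in practice the trees would come from a parser, whose errors add a further source of variation.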
As a result, these studies are prone to over-interpreting observed differences that may well be explained by random variation. Only Le et al. (2011) apply significance tests, but they test for a linear trend in complexity across the span of Murdoch’s writing career, a pattern that is not consistent with the typical course of Alzheimer’s disease. In this paper, we propose a novel methodology for computing reliable confidence intervals and significance tests for measures of linguistic complexity, inspired by ideas from bootstrapping and cross-validation. As an illustration, we apply the new method to the case of Iris Murdoch, showing that most of the differences observed in previous work are not significant and can indeed be accounted for by sampling variation.
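The abstract does not spell out the proposed procedure, so the following is only a generic sketch of the underlying idea and should not be read as the authors' method. It swaps in simpler, well-known techniques: the text is cut into fixed-size chunks (similar to cross-validation folds), a complexity measure is computed on each chunk, the spread across chunks gives a normal-approximation confidence interval, and two texts are compared with a permutation test on their chunk scores. Fixed chunk sizes matter because TTR falls as text length grows. The function names, the chunk size of 500 tokens, and the synthetic data are all hypothetical, and treating chunks as exchangeable is an assumption (adjacent chunks of a novel may be correlated).

```python
import math
import random
from statistics import mean, stdev


def ttr(tokens):
    """Type-token ratio of one chunk."""
    return len(set(tokens)) / len(tokens)


def chunk_scores(tokens, measure, chunk_size):
    """Apply `measure` to consecutive, non-overlapping chunks of exactly
    `chunk_size` tokens; a shorter trailing remainder is discarded."""
    return [measure(tokens[i:i + chunk_size])
            for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]


def chunk_ci(scores, z=1.96):
    """Approximate 95% confidence interval for the mean chunk score, using the
    between-chunk standard deviation as the estimate of sampling variation."""
    m = mean(scores)
    se = stdev(scores) / math.sqrt(len(scores))
    return m - z * se, m + z * se


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in mean chunk score,
    assuming chunks are exchangeable between the two texts."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic illustration only: two random "texts" drawn from vocabularies
# of different size, so text_b should have a clearly lower TTR.
rng = random.Random(1)
text_a = [f"w{rng.randrange(500)}" for _ in range(5000)]
text_b = [f"w{rng.randrange(200)}" for _ in range(5000)]
scores_a = chunk_scores(text_a, ttr, chunk_size=500)
scores_b = chunk_scores(text_b, ttr, chunk_size=500)
print(chunk_ci(scores_a), chunk_ci(scores_b))
print(permutation_test(scores_a, scores_b, n_perm=2000))
```

Running the same comparison on two samples drawn from one vocabulary would typically yield overlapping intervals and a large p-value, which is the kind of outcome that would suggest an observed difference is attributable to sampling variation.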