Toward more careful corpus statistics: uncertainty estimates for frequencies, dispersions, association measures, and more

Research Methods in Applied Linguistics (2022)

Abstract
This article demonstrates that, counter to current practice, (i) corpus-linguistic studies should provide uncertainty/interval estimates for all corpus-linguistic statistics, even for basic/fundamental ones such as frequencies, dispersions, or association measures, and (ii) these statistics should be based on text-/file-based bootstrapping and confidence/data ellipses covering two or more dimensions of information. Four small case studies – three more programmatic and one more applied – are offered to exemplify the logic and method. The first case study shows how parametric confidence intervals or confidence intervals from word-based bootstrapping can be inappropriate; the second case study exemplifies the computation of frequency-cum-dispersion intervals; the third does the same for collocational/collostructional data (the ditransitive); and the last case study exemplifies the use of these methods in a diachronic statutory-interpretation context.
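The idea of text-/file-based bootstrapping named in the abstract can be illustrated with a minimal sketch. This is not the article's implementation: the function name `file_bootstrap_ci`, the toy counts, and the choice of a simple percentile interval are all assumptions made for illustration. The point it shows is that whole files, not individual words, are resampled with replacement, so the clumpiness of a word across files carries over into the interval.

```python
import random

def file_bootstrap_ci(hits, sizes, n_boot=2000, level=0.95, seed=42):
    """Percentile interval for a per-million-word frequency.

    Hypothetical helper: resamples whole files (not words) with replacement.
    hits[i] is the count of the target word in file i, sizes[i] its word count.
    """
    if len(hits) != len(sizes) or not hits:
        raise ValueError("hits and sizes must be non-empty and equally long")
    rng = random.Random(seed)
    n_files = len(hits)
    estimates = []
    for _ in range(n_boot):
        # Draw file indices with replacement, keeping each file intact
        sample = rng.choices(range(n_files), k=n_files)
        total_hits = sum(hits[i] for i in sample)
        total_words = sum(sizes[i] for i in sample)
        estimates.append(1e6 * total_hits / total_words)
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int((1 - tail) * (n_boot - 1))]
    point = 1e6 * sum(hits) / sum(sizes)
    return point, lower, upper

# Toy data, invented for illustration: one word's hits in six files
hits = [0, 2, 15, 1, 0, 40]
sizes = [2000, 2500, 3000, 1800, 2200, 2600]
point, lower, upper = file_bootstrap_ci(hits, sizes)
print(f"{point:.0f} pmw, 95% interval [{lower:.0f}, {upper:.0f}]")
```

Because most of the toy hits sit in two files, the interval comes out wide; a word-based resample of the same data would treat every token as independent and would tend to understate that uncertainty.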
Keywords
Corpus linguistics, Frequency, Dispersion, Association, Bootstrapping, Interval estimates