Profiling Specialized Web Corpus Qualities: A Progress Report On "Domainhood"

ARGENTINIAN JOURNAL OF APPLIED LINGUISTICS (2019)

Abstract
In this article we describe ways to profile the domain specificity, a.k.a. domainhood, of specialized web corpora in English and in Swedish. Several studies have measured the "qualities" of general-purpose web corpora. By contrast, less attention has been paid to the evaluation of specialized, or domain-specific, web corpora. To fill this gap, we present case studies that explore how effectively several statistical measures assess domainhood: rank correlation coefficients (Kendall and Spearman), Kullback-Leibler divergence, log-likelihood and burstiness. Our findings indicate that it is possible to profile the domainhood quality of a corpus. However, further research is needed before the results can be generalized.
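The abstract names the measures but gives no formulas. As a rough illustration only, the Python sketch below implements textbook versions of three of them over unigram frequencies: Kullback-Leibler divergence between a specialized and a reference corpus, Dunning-style log-likelihood (G2) keyness for a single word, and Spearman's rank correlation of word frequencies. The function names, the add-alpha smoothing, the base-2 logarithm and the toy corpora are all assumptions of this sketch, not the authors' setup; Kendall's tau and burstiness are left out.

```python
import math
from collections import Counter


def kl_divergence(p_tokens, q_tokens, alpha=1.0):
    """D(P || Q) in bits between the unigram distributions of two token lists.

    Assumption: add-alpha smoothing over the union vocabulary keeps every
    Q probability non-zero, so the divergence is always finite.
    """
    p_counts, q_counts = Counter(p_tokens), Counter(q_tokens)
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (p_counts[word] + alpha) / p_total
        q = (q_counts[word] + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """G2 keyness of one word observed freq_a times in corpus A (size_a tokens)
    versus freq_b times in corpus B (size_b tokens)."""
    total = size_a + size_b
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def average_ranks(values):
    """Rank values from 1 (smallest upward), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks.

    Both sequences must contain at least two distinct values.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy corpora, for illustration only.
    specialized = "the virus spreads the virus mutates and the vaccine helps".split()
    general = "the cat sat on the mat and the dog barked at the cat".split()
    spec_counts, gen_counts = Counter(specialized), Counter(general)

    print("KL(specialized || general):", kl_divergence(specialized, general))
    print("G2 for 'virus':", log_likelihood(spec_counts["virus"], len(specialized),
                                           gen_counts["virus"], len(general)))

    # Rank correlation of frequencies over the union vocabulary (missing = 0).
    vocab = sorted(set(spec_counts) | set(gen_counts))
    print("Spearman rho:", spearman_rho([spec_counts[w] for w in vocab],
                                        [gen_counts[w] for w in vocab]))
```

Under these assumptions, a higher KL divergence, a large G2 for domain terms and a lower rank correlation against a general reference corpus would all point toward a more domain-specific corpus; how the authors actually combine or threshold such scores is not stated in the abstract.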
Keywords
corpus evaluation, term extraction, log-likelihood, rank correlation, Kullback-Leibler divergence