On the Power Laws of Language: Word Frequency Distributions

SIGIR(2017)

引用 16|浏览74
暂无评分
摘要
About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. Over the years, this phenomenon has been documented and studied extensively. For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight lines joined by a smooth curve. A simple generative model is proposed to capture this phenomenon. The word frequency distributions produced by this model are shown to match the observations both analytically and empirically.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要