A Three-Parameter Rank-Frequency Relation In Natural Languages

58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

Abstract
We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha}(r+\gamma)^{-\beta}$, where $f$ is the token frequency and $r$ is the rank by frequency, with $(\alpha, \beta, \gamma)$ as parameters. The formulation is derived from the empirical observation that $d^2(x+y)/dx^2$ is a typical impulse function, where $(x, y) = (\log r, \log f)$. The formulation reduces to the power law when $\beta = 0$ and to the Zipf-Mandelbrot law when $\alpha = 0$. Based on an investigation of multilingual corpora, we illustrate that $\alpha$ is related to the analytic features of syntax and $\beta + \gamma$ to those of morphology in natural languages.
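The relation stated in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names, the constant `c`, and the parameter values are assumptions chosen for demonstration. The closed form in `second_derivative` is derived here by differentiating the stated formula twice (assuming gamma > 0), not quoted from the paper.

```python
import math

def rank_frequency(r, alpha, beta, gamma, c=1.0):
    """Unnormalized frequency f = c * r^(-alpha) * (r + gamma)^(-beta).

    c is a hypothetical proportionality constant; the abstract only
    states proportionality.
    """
    return c * r ** (-alpha) * (r + gamma) ** (-beta)

def second_derivative(x, beta, gamma):
    """d^2(x + y)/dx^2 with x = log r, y = log f.

    Obtained by differentiating y = log c - alpha*x - beta*log(e^x + gamma).
    The result is a single bump centred at r = gamma with extreme value -beta/4.
    """
    e = math.exp(x)
    return -beta * gamma * e / (e + gamma) ** 2

ranks = [1, 10, 100, 1000]

# beta = 0 leaves f proportional to r^(-alpha): the plain power law.
power_law = [rank_frequency(r, alpha=1.0, beta=0.0, gamma=5.0) for r in ranks]

# alpha = 0 leaves f proportional to (r + gamma)^(-beta): Zipf-Mandelbrot.
zipf_mandelbrot = [rank_frequency(r, alpha=0.0, beta=1.0, gamma=5.0) for r in ranks]

# The bump in the second derivative is most pronounced at r = gamma.
peak = second_derivative(math.log(5.0), beta=1.0, gamma=5.0)
```

Under these assumptions the second derivative does not depend on alpha, and it is localized around log r = log gamma, which is consistent with the impulse-like shape the abstract describes.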