A Three-Parameter Rank-Frequency Relation In Natural Languages
58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)
Abstract
We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha} (r + \gamma)^{-\beta}$, where $f$ is the token frequency, $r$ is the rank by frequency, and $(\alpha, \beta, \gamma)$ are parameters. The formulation is derived from the empirical observation that $d^2(x + y)/dx^2$ is a typical impulse function, where $(x, y) = (\log r, \log f)$. The formulation reduces to the power law when $\beta = 0$ and to the Zipf-Mandelbrot law when $\alpha = 0$. From an investigation of multilingual corpora, we illustrate that $\alpha$ is related to the analytic features of syntax and $\beta + \gamma$ to those of morphology in natural languages.
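As a rough illustration of the formula in the abstract, the following Python sketch evaluates the unnormalized relation and its two special cases. The function names and the parameter values are assumptions chosen for illustration; they are not taken from the paper and are not fitted to any corpus. The second function uses the closed form obtained by differentiating $y = -\alpha x - \beta \log(e^x + \gamma)$ twice with respect to $x$, which gives a single bump centered at $r = \gamma$.

```python
import math


def rank_frequency(r, alpha, beta, gamma):
    """Unnormalized frequency f(r) = r^(-alpha) * (r + gamma)^(-beta)."""
    return r ** (-alpha) * (r + gamma) ** (-beta)


def second_derivative_log_log(r, beta, gamma):
    """Closed form of d^2(x + y)/dx^2 at x = log r, with y = log f.

    The linear terms in x vanish after two derivatives, leaving
    -beta * gamma * r / (r + gamma)^2, which has its largest magnitude
    at r = gamma. Note that alpha does not appear in the result.
    """
    return -beta * gamma * r / (r + gamma) ** 2


# Hypothetical parameter values, for illustration only.
alpha, beta, gamma = 0.5, 0.6, 10.0
ranks = [1, 10, 100, 1000]

# beta = 0 leaves the pure power law r^(-alpha).
power_law = [rank_frequency(r, alpha, 0.0, gamma) for r in ranks]

# alpha = 0 leaves the Zipf-Mandelbrot form (r + gamma)^(-beta).
zipf_mandelbrot = [rank_frequency(r, 0.0, beta, gamma) for r in ranks]

# Curvature in log-log space: a bump whose extreme value sits at r = gamma.
curvature = [second_derivative_log_log(r, beta, gamma) for r in ranks]

for r, c in zip(ranks, curvature):
    print(f"r={r:5d}  d2(x+y)/dx2={c:.4f}")
```

With these values the curvature is strongest at $r = \gamma = 10$, where it equals $-\beta/4$, and decays toward zero on either side, which is consistent with the impulse-like shape the abstract describes.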