Optimal string clustering based on a Laplace-like mixture and EM algorithm on a set of strings

Journal of Computer and System Sciences(2019)

引用 6|浏览26
暂无评分
摘要
In this study, we address the problem of clustering string data in an unsupervised manner by developing a theory of a mixture model and an EM algorithm for strings based on probability theory on a topological monoid of strings developed in our previous studies. We begin with introducing a parametric probability distribution on a set of strings, which has location and dispersion parameters of a string and positive real number. We develop an iteration algorithm for estimating the parameters of the mixture model of the distributions introduced and demonstrate that our algorithm converges to the EM algorithm, which cannot be explicitly written for this mixture model, with probability one and strongly consistently estimates its parameters as the numbers of observed strings and iterations increase. We finally derive a procedure for unsupervised string clustering that is asymptotically optimal in the sense that the posterior probability of making correct classifications is maximized.
更多
查看译文
关键词
Unsupervised string clustering,Topological monoid of strings,Probability theory,Statistical asymptotics,Convergence of a sequence of algorithms
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要