Fully distributed EM for very large datasets

Proceedings of the 25th International Conference on Machine Learning (2008)

Abstract
In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be impractical. We present a framework that fully distributes the entire EM procedure. Each node interacts only with parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce topology, on two tasks: word alignment and topic modeling.
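The observation that E-step computations distribute easily can be illustrated with a minimal sketch. It assumes an IBM Model 1 style word-alignment model (chosen only because word alignment is one of the tasks named above) and shows the MapReduce-style baseline, in which every shard's expected counts are summed at one place before the M-step. It is not the junction-tree message passing the paper proposes. All function names (`init_params`, `e_step`, `reduce_counts`, `m_step`, `run_em`) are hypothetical, and the shards are processed in a plain loop rather than on separate nodes.

```python
from collections import defaultdict


def init_params(shards):
    """Start every co-occurring (f, e) pair at the same unnormalized weight."""
    pairs = {(f, e)
             for shard in shards
             for src, tgt in shard
             for e in src
             for f in tgt}
    return {pair: 1.0 for pair in pairs}


def e_step(shard, t):
    """Expected alignment counts for one shard.

    Only parameters t[(f, e)] for word pairs that co-occur in this shard
    are read, so a node never needs the full parameter table.
    """
    counts = defaultdict(float)
    for src, tgt in shard:
        for f in tgt:
            z = sum(t[(f, e)] for e in src)  # normalizer over candidate source words
            for e in src:
                counts[(f, e)] += t[(f, e)] / z
    return counts


def reduce_counts(partials):
    """Sum sparse count tables; sufficient statistics are additive across shards."""
    total = defaultdict(float)
    for part in partials:
        for key, value in part.items():
            total[key] += value
    return total


def m_step(counts):
    """Renormalize counts into t(f | e) for each source word e."""
    norm = defaultdict(float)
    for (f, e), c in counts.items():
        norm[e] += c
    return {(f, e): c / norm[e] for (f, e), c in counts.items()}


def run_em(shards, iterations=5):
    """Alternate per-shard E-steps with a single centralized M-step."""
    t = init_params(shards)
    for _ in range(iterations):
        # In a real deployment each call below would run on a different node.
        partials = [e_step(shard, t) for shard in shards]
        t = m_step(reduce_counts(partials))
    return t
```

The centralized `reduce_counts` and `m_step` calls are the step that the abstract describes as impractical for very large parameter sets. In the paper's framework they are replaced by messages exchanged between nodes along a junction-tree topology.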
Keywords
MapReduce topology, large data set, data item, junction-tree topology, entire EM procedure, single node, large datasets, topic modeling, E-step computation, node interacts, related algorithm