BigARTM: Open Source Library for Regularized Multimodal Topic Modeling of Large Collections

Analysis of Images, Social Networks and Texts, AIST 2015 (2015)

Abstract
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this paper we announce the BigARTM open source project (http://bigartm.org) for regularized multimodal topic modeling of large collections. Several experiments on the Wikipedia corpus show that BigARTM runs faster and achieves better perplexity than other popular packages, such as Vowpal Wabbit and Gensim. We also demonstrate several unique BigARTM features, such as additive combination of regularizers, topic sparsing and decorrelation, and multimodal and multilanguage modeling, which are not available in other software packages for topic modeling.
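To make "additive combination of regularizers" concrete, here is a minimal sketch of a regularized M-step for the word-topic matrix, following the update rule generally described in the additive regularization (ARTM) literature: each regularizer contributes a correction term to the expected counts, the terms are summed, negative values are truncated to zero, and each topic column is renormalized. This is not BigARTM's implementation or API. The function name `regularized_m_step`, its parameters, and the uniform word weights (1/W) in the sparsing term are assumptions made for illustration.

```python
def regularized_m_step(n_wt, phi, tau_sparse, tau_decor):
    """Illustrative regularized M-step for the word-topic matrix (hypothetical helper).

    n_wt[w][t]  expected word-topic counts from the E-step
    phi[w][t]   current word-topic probabilities
    tau_sparse  sparsing coefficient (assumed uniform word weights 1/W)
    tau_decor   topic decorrelation coefficient
    """
    num_words = len(n_wt)
    num_topics = len(n_wt[0])
    new_phi = [[0.0] * num_topics for _ in range(num_words)]
    for t in range(num_topics):
        column = []
        for w in range(num_words):
            # Sparsing: subtract a constant share from every count.
            sparse_term = -tau_sparse / num_words
            # Decorrelation: penalize words that are also probable in other topics.
            other_topics = sum(phi[w]) - phi[w][t]
            decor_term = -tau_decor * phi[w][t] * other_topics
            # Additive combination, then truncation of negative values to zero.
            column.append(max(n_wt[w][t] + sparse_term + decor_term, 0.0))
        total = sum(column)
        for w in range(num_words):
            new_phi[w][t] = column[w] / total if total > 0 else 0.0
    return new_phi
```

With both coefficients set to zero the update reduces to plain count normalization, as in an unregularized PLSA M-step. Increasing `tau_sparse` pushes small counts to exact zeros, which is the sparsing effect the abstract refers to.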
Keywords
Probabilistic topic modeling, Probabilistic latent semantic analysis, Latent Dirichlet allocation, Additive regularization of topic models, Stochastic matrix factorization, EM-algorithm, BigARTM