Topic Modeling using Variational Auto-Encoders with Gumbel-Softmax and Logistic-Normal Mixture Distributions

2018 International Joint Conference on Neural Networks (IJCNN), 2018

Cited by 8 | Views 32
Abstract
Probabilistic topic models are widely applied in many NLP-related tasks due to their effective use of unlabeled data to capture variable dependencies. Analytical solutions for Bayesian inference of such models, however, are usually intractable, hindering the development of highly expressive text models. In this scenario, Variational Auto-Encoders (VAEs), in which an inference network (the encoder) is used to approximate the posterior distribution, have become a promising alternative for inferring the latent topic distributions of text documents. These models, however, also pose new challenges, such as the requirement of continuous and reparameterizable distributions, which may not fit the true latent topic distributions well. Moreover, inference networks are prone to component collapsing, which impairs the extraction of coherent topics. To overcome these problems, we propose two new text topic models, one based on the Gumbel-Softmax relaxation of the categorical distribution (GSDTM) and one based on mixtures of Logistic-Normal distributions (LMDTM). We also provide a study of the impact of different modeling choices on the generated topics, observing a trade-off between topic coherence and document reconstruction. Through experiments on two reference datasets, we show that GSDTM largely outperforms previous state-of-the-art baselines on three different evaluation metrics.
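
To make the reparameterization issue concrete, the sketch below shows the standard Gumbel-Softmax trick that GSDTM builds on: discrete topic assignments are relaxed into differentiable soft samples so gradients can flow through the encoder. This is a minimal, illustrative PyTorch snippet, not the paper's implementation; the function name, temperature, topic count, and batch size are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=0.5):
    """Draw a differentiable, approximately one-hot sample from a categorical
    distribution parameterized by `logits` (the Gumbel-Softmax relaxation)."""
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1); 1e-20 avoids log(0)
    gumbel_noise = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    # Perturb the logits and apply a temperature-controlled softmax;
    # lower temperatures yield samples closer to one-hot topic assignments.
    return F.softmax((logits + gumbel_noise) / temperature, dim=-1)

# Illustrative usage: relax a 20-topic categorical latent variable for 4 documents
logits = torch.randn(4, 20)            # unnormalized topic scores from an encoder
theta = gumbel_softmax_sample(logits)  # differentiable topic proportions, rows sum to 1
```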
Keywords
unlabeled data,text documents,continuous distributions,reparameterizable distributions,text topic models,categorical distribution Gumbel-Softmax,Logistic-Normal distributions,document reconstruction,NLP-related tasks,Bayesian inference network,variational auto-encoders,posterior distribution approximation,topic coherence,probabilistic topic models,latent topic distributions,LMDTM,GSDTM,VAEs,logistic-normal mixture distributions