## AI helps you reading Science

## AI Insight

AI extracts a summary of this paper

Weibo:

# Online Learning for Latent Dirichlet Allocation.

NIPS, pp.856-864, (2010)

EI

Keywords

Abstract

We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream....More

Code:

Data:

Introduction

- Hierarchical Bayesian modeling has become a mainstay in machine learning and applied statistics.
- Research in probabilistic topic modeling—the application the authors will focus on in this paper—revolves around fitting complex hierarchical Bayesian models to large collections of documents.
- A central research problem for topic modeling is to efficiently fit models to larger corpora [4, 5]
- To this end, the authors develop an online variational Bayes algorithm for latent Dirichlet allocation (LDA), one of the simplest topic models and one on which many others are based.
- The authors' algorithm is based on online stochastic optimization, which has been shown to produce good parameter estimates dramatically faster than batch algorithms on large datasets [6].
- Online variational Bayes is a practical new method for estimating the posterior of complex hierarchical Bayesian models

Highlights

- Hierarchical Bayesian modeling has become a mainstay in machine learning and applied statistics
- We develop an online variational Bayes algorithm for latent Dirichlet allocation (LDA), one of the simplest topic models and one on which many others are based
- We study the performance of online Latent Dirichlet Allocation (LDA) in several ways, including by fitting a topic model to 3.3M articles from Wikipedia without looking at the same article twice
- We show that online LDA finds topic models as good as or better than those found with batch variational Bayes (VB), and in a fraction of the time
- We can analyze a corpus of documents with LDA by examining the posterior distribution of the topics β, topic proportions θ, and topic assignments z conditioned on the documents
- We have developed online variational Bayes (VB) for LDA

- Table1: Best settings of κ and τ0 for various mini-batch sizes S, with resulting perplexities on Nature and Wikipedia corpora

Related work

- Comparison with other stochastic learning algorithms. In the standard stochastic gradient optimization setup, the number of parameters to be fit does not depend on the number of observations [19]. However, some learning algorithms must also fit a set of per-observation parameters (such as the per-document variational parameters γd and φd in LDA). The problem is addressed by online coordinate ascent algorithms such as those described in [20, 21, 16, 17, 10]. The goal of these algorithms is to set the global parameters so that the objective is as good as possible once the perobservation parameters are optimized. Most of these approaches assume the computability of a unique optimum for the per-observation parameters, which is not available for LDA.

Funding

- Blei is supported by ONR 175-6343, NSF CAREER 0745520, AFOSR 09NL202, the Alfred P
- Sloan foundation, and a grant from Google
- Bach is supported by ANR (MGA project)

Study subjects and analysis

documents: 352549

Although online LDA converges to a stationary point for any valid κ, τ0, and S, the quality of this stationary point and the speed of convergence may depend on how the learning parameters are set. We evaluated a range of settings of the learning parameters κ, τ0, and S on two corpora: 352,549 documents from the journal Nature 3 and 100,000 documents downloaded from the English ver-. 3For the Nature articles, we removed all words not in a pruned vocabulary of 4,253 words

documents: 256

Higher values of the learning rate κ and the downweighting parameter τ0 lead to better performance for small mini-batch sizes S, but worse performance for larger values of S. Mini-batch sizes of at least 256 documents outperform smaller mini-batch sizes. Comparing batch and online on fixed corpora

articles: 60000

To demonstrate the ability of online VB to perform in a true online setting, we wrote a Python script to continually download and analyze mini-batches of articles chosen at random from a list of approximately 3.3 million Wikipedia articles. This script can download and analyze about 60,000 articles an hour. It completed a pass through all 3.3 million articles in under three days

Wikipedia articles: 1000

We ran online LDA with κ = 0.5, τ0 = 1024, and S = 1024. Figure 1 shows the evolution of the perplexity obtained on the held-out validation set of 1,000 Wikipedia articles by the online algorithm as a function of number of articles seen. Shown for comparison is the perplexity obtained by the online algorithm (with the same parameters) fit to only 98,000 Wikipedia articles, and that obtained by the batch algorithm fit to the same 98,000 articles

Wikipedia articles: 98000

Figure 1 shows the evolution of the perplexity obtained on the held-out validation set of 1,000 Wikipedia articles by the online algorithm as a function of number of articles seen. Shown for comparison is the perplexity obtained by the online algorithm (with the same parameters) fit to only 98,000 Wikipedia articles, and that obtained by the batch algorithm fit to the same 98,000 articles. The online algorithm outperforms the batch algorithm regardless of which training dataset is used, but it does best with access to a constant stream of novel documents

Reference

- M. Braun and J. McAuliffe. Variational inference for large-scale models of discrete choice. arXiv, (0712.2526), 2008.
- D. Blei and M. Jordan. Variational methods for the Dirichlet process. In Proc. 21st Int’l Conf. on Machine Learning, 2004.
- A. Asuncion, M. Welling, P. Smyth, and Y.W. Teh. On smoothing and inference for topic models. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.
- D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Neural Information Processing Systems, 2007.
- Feng Yan, Ningyi Xu, and Yuan Qi. Parallel inference for latent Dirichlet allocation on graphics processing units. In Advances in Neural Information Processing Systems 22, pages 2134–2142, 2009.
- L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems, volume 20, pages 161–168. NIPS Foundation (http://books.nips.cc), 2008.
- D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
- Hanna Wallach, David Mimno, and Andrew McCallum. Rethinking lda: Why priors matter. In Advances in Neural Information Processing Systems 22, pages 1973–1981, 2009.
- W. Buntine. Variational extentions to EM and multinomial PCA. In European Conf. on Machine Learning, 2002.
- J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11(1):19–60, 2010.
- L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In KDD 2009: Proc. 15th ACM SIGKDD int’l Conf. on Knowledge discovery and data mining, pages 937–946, 2009.
- M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. Introduction to variational methods for graphical models. Machine Learning, 37:183–233, 1999.
- H. Attias. A variational Bayesian framework for graphical models. In Advances in Neural Information Processing Systems 12, 2000.
- A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.
- L. Bottou and N. Murata. Stochastic approximations and efficient learning. The Handbook of Brain Theory and Neural Networks, Second edition. The MIT Press, Cambridge, MA, 2002.
- M.A. Sato. Online model selection based on the variational Bayes. Neural Computation, 13(7):1649– 1681, 2001.
- P. Liang and D. Klein. Online EM for unsupervised models. In Proc. Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 611–619, 2009.
- H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.
- L. Bottou. Online learning and stochastic approximations. Cambridge University Press, Cambridge, UK, 1998.
- R.M. Neal and G.E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in graphical models, 89:355–368, 1998.
- M.A. Sato and S. Ishii. On-line EM algorithm for the normalized Gaussian network. Neural Computation, 12(2):407–432, 2000.
- T. Griffiths and M. Steyvers. Finding scientific topics. Proc. National Academy of Science, 2004.
- X. Song, C.Y. Lin, B.L. Tseng, and M.T. Sun. Modeling and predicting personal information dissemination behavior. In KDD 2005: Proc. 11th ACM SIGKDD int’l Conf. on Knowledge discovery and data mining. ACM, 2005.
- K.R. Canini, L. Shi, and T.L. Griffiths. Online inference of topics with latent Dirichlet allocation. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, 2009.
- J. Chang, J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei. Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems 21 (NIPS), 2009.

Tags

Comments

数据免责声明

页面数据均来自互联网公开来源、合作出版商和通过AI技术自动分析结果，我们不对页面数据的有效性、准确性、正确性、可靠性、完整性和及时性做出任何承诺和保证。若有疑问，可以通过电子邮件方式联系我们：report@aminer.cn