Online Learning for Latent Dirichlet Allocation

NIPS 2010, pp. 856–864

Cited by: 1524

Abstract

We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a topic model to 3.3 million articles from Wikipedia without looking at the same article twice. We show that online LDA finds topic models as good as or better than those found with batch VB, and in a fraction of the time.

Introduction
  • Hierarchical Bayesian modeling has become a mainstay in machine learning and applied statistics.
  • Research in probabilistic topic modeling—the application the authors will focus on in this paper—revolves around fitting complex hierarchical Bayesian models to large collections of documents.
  • A central research problem for topic modeling is to efficiently fit models to larger corpora [4, 5]
  • To this end, the authors develop an online variational Bayes algorithm for latent Dirichlet allocation (LDA), one of the simplest topic models and one on which many others are based.
  • The authors' algorithm is based on online stochastic optimization, which has been shown to produce good parameter estimates dramatically faster than batch algorithms on large datasets [6].
  • Online variational Bayes is a practical new method for estimating the posterior of complex hierarchical Bayesian models; a minimal sketch of the update it performs appears just after this list
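The update referenced in the last bullet can be written compactly. Below is a minimal sketch (ours, not the authors' released code) of the global natural-gradient step on the topic parameters λ. It assumes a per-document E-step has already produced a K x W matrix of expected word counts, here called sstats, for the current mini-batch; all function and variable names are our own.

    def global_step(lmbda, sstats, t, D, S, eta, tau0, kappa):
        """One natural-gradient update of the K x W topic parameters lambda (NumPy arrays).

        lmbda  : current variational Dirichlet parameters for the K topics
        sstats : expected word counts accumulated over the S documents of mini-batch t
        D      : (estimate of the) total number of documents in the corpus
        eta    : symmetric Dirichlet prior on the topics
        tau0, kappa : learning-rate parameters; the step size is rho_t = (tau0 + t) ** (-kappa)
        """
        rho_t = (tau0 + t) ** (-kappa)         # step size decays with t; kappa in (0.5, 1] gives convergence
        lambda_hat = eta + (D / S) * sstats    # the lambda the current mini-batch alone would suggest
        return (1.0 - rho_t) * lmbda + rho_t * lambda_hat

The mini-batch size S, learning rate κ, and delay τ0 in this sketch are exactly the learning parameters studied in Table 1 below.
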
Highlights
  • Hierarchical Bayesian modeling has become a mainstay in machine learning and applied statistics
  • We develop an online variational Bayes algorithm for latent Dirichlet allocation (LDA), one of the simplest topic models and one on which many others are based
  • We study the performance of online Latent Dirichlet Allocation (LDA) in several ways, including by fitting a topic model to 3.3M articles from Wikipedia without looking at the same article twice
  • We show that online LDA finds topic models as good as or better than those found with batch variational Bayes (VB), and in a fraction of the time
  • We can analyze a corpus of documents with LDA by examining the posterior distribution of the topics β, topic proportions θ, and topic assignments z conditioned on the documents; the variational family used to approximate this (intractable) posterior is sketched just after this list
  • We have developed online variational Bayes (VB) for LDA
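The posterior named above cannot be computed exactly, so variational Bayes approximates it with a fully factorized (mean-field) family. A sketch of the standard form for LDA, using the same γ, φ, λ notation that appears in the Related work section below, is

    q(z, \theta, \beta)
      \;=\; \prod_{k=1}^{K} q(\beta_k \mid \lambda_k)
            \prod_{d} q(\theta_d \mid \gamma_d)
            \prod_{d,n} q(z_{dn} \mid \phi_{dn}),

where each q(βk | λk) and q(θd | γd) is a Dirichlet and each q(zdn | φdn) is a multinomial; the free parameters (λ, γ, φ) are tuned to maximize a lower bound (the ELBO) on the log probability of the observed documents.
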
Tables
  • Table1: Best settings of κ and τ0 for various mini-batch sizes S, with resulting perplexities on Nature and Wikipedia corpora
Related work
  • Comparison with other stochastic learning algorithms. In the standard stochastic gradient optimization setup, the number of parameters to be fit does not depend on the number of observations [19]. However, some learning algorithms must also fit a set of per-observation parameters (such as the per-document variational parameters γd and φd in LDA). The problem is addressed by online coordinate ascent algorithms such as those described in [20, 21, 16, 17, 10]. The goal of these algorithms is to set the global parameters so that the objective is as good as possible once the per-observation parameters are optimized. Most of these approaches assume the computability of a unique optimum for the per-observation parameters, which is not available for LDA; a sketch of LDA's per-document coordinate-ascent step is given below.
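For concreteness, here is a minimal sketch (ours, not the authors' released implementation) of that per-document step for LDA: it alternates updates of γd and φd with the topics held fixed and returns the statistics consumed by the global step sketched earlier. Function and variable names are our own; the digamma function comes from SciPy.

    import numpy as np
    from scipy.special import psi  # digamma function

    def e_step(word_ids, word_counts, Elogbeta, alpha, max_iters=100, tol=1e-6):
        """Coordinate ascent on (gamma_d, phi_d) for one document, topics held fixed.

        word_ids    : indices of the distinct words appearing in the document
        word_counts : their counts (float array, same length as word_ids)
        Elogbeta    : K x W matrix of E_q[log beta_kw] under the current q(beta | lambda)
        alpha       : symmetric Dirichlet prior on the topic proportions
        """
        K = Elogbeta.shape[0]
        expElogbeta_d = np.exp(Elogbeta[:, word_ids])      # K x N_d slice for this document
        gamma = np.ones(K)                                 # initialize q(theta_d | gamma_d)
        expElogtheta = np.exp(psi(gamma) - psi(gamma.sum()))
        phinorm = expElogtheta @ expElogbeta_d + 1e-100    # per-word normalizers of phi
        for _ in range(max_iters):
            last_gamma = gamma
            # gamma_dk = alpha + sum_w n_dw * phi_dwk, with phi kept implicit
            gamma = alpha + expElogtheta * (expElogbeta_d @ (word_counts / phinorm))
            expElogtheta = np.exp(psi(gamma) - psi(gamma.sum()))
            phinorm = expElogtheta @ expElogbeta_d + 1e-100
            if np.mean(np.abs(gamma - last_gamma)) < tol:
                break
        # this document's contribution to the sufficient statistics: sum_w n_dw * phi_dwk
        sstats_d = np.outer(expElogtheta, word_counts / phinorm) * expElogbeta_d
        return gamma, sstats_d

A caller would accumulate sstats_d over the documents of a mini-batch (scattering the columns back to their word_ids positions in a K x W matrix) before applying the global step.
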
Funding
  • Blei is supported by ONR 175-6343, NSF CAREER 0745520, AFOSR 09NL202, the Alfred P. Sloan Foundation, and a grant from Google
  • Bach is supported by ANR (MGA project)
Study subjects and analysis
documents: 352549
Although online LDA converges to a stationary point for any valid κ, τ0, and S, the quality of this stationary point and the speed of convergence may depend on how the learning parameters are set. We evaluated a range of settings of the learning parameters κ, τ0, and S on two corpora: 352,549 documents from the journal Nature and 100,000 documents downloaded from the English version of Wikipedia. (For the Nature articles, we removed all words not in a pruned vocabulary of 4,253 words.)

documents: 256
Higher values of the learning rate κ and the downweighting parameter τ0 lead to better performance for small mini-batch sizes S, but worse performance for larger values of S. Mini-batch sizes of at least 256 documents outperform smaller mini-batch sizes.

articles: 60000
To demonstrate the ability of online VB to perform in a true online setting, we wrote a Python script to continually download and analyze mini-batches of articles chosen at random from a list of approximately 3.3 million Wikipedia articles. This script can download and analyze about 60,000 articles an hour. It completed a pass through all 3.3 million articles in under three days
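The authors' script is not reproduced here; the following is a hedged sketch of that kind of streaming loop, using Wikipedia's Special:Random redirect and a deliberately crude HTML-stripping tokenizer. The helper accumulate_sstats named in the usage comment is a placeholder for a loop over the per-document E-step sketched above.

    import re
    import urllib.request

    def fetch_random_wikipedia_words():
        """Download one random Wikipedia article and return a crude bag of lowercase words."""
        req = urllib.request.Request(
            "https://en.wikipedia.org/wiki/Special:Random",
            headers={"User-Agent": "online-lda-demo/0.1"},  # identify the client politely
        )
        with urllib.request.urlopen(req) as resp:
            html = resp.read().decode("utf-8", errors="ignore")
        text = re.sub(r"<[^>]+>", " ", html)   # strip tags; a real script would parse properly
        return re.findall(r"[a-z]+", text.lower())

    def stream_minibatches(batch_size):
        """Yield an endless stream of mini-batches of freshly downloaded articles."""
        while True:
            yield [fetch_random_wikipedia_words() for _ in range(batch_size)]

    # Usage sketch (placeholder names; settings follow the Wikipedia run described below):
    # for t, batch in enumerate(stream_minibatches(1024)):
    #     sstats = accumulate_sstats(batch, lmbda)                  # per-document E-steps
    #     lmbda = global_step(lmbda, sstats, t, D=3.3e6, S=1024,
    #                         eta=0.01, tau0=1024, kappa=0.5)       # eta here is illustrative
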

Wikipedia articles: 1000
We ran online LDA with κ = 0.5, τ0 = 1024, and S = 1024. Figure 1 shows the evolution of the perplexity obtained on the held-out validation set of 1,000 Wikipedia articles by the online algorithm as a function of number of articles seen. Shown for comparison is the perplexity obtained by the online algorithm (with the same parameters) fit to only 98,000 Wikipedia articles, and that obtained by the batch algorithm fit to the same 98,000 articles
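The perplexity reported here is, as is standard for topic models, per-word held-out perplexity; in our notation (with Nd the length of held-out document d, and the intractable log-likelihoods replaced in practice by the per-document variational lower bound):

    \mathrm{perplexity}(\mathcal{D}_{\mathrm{test}})
      = \exp\!\left\{ -\,\frac{\sum_{d \in \mathcal{D}_{\mathrm{test}}} \log p(\mathbf{w}_d \mid \alpha, \beta)}
                             {\sum_{d \in \mathcal{D}_{\mathrm{test}}} N_d} \right\}

Lower is better; this is the quantity plotted in Figure 1 as a function of the number of articles seen.
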

Wikipedia articles: 98000
For comparison, Figure 1 also shows the perplexity obtained by the online algorithm (with the same parameters) fit to only 98,000 Wikipedia articles, and that obtained by the batch algorithm fit to the same 98,000 articles. The online algorithm outperforms the batch algorithm regardless of which training dataset is used, but it does best with access to a constant stream of novel documents.

References
  • [1] M. Braun and J. McAuliffe. Variational inference for large-scale models of discrete choice. arXiv:0712.2526, 2008.
  • [2] D. Blei and M. Jordan. Variational methods for the Dirichlet process. In Proc. 21st Int'l Conf. on Machine Learning, 2004.
  • [3] A. Asuncion, M. Welling, P. Smyth, and Y.W. Teh. On smoothing and inference for topic models. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.
  • [4] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Neural Information Processing Systems, 2007.
  • [5] F. Yan, N. Xu, and Y. Qi. Parallel inference for latent Dirichlet allocation on graphics processing units. In Advances in Neural Information Processing Systems 22, pages 2134–2142, 2009.
  • [6] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems, volume 20, pages 161–168. NIPS Foundation (http://books.nips.cc), 2008.
  • [7] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
  • [8] H. Wallach, D. Mimno, and A. McCallum. Rethinking LDA: Why priors matter. In Advances in Neural Information Processing Systems 22, pages 1973–1981, 2009.
  • [9] W. Buntine. Variational extensions to EM and multinomial PCA. In European Conf. on Machine Learning, 2002.
  • [10] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11(1):19–60, 2010.
  • [11] L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In KDD 2009: Proc. 15th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, pages 937–946, 2009.
  • [12] M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. Introduction to variational methods for graphical models. Machine Learning, 37:183–233, 1999.
  • [13] H. Attias. A variational Bayesian framework for graphical models. In Advances in Neural Information Processing Systems 12, 2000.
  • [14] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.
  • [15] L. Bottou and N. Murata. Stochastic approximations and efficient learning. The Handbook of Brain Theory and Neural Networks, Second edition. The MIT Press, Cambridge, MA, 2002.
  • [16] M.A. Sato. Online model selection based on the variational Bayes. Neural Computation, 13(7):1649–1681, 2001.
  • [17] P. Liang and D. Klein. Online EM for unsupervised models. In Proc. Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 611–619, 2009.
  • [18] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.
  • [19] L. Bottou. Online learning and stochastic approximations. Cambridge University Press, Cambridge, UK, 1998.
  • [20] R.M. Neal and G.E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in Graphical Models, 89:355–368, 1998.
  • [21] M.A. Sato and S. Ishii. On-line EM algorithm for the normalized Gaussian network. Neural Computation, 12(2):407–432, 2000.
  • [22] T. Griffiths and M. Steyvers. Finding scientific topics. Proc. National Academy of Sciences, 2004.
  • [23] X. Song, C.Y. Lin, B.L. Tseng, and M.T. Sun. Modeling and predicting personal information dissemination behavior. In KDD 2005: Proc. 11th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. ACM, 2005.
  • [24] K.R. Canini, L. Shi, and T.L. Griffiths. Online inference of topics with latent Dirichlet allocation. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, 2009.
  • [25] J. Chang, J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei. Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems 21 (NIPS), 2009.