# Kernel Trick Embedded Gaussian Mixture Model

ALGORITHMIC LEARNING THEORY, PROCEEDINGS, (2003): 159-174

Abstract

In this paper, we present a kernel trick embedded Gaussian Mixture Model (GMM), called kernel GMM. The basic idea is to embed the kernel trick into the EM algorithm and derive a parameter estimation algorithm for the GMM in feature space. Kernel GMM can be viewed as a Bayesian kernel method. Compared with most classical kernel methods, the proposed...

Introduction

- The kernel trick is an efficient method for nonlinear data analysis, first used by the Support Vector Machine (SVM) [18].
- In many cases, one needs to obtain a risk minimization result and incorporate prior knowledge, which can be provided within a Bayesian probabilistic framework.
- This has led to the combination of the kernel trick with Bayesian methods, known as the Bayesian Kernel Method [16].
- Because the Bayesian Kernel Method works in a probabilistic framework, it can realize Bayesian optimal decisions and estimate confidence or reliability with probabilistic criteria such as Maximum-A-Posteriori [5].
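The essence of the kernel trick is that inner products in a (possibly infinite-dimensional) feature space can be computed by kernel evaluations in input space, without ever forming the feature map explicitly. A minimal NumPy sketch with a Gaussian (RBF) kernel; the function name and `gamma` value are illustrative, not from the paper:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2).

    By the kernel trick, K[i, j] equals the inner product
    <phi(x_i), phi(y_j)> in an infinite-dimensional feature
    space, computed without ever constructing phi."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X)   # 3 x 3 symmetric Gram matrix with unit diagonal
```

Any algorithm that touches the data only through such inner products, as the EM algorithm does here, can be "kernelized" by substituting `K` for the raw inner products.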

Highlights

- The kernel trick is an efficient method for nonlinear data analysis, first used by the Support Vector Machine (SVM) [18]
- We review background knowledge including the kernel trick, the Gaussian Mixture Model based on the EM algorithm, and the Bayesian Kernel Method
- We present a kernel Gaussian Mixture Model and derive a parameter estimation algorithm by embedding the kernel trick into the EM algorithm
- We adopt a Monte Carlo sampling technique to speed up the kernel Gaussian Mixture Model on large-scale problems, making it more practical and efficient
- Our future work will focus on incorporating prior knowledge such as invariance into the kernel Gaussian Mixture Model and enriching its applications
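In feature space a mixture component mean can only be represented implicitly, as a linear combination of mapped training points, so distances to it reduce to kernel evaluations. The sketch below illustrates that identity and a simplified E-step; the coefficient matrix `A`, the isotropic covariance `sigma2`, and the function names are assumptions for illustration, not the paper's exact algorithm (which handles full covariances in feature space):

```python
import numpy as np

def kernel_sq_dist(Kxx_diag, Kxz, A, Kzz):
    """Squared feature-space distance ||phi(x_i) - m_c||^2 to
    component means m_c = sum_j A[c, j] * phi(z_j):

        k(x_i, x_i) - 2 * sum_j A[c, j] k(x_i, z_j)
        + sum_{j, l} A[c, j] A[c, l] k(z_j, z_l)

    Only kernel evaluations appear; phi is never computed."""
    cross = Kxz @ A.T                            # shape (N, n_components)
    quad = np.einsum('cj,jl,cl->c', A, Kzz, A)   # shape (n_components,)
    return Kxx_diag[:, None] - 2.0 * cross + quad[None, :]

def e_step(sq_dist, weights, sigma2=1.0):
    """Responsibilities under an isotropic covariance sigma2 * I in
    feature space (a simplification of the full kGMM model)."""
    log_p = np.log(weights)[None, :] - sq_dist / (2.0 * sigma2)
    log_p -= log_p.max(axis=1, keepdims=True)    # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```

With a linear kernel k(x, z) = x · z the distances reduce to ordinary Euclidean ones, which gives a quick sanity check of the identity.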

Conclusion

- Computational cost and speedup techniques on large-scale problems: by employing the kernel trick, the computational cost of kernel eigen-decomposition based methods is dominated by the eigen-decomposition step.
- If the size N is not very large (e.g. N ≤ 1,000), obtaining the full eigen-decomposition is not a problem.
- Compared with most classical kernel methods, kGMM can solve problems in a probabilistic framework.
- It can tackle nonlinear problems better than the traditional GMM.
- The authors' future work will focus on incorporating prior knowledge such as invariance into kGMM and enriching its applications.
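The full eigen-decomposition of an N × N Gram matrix costs O(N³); subsampling-based approximations such as the Nyström method of Williams and Seeger, closely related to the sampling speedups cited above, reduce this to roughly O(N m²) for m landmark points. A sketch of the idea under those assumptions, not the paper's exact Monte Carlo scheme:

```python
import numpy as np

def nystrom_eig(K, m, seed=0):
    """Approximate the top eigenpairs of an N x N Gram matrix K from
    m randomly sampled landmark points (Nystrom method)."""
    rng = np.random.default_rng(seed)
    N = K.shape[0]
    idx = rng.choice(N, size=m, replace=False)
    Kmm = K[np.ix_(idx, idx)]                 # landmark-landmark block
    Knm = K[:, idx]                           # all-landmark block
    vals, vecs = np.linalg.eigh(Kmm)          # ascending eigenvalues
    keep = vals > 1e-8 * vals.max()           # drop numerically zero modes
    vals, vecs = vals[keep][::-1], vecs[:, keep][:, ::-1]
    lam = (N / m) * vals                      # rescaled eigenvalues
    U = np.sqrt(m / N) * (Knm @ vecs) / vals  # eigenvectors extended to all N points
    return lam, U
```

When K is low-rank and the landmarks span its column space, the reconstruction `(U * lam) @ U.T` recovers K exactly; in general the approximation error shrinks as m grows.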

- Table 1: Notation list
- Table 2: Parameter estimation algorithm for kGMM
- Table 3: Comparison results on the USPS data set

Reference

- Achlioptas, D., McSherry, F. and Schölkopf, B.: Sampling Techniques for Kernel Methods. In Advances in Neural Information Processing Systems (NIPS) 14, MIT Press, Cambridge MA (2002)
- Bilmes, J. A.: A Gentle Tutorial on the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report ICSI-TR-97-021, UC Berkeley (1997)
- Bishop, C. M.: Neural Networks for Pattern Recognition, Oxford University Press (1995)
- Dahmen, J., Keysers, D., Ney, H. and Güld, M. O.: Statistical Image Object Recognition using Mixture Densities. Journal of Mathematical Imaging and Vision, 14(3) (2001) 285-296
- Duda, R. O., Hart, P. E. and Stork, D. G.: Pattern Classification, 2nd Edition, John Wiley & Sons, New York (2001)
- Everitt, B. S.: An Introduction to Latent Variable Models, Chapman and Hall, London (1984)
- Bach, F. R. and Jordan, M. I.: Kernel Independent Component Analysis. Journal of Machine Learning Research, 3 (2002) 1-48
- Gestel, T. V., Suykens, J. A. K., Lanckriet, G., Lambrechts, A., De Moor, B. and Vandewalle, J.: Bayesian Framework for Least Squares Support Vector Machine Classifiers, Gaussian Processes and Kernel Fisher Discriminant Analysis. Neural Computation, 15(5) (2002) 1115-1148
- Herbrich, R., Graepel, T. and Campbell, C.: Bayes Point Machines: Estimating the Bayes Point in Kernel Space. In Proceedings of the International Joint Conference on Artificial Intelligence Workshop on Support Vector Machines (1999) 23-27
- Kwok, J. T.: The Evidence Framework Applied to Support Vector Machines. IEEE Trans. on Neural Networks, Vol. 11 (2000) 1162-1173
- Mika, S., Rätsch, G., Weston, J., Schölkopf, B. and Müller, K. R.: Fisher Discriminant Analysis with Kernels. IEEE Workshop on Neural Networks for Signal Processing IX (1999) 41-48
- Mjolsness, E. and DeCoste, D.: Machine Learning for Science: State of the Art and Future Prospects. Science, Vol. 293 (2001)
- Roberts, S. J.: Parametric and Non-Parametric Unsupervised Cluster Analysis. Pattern Recognition, Vol. 30, No. 2 (1997) 261-272
- Schölkopf, B., Smola, A. J. and Müller, K. R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5) (1998) 1299-1319
- Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K. R., Rätsch, G. and Smola, A.: Input Space vs. Feature Space in Kernel-Based Methods. IEEE Trans. on Neural Networks, Vol. 10, No. 5 (1999) 1000-1017
- Schölkopf, B. and Smola, A. J.: Learning with Kernels: Support Vector Machines, Regularization and Beyond, MIT Press, Cambridge MA (2002)
- Tipping, M. E.: Sparse Bayesian Learning and the Relevance Vector Machine. Journal of Machine Learning Research (2001)
- Vapnik, V.: The Nature of Statistical Learning Theory, 2nd Edition, Springer-Verlag, New York (1997)
- Williams, C. and Seeger, M.: Using the Nyström Method to Speed Up Kernel Machines. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems (NIPS) 13, MIT Press, Cambridge MA (2001)
- Shawe-Taylor, J., Williams, C., Cristianini, N. and Kandola, J.: On the Eigenspectrum of the Gram Matrix and Its Relationship to the Operator Eigenspectrum. In N. Cesa-Bianchi et al. (Eds.): ALT 2002, LNAI 2533, Springer-Verlag, Berlin Heidelberg (2002) 23-40
- Ng, A. Y., Jordan, M. I. and Weiss, Y.: On Spectral Clustering: Analysis and an Algorithm. In Advances in Neural Information Processing Systems (NIPS) 14, MIT Press, Cambridge MA (2002)
- Moghaddam, B. and Pentland, A.: Probabilistic Visual Learning for Object Representation. IEEE Trans. on PAMI, Vol. 19, No. 7 (1997) 696-710
