S. Bickel and T. Scheffer. Multi-View Clustering. In Proceedings of the IEEE International Conference on Data Mining (ICDM), pp. 19–26, 2004.
We consider clustering problems in which the available attributes can be split into two independent subsets, such that either subset suffices for learning. Example applications of this multi-view setting include clustering of web pages which have an intrinsic view (the pages themselves) and an extrinsic view (e.g., anchor texts of inbound hyperlinks). …
- In some interesting application domains, instances are represented by attributes that can naturally be split into two subsets, either of which suffices for learning.
- A prominent example is web pages, which can be classified based on their content as well as on the anchor texts of inbound hyperlinks; other examples include collections of research papers.
- When few labeled examples and additional unlabeled data are available, the co-training algorithm and other multi-view classification algorithms [15, 5] often improve the classification accuracy substantially.
- Multi-view algorithms train two independent hypotheses which bootstrap each other by providing labels for the unlabeled data.
- This gives rise to the question of whether the multi-view approach can also be used to improve clustering algorithms.
- Multi-view learning has so far been studied in the context of classification; this paper studies the multi-view setting for clustering.
- We presented the problem setting of clustering in a multi-view environment and described two types of algorithms that work in this setting by incorporating the conditional independence property of the views.
- In our analysis we discovered that the multi-view Expectation Maximization (EM) algorithm optimizes agreement between the views
- Even when no natural feature split is available and we randomly split the available features into two subsets, we obtain significantly better results than with the single-view variants in almost all cases.
- Because the disagreement between the views is an upper bound on the error rate of either view, this property explains the good performance of multi-view EM (see the derivation sketch after this list).
- The EM-based multi-view algorithms significantly outperform the single-view counterparts for several data sets.
- The agglomerative multi-view algorithm yields equal or worse results than the single-view version in most cases.
- The authors identified the reason for this behavior: the mixture components overlap less when the views are concatenated. In the single-view setting, the probability of cross-component merges is therefore lower, which directly improves cluster quality.
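The disagreement bound can be made precise with a short calculation. The following is a sketch under the standard assumptions of the conditional-independence argument (binary labels, hypotheses $h^{(1)}, h^{(2)}$ conditionally independent given the true label $y$); the notation $e_v$ and $D$ is ours, not the paper's:

```latex
% e_v = P(h^{(v)} \neq y): error rate of the hypothesis trained on view v.
% Assume h^{(1)} and h^{(2)} are conditionally independent given y.
% With binary labels, the two hypotheses disagree exactly when one of
% them is wrong (if both are wrong, they output the same wrong label):
\begin{align*}
D = P\!\left(h^{(1)} \neq h^{(2)}\right)
  &= e_1 (1 - e_2) + (1 - e_1)\, e_2 \\
D - e_1 &= e_2 \,(1 - 2 e_1) \;\geq\; 0
  \quad\text{whenever } e_1 \leq \tfrac{1}{2}.
\end{align*}
% Hence D >= e_1 (and symmetrically D >= e_2) as soon as the hypotheses
% are better than random guessing: the observable disagreement bounds
% the unobservable error of either view from above, so an algorithm
% that reduces disagreement drives down an upper bound on the error.
```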
- Table 1: Multi-view EM. The parameter set $\Theta^{(v)}$ consists of the concept vectors $c_j^{(v)}$, $j = 1, \ldots, k$, $v = 1, 2$, which have unit length, $\|c_j^{(v)}\| = 1$; $k$ is the desired number of clusters. All example vectors also have unit length, $\|x_i^{(v)}\| = 1$. We start with randomly initialized concept vectors $c_j^{(2)}$, $j = 1, \ldots, k$. An expectation step assigns each document to the partition $\pi_j^{(v)}$ of the concept vector $c_j^{(v)}$ it is closest to (Equation 8).
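To make the table concrete, the following is a minimal NumPy sketch of this interleaved expectation-maximization scheme. All names are ours, and the fixed iteration count and the re-seeding of empty clusters are assumptions rather than the paper's specification; the arg-max assignment plays the role of Equation 8.

```python
import numpy as np

def unit_rows(M):
    """Scale each row to unit length; documents and concepts live on the sphere."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def multi_view_em(X1, X2, k, iters=50, seed=0):
    """Interleaved multi-view EM in the spirit of Table 1.

    X1, X2: (n, d_v) term vectors for views 1 and 2 (rows are normalized).
    Returns the final partition assignment and both views' concept vectors.
    """
    rng = np.random.default_rng(seed)
    X = [unit_rows(X1), unit_rows(X2)]
    # Start with randomly initialized concept vectors c_j^(2) in view 2.
    C = [None, unit_rows(rng.normal(size=(k, X[1].shape[1])))]
    assign = None
    for t in range(iters):
        v = 1 - (t % 2)   # E-step view: view 2 first, then alternate
        w = 1 - v         # M-step re-estimates the *other* view's concepts
        # E-step (Equation 8): assign each document to the partition of its
        # closest concept vector; for unit vectors, the dot product is the
        # cosine similarity.
        assign = np.argmax(X[v] @ C[v].T, axis=1)
        # M-step: each concept vector becomes the normalized mean of the
        # documents in its partition (re-seeding empty clusters randomly
        # is our assumption, not the paper's).
        rows = [X[w][assign == j].mean(axis=0) if np.any(assign == j)
                else rng.normal(size=X[w].shape[1])
                for j in range(k)]
        C[w] = unit_rows(np.vstack(rows))
    return assign, C
```

In each round, the partition found in one view drives the model update in the other view, which is what couples the two hypotheses and lets them reduce their disagreement.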
- Table 2: Multi-view agglomerative clustering.
- Research on multi-view learning in the semi-supervised setting was introduced by two papers, by Yarowsky and by Blum and Mitchell. Yarowsky describes an algorithm for word sense disambiguation. It uses a classifier based on the local context of a word (view one) and a second classifier based on the senses of other occurrences of that word in the same document (view two), where both classifiers iteratively bootstrap each other.
Blum and Mitchell introduce the term co-training as a general term for bootstrapping procedures in which two hypotheses are trained on distinct views. They describe a co-training algorithm which, in each iteration and for each view, augments the training set of the two classifiers with the $n_p$ positive and $n_n$ negative highest-confidence examples from the unlabeled data. The two classifiers work on different views, and each new training example is based exclusively on the decision of one classifier.
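The following is a minimal sketch of such a co-training loop, with scikit-learn's MultinomialNB standing in for either view's classifier; the function name, the pool handling, and the stopping rule are our assumptions, not Blum and Mitchell's implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import MultinomialNB

def co_train(X1, X2, y, n_pos=1, n_neg=3, iters=30):
    """Co-training in the spirit of Blum and Mitchell (1998).

    X1, X2: the two views of all examples (rows aligned, nonnegative counts).
    y: labels in {0, 1}; unlabeled examples are marked -1. Assumes both
    classes appear among the initial labels.
    """
    y = y.copy()
    views = [X1, X2]
    clf = [MultinomialNB(), MultinomialNB()]
    for _ in range(iters):
        pool = np.where(y == -1)[0]
        if len(pool) < n_pos + n_neg:
            break
        for v in range(2):
            labeled = np.where(y != -1)[0]
            clf[v] = clone(clf[v]).fit(views[v][labeled], y[labeled])
            # Confidence for the positive class on the unlabeled pool;
            # each new label is decided by this view's classifier alone.
            proba = clf[v].predict_proba(views[v][pool])[:, 1]
            order = np.argsort(proba)
            y[pool[order[-n_pos:]]] = 1   # most confident positives
            y[pool[order[:n_neg]]] = 0    # most confident negatives
            pool = np.where(y == -1)[0]
            if len(pool) < n_pos + n_neg:
                break
    return clf
```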
- This work has been supported by the German Science Foundation DFG under grant SCHE540/10-1
Study subjects and analysis
This procedure generates views which are perfectly independent (peers are selected randomly). The resulting classes are based on the following five pairs of the original 20 newsgroup classes: (comp.graphics, rec.autos), (rec.motorcycles, sci.med), (sci.space, misc.forsale), (rec.sport.hockey, soc.religion.christian), (comp.sys.ibm.pc.hardware, comp.os.ms-windows.misc). We randomly select 200 examples for each of the 10 newsgroups, which results in 1000 concatenated examples uniformly distributed over the five classes
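A sketch of this pairing construction follows; the `docs` mapping from newsgroup name to preprocessed postings is a hypothetical loader, and the sampling details beyond "200 per group, random peers" are our assumptions.

```python
import numpy as np

# The five class pairs from the study: each synthetic class draws view 1
# from the first newsgroup and view 2 from a random peer of the second.
PAIRS = [
    ("comp.graphics", "rec.autos"),
    ("rec.motorcycles", "sci.med"),
    ("sci.space", "misc.forsale"),
    ("rec.sport.hockey", "soc.religion.christian"),
    ("comp.sys.ibm.pc.hardware", "comp.os.ms-windows.misc"),
]

def make_two_view_data(docs, n_per_group=200, seed=0):
    """Build two-view examples whose views are independent given the class.

    docs: dict mapping newsgroup name -> list of preprocessed postings
          (hypothetical helper; any corpus loader works).
    """
    rng = np.random.default_rng(seed)
    view1, view2, labels = [], [], []
    for label, (g1, g2) in enumerate(PAIRS):
        idx1 = rng.choice(len(docs[g1]), size=n_per_group, replace=False)
        idx2 = rng.choice(len(docs[g2]), size=n_per_group, replace=False)
        for i, j in zip(idx1, idx2):   # random peer pairing
            view1.append(docs[g1][i])
            view2.append(docs[g2][j])
            labels.append(label)
    return view1, view2, np.array(labels)
```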
In order to find out how our algorithms perform when there is no natural feature split in the data, we use document data sets, randomly split the available attributes into two subsets, and average the performance over 10 distinct attribute splits. We choose six data sets that come with the CLUTO clustering toolkit: re0 (Reuters-21578), fbis (TREC-5), la1 (Los Angeles Times), hitech (San Jose Mercury), tr11 (TREC), and wap (WebACE project). For a detailed description of the data sets, see Zhao and Karypis (2001).
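A sketch of the random-split protocol; the even 50/50 split and all names are our assumptions. Each yielded pair of views can be passed to a two-view algorithm such as the multi_view_em sketch above, and the quality scores averaged over the ten splits.

```python
import numpy as np

def random_attribute_splits(X, n_splits=10, seed=0):
    """Yield (view1, view2) pairs from random halves of the attribute set.

    Used when no natural feature split exists; clustering quality is then
    averaged over the n_splits random partitions of the attributes.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    for _ in range(n_splits):
        perm = rng.permutation(d)
        yield X[:, perm[:d // 2]], X[:, perm[d // 2:]]
```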
The total independence property of the artificial data set seems to support the success of multi-view EM. Figure 4 shows the results for the six document data sets without a natural multi-view property, where we randomly split the available attributes into two subsets. In ten of twelve cases, the multi-view algorithms significantly outperform their single-view counterparts.
- S. Abney. Bootstrapping. In Proc. of the 40th Annual Meeting of the Association for Comp. Linguistics, 2002.
- A. Banerjee, I. Dhillon, J. Ghosh, and S. Sra. A comparative study of generative models for document clustering. In Proceedings of The Ninth ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2003.
- P. Berkhin. Survey of clustering data mining techniques. Unpublished manuscript, available from accrue.com, 2002.
- A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Conference on Computational Learning Theory, pages 92–100, 1998.
- U. Brefeld and T. Scheffer. Co-EM support vector learning. In Proc. of the Int. Conf. on Machine Learning, 2004.
- M. Collins and Y. Singer. Unsupervised models for named entity classification. In EMNLP, 1999.
- S. Dasgupta, M. Littman, and D. McAllester. PAC generalization bounds for co-training. In Proceedings of Neural Information Processing Systems (NIPS), 2001.
- A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1977.
- I. S. Dhillon and D. S. Modha. Concept decompositions for large sparse text data using clustering. Machine Learning, 42(1):143–175, 2001.
- R. Ghani. Combining labeled and unlabeled data for multiclass text categorization. In Proceedings of the International Conference on Machine Learning, 2002.
- A. Griffiths, L. Robinson, and P. Willett. Hierarchical agglomerative clustering methods for automatic document classification. Journal of Documentation, 40(3):175–205, 1984.
- K. Kailing, H. Kriegel, A. Pryakhin, and M. Schubert. Clustering multi-represented objects with noise. In Proc. of the Pacific-Asia Conf. on Knowl. Disc. and Data Mining, 2004.
- G. N. Lance and W. T. Williams. A general theory of classificatory sorting strategies. I. Hierarchical systems. Computer Journal, 9:373–380, 1966.
- K. V. Mardia. Statistics of directional data. Journal of the Royal Statistical Society, Series B, 37:349–393, 1975.
- I. Muslea, C. Knoblock, and S. Minton. Active + semi-supervised learning = robust multi-view learning. In Proceedings of the International Conference on Machine Learning, pages 435–442, 2002.
- K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Proceedings of Information and Knowledge Management, 2000.
- J. Wang, H. Zeng, Z. Chen, H. Lu, L. Tao, and W. Ma. ReCoM: Reinforcement clustering of multi-type interrelated data objects. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 2003.
- D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proc. of the 33rd Annual Meeting of the Association for Comp. Linguistics, 1995.
- Y. Zhao and G. Karypis. Criterion functions for document clustering: Experiments and analysis. Technical Report TR 01-40, Department of Computer Science, University of Minnesota, Minneapolis, MN, 2001.
- S. Zhong and J. Ghosh. Generative model-based clustering of directional data. In SDM Workshop on Clustering High-Dimensional Data and Its Applications, 2003.