Multi-View Clustering

ICDM, pp. 19-26, 2004

Cited by: 647

Abstract

We consider clustering problems in which the available attributes can be split into two independent subsets, such that either subset suffices for learning. Example applications of this multi-view setting include clustering of web pages which have an intrinsic view (the pages themselves) and an extrinsic view (e.g., anchor texts of inbound hyperlinks). …

Introduction
  • In some interesting application domains, instances are represented by attributes that can naturally be split into two subsets, either of which suffices for learning.
  • A prominent example is web pages, which can be classified based on their content as well as on the anchor texts of inbound hyperlinks; other examples include collections of research papers.
  • If few labeled examples and, in addition, unlabeled data are available, the co-training algorithm [4] and other multi-view classification algorithms [15, 5] often improve the classification accuracy substantially.
  • Multi-view algorithms train two independent hypotheses which bootstrap by providing each other with labels for the unlabeled data.
  • This gives rise to the question of whether the multi-view approach can also be used to improve clustering algorithms.
  • The rest of this paper is organized as follows.
Highlights
  • In some interesting application domains, instances are represented by attributes that can naturally be split into two subsets, either of which suffices for learning
  • Example applications of this multi-view setting include clustering of web pages which have an intrinsic view and an extrinsic view; multi-view learning has so far been studied in the context of classification
  • A prominent example is web pages, which can be classified based on their content as well as on the anchor texts of inbound hyperlinks; other examples include collections of research papers
  • We presented the problem setting of clustering in a multi-view environment and described two algorithm types for this setting that incorporate the conditional independence property of the views
  • In our analysis we discovered that the multi-view Expectation Maximization (EM) algorithm optimizes agreement between the views
  • Even when no natural feature split is available, and we randomly split the available features into two subsets, we gain significantly better results than the single-view variants in almost all cases
  • Because the disagreement is an upper bound on the error rate of one view, the good performance of multi-view EM can be explained through this property
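A minimal formal rendering of this claim, stated only as a sketch: it assumes the conditions of the PAC-style co-training analysis of Dasgupta, Littman, and McAllester [7] (views conditionally independent given the class, non-degenerate hypotheses), which are not restated here.

```latex
% h^{(1)}, h^{(2)}: the assignments produced in views 1 and 2; y: the true class.
% Under the assumed conditions, and up to a small slack term \varepsilon,
\Pr\bigl[\, h^{(1)}(x) \neq y \,\bigr] \;\le\; \Pr\bigl[\, h^{(1)}(x) \neq h^{(2)}(x) \,\bigr] + \varepsilon
% Maximizing the agreement between the views therefore drives down an upper
% bound on the error of either single view.
```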
Conclusion
  • The authors presented the problem setting of clustering in a multi-view environment and described two algorithm types for this setting that incorporate the conditional independence property of the views.
  • The EM-based multi-view algorithms significantly outperform the single-view counterparts for several data sets.
  • The agglomerative multi-view algorithm yields equal or worse results than the single-view version in most cases.
  • The authors identified the reason for this behavior: the mixture components have a smaller overlap when the views are concatenated.
  • This means that in the single-view (concatenated) setting the probability of cross-component merges is lower, which directly improves cluster quality
Tables
  • Table 1: Multi-View EM. The parameter Θ^(v) consists of the concept vectors c_j^(v), j = 1, …, k, v = 1, 2, which have unit length, ‖c_j^(v)‖ = 1; k is the desired number of clusters. All example vectors also have unit length, ‖x_i^(v)‖ = 1. We start with randomly initialized concept vectors c_j^(2), j = 1, …, k. An expectation step assigns each document to the partition π_j^(v) of the concept vector c_j^(v) it is closest to (Equation 8); a code sketch of this procedure follows the table list below
  • Table 2: Multi-view agglomerative clustering
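The following is a minimal sketch of the interleaved procedure described in Table 1, assuming unit-length bag-of-words vectors and cosine similarity (spherical k-means); the function name, the fixed iteration count, and the exact interleaving of the E and M steps are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def _unit(M):
    """Normalize rows to unit Euclidean length (zero rows are left unchanged)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return M / norms

def multi_view_spherical_kmeans(X1, X2, k, iters=20, seed=0):
    """Two-view spherical k-means in the spirit of Table 1.

    X1, X2 : (n, d1) and (n, d2) matrices, one row per document, same document
             order in both views; rows are (re)normalized to unit length.
    Returns the final partition (a cluster index per document).
    """
    rng = np.random.default_rng(seed)
    X = [_unit(np.asarray(X1, dtype=float)), _unit(np.asarray(X2, dtype=float))]
    # Randomly initialized concept vectors in view 2, as in the paper.
    concepts = _unit(rng.normal(size=(k, X[1].shape[1])))
    # Initial E step in view 2: unit rows make the dot product a cosine similarity.
    partition = np.argmax(X[1] @ concepts.T, axis=1)
    for _ in range(iters):
        for v in (0, 1):
            # M step in view v, using the partition produced by the *other* view:
            # each concept vector is the normalized mean of its cluster's documents.
            concepts = _unit(np.vstack([
                X[v][partition == j].mean(axis=0) if np.any(partition == j)
                else rng.normal(size=X[v].shape[1])
                for j in range(k)
            ]))
            # E step in view v: re-assign every document to its closest concept vector.
            partition = np.argmax(X[v] @ concepts.T, axis=1)
    return partition
```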
Related work
  • Research on multi-view learning in the semi-supervised setting was introduced by two papers, Yarowsky [18] and Blum and Mitchell [4]. Yarowsky describes an algorithm for word sense disambiguation. It uses a classifier based on the local context of a word (view one) and a second classifier using the senses of other occurrences of that word in the same document (view two), where both classifiers iteratively bootstrap each other.

    Blum and Mitchell introduce the term co-training as a general term for bootstrapping procedures in which two hypotheses are trained on distinct views. They describe a co-training algorithm which, in each iteration and for each view, augments the training set of the two classifiers with the n_p positive and n_n negative highest-confidence examples from the unlabeled data. The two classifiers work on different views, and the label of a new training example is based exclusively on the decision of one classifier.
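A minimal sketch of such a co-training loop, assuming two classifiers with scikit-learn-style fit/predict_proba interfaces and binary 0/1 labels; the default values of n_p and n_n, the confidence measure, and the stopping rule are illustrative choices rather than the original specification.

```python
import numpy as np

def co_train(clf1, clf2, L1, L2, y, U1, U2, n_p=1, n_n=3, rounds=30):
    """Bootstrap two view-specific classifiers in the spirit of Blum and Mitchell [4].

    clf1, clf2 : classifiers exposing fit(X, y) and predict_proba(X)
                 (e.g. scikit-learn estimators), one per view.
    L1, L2, y  : labeled examples in view 1 / view 2 and their 0/1 labels.
    U1, U2     : the same unlabeled pool, represented in view 1 and in view 2.
    Each round, each classifier labels its n_p most confident positives and
    n_n most confident negatives from the pool; the newly labeled example is
    added to the shared training set (its label decided by one classifier only)
    and removed from the pool.
    """
    L1, L2 = np.asarray(L1, dtype=float), np.asarray(L2, dtype=float)
    y = np.asarray(y)
    U1, U2 = np.asarray(U1, dtype=float), np.asarray(U2, dtype=float)
    for _ in range(rounds):
        if len(U1) == 0:
            break
        clf1.fit(L1, y)
        clf2.fit(L2, y)
        picked = set()
        for clf, U in ((clf1, U1), (clf2, U2)):
            proba = clf.predict_proba(U)[:, 1]      # P(class = 1) per pool example
            pos = np.argsort(-proba)[:n_p]          # most confident positives
            neg = np.argsort(proba)[:n_n]           # most confident negatives
            for idx, label in [(i, 1) for i in pos] + [(i, 0) for i in neg]:
                if idx not in picked:
                    L1 = np.vstack([L1, U1[idx]])
                    L2 = np.vstack([L2, U2[idx]])
                    y = np.append(y, label)
                    picked.add(idx)
        keep = np.array([i for i in range(len(U1)) if i not in picked], dtype=int)
        U1, U2 = U1[keep], U2[keep]
    return clf1, clf2
```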
Funding
  • This work has been supported by the German Science Foundation DFG under grant SCHE540/10-1
Study subjects and analysis
pairs: 5
This procedure generates views which are perfectly independent (peers are selected randomly). The resulting classes are based on the following five pairs of the original 20 newsgroup classes: (comp.graphics, rec.autos), (rec.motorcycles, sci.med), (sci.space, misc.forsale), (rec.sport.hockey, soc.religion.christian), (comp.sys.ibm.pc.hardware, comp.os.ms-windows.misc). We randomly select 200 examples for each of the 10 newsgroups, which results in 1000 concatenated examples uniformly distributed over the five classes
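A minimal sketch of this pairing construction for one merged class (the function name and the document representation are illustrative assumptions):

```python
import numpy as np

def make_two_view_examples(docs_a, docs_b, n_per_class=200, seed=0):
    """Build one merged class of the artificial multi-view data set.

    docs_a, docs_b : lists/arrays of feature vectors (e.g. bag-of-words) for the
                     two original newsgroup classes that form one merged class.
    Each concatenated example pairs a document from class A (view 1) with an
    independently, randomly drawn peer from class B (view 2), so the two views
    are conditionally independent given the merged class label.
    """
    rng = np.random.default_rng(seed)
    idx_a = rng.choice(len(docs_a), size=n_per_class, replace=False)
    idx_b = rng.choice(len(docs_b), size=n_per_class, replace=False)  # random peers
    view1 = np.asarray([docs_a[i] for i in idx_a])
    view2 = np.asarray([docs_b[i] for i in idx_b])
    return view1, view2
```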

data sets: 6
In order to find out how our algorithms perform when there is no natural feature split in the data, we use document data sets and randomly split the available attributes into two subsets and average the performance over 10 distinct attribute splits. We choose six data sets that come with the CLUTO clustering toolkit: re0 (Reuters-21578), fbis (TREC-5), la1 (Los Angeles Times), hitech (San Jose Mercury), tr11 (TREC) and wap (WebACE project). For a detailed description of the data sets see [19]
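A minimal sketch of one such random attribute split (illustrative; averaging over 10 splits corresponds to repeating this with ten different seeds):

```python
import numpy as np

def random_feature_split(X, seed=0):
    """Split the columns (attributes) of a document-term matrix X into two
    disjoint, roughly equal-sized subsets and return the two resulting views."""
    rng = np.random.default_rng(seed)
    cols = rng.permutation(X.shape[1])
    half = X.shape[1] // 2
    return X[:, cols[:half]], X[:, cols[half:]]

# Averaging over 10 distinct splits, as in the experiments:
#   for seed in range(10):
#       V1, V2 = random_feature_split(X, seed)
#       ...cluster (V1, V2) and record the quality measure...
```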

document data sets: 6
The total independence property of the artificial data set seems to support the success of multi-view EM. Figure 4 shows the results for the six document data sets without a natural multi-view property, where we randomly split the available attribute sets into two subsets. In ten of twelve cases the multi-view algorithms significantly outperform their single-view counterparts.

References
  • [1] S. Abney. Bootstrapping. In Proc. of the 40th Annual Meeting of the Association for Comp. Linguistics, 2002.
  • [2] A. Banerjee, I. Dhillon, J. Ghosh, and S. Sra. A comparative study of generative models for document clustering. In Proceedings of the Ninth ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2003.
  • [3] P. Berkhin. Survey of clustering data mining techniques. Unpublished manuscript, available from accrue.com, 2002.
  • [4] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Conference on Computational Learning Theory, pages 92–100, 1998.
  • [5] U. Brefeld and T. Scheffer. Co-EM support vector learning. In Proc. of the Int. Conf. on Machine Learning, 2004.
  • [6] M. Collins and Y. Singer. Unsupervised models for named entity classification. In EMNLP, 1999.
  • [7] S. Dasgupta, M. Littman, and D. McAllester. PAC generalization bounds for co-training. In Proceedings of Neural Information Processing Systems (NIPS), 2001.
  • [8] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1977.
  • [9] I. S. Dhillon and D. S. Modha. Concept decompositions for large sparse text data using clustering. Machine Learning, 42(1):143–175, 2001.
  • [10] R. Ghani. Combining labeled and unlabeled data for multiclass text categorization. In Proceedings of the International Conference on Machine Learning, 2002.
  • [11] A. Griffiths, L. Robinson, and P. Willett. Hierarchical agglomerative clustering methods for automatic document classification. Journal of Documentation, 40(3):175–205, 1984.
  • [12] K. Kailing, H. Kriegel, A. Pryakhin, and M. Schubert. Clustering multi-represented objects with noise. In Proc. of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining, 2004.
  • [13] G. N. Lance and W. T. Williams. A general theory of classificatory sorting strategies. I. Hierarchical systems. Computer Journal, 9:373–380, 1966.
  • [14] K. V. Mardia. Statistics of directional data. Journal of the Royal Statistical Society, Series B, 37:349–393, 1975.
  • [15] I. Muslea, C. Knoblock, and S. Minton. Active + semi-supervised learning = robust multi-view learning. In Proceedings of the International Conference on Machine Learning, pages 435–442, 2002.
  • [16] K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Proceedings of Information and Knowledge Management, 2000.
  • [17] J. Wang, H. Zeng, Z. Chen, H. Lu, L. Tao, and W. Ma. ReCoM: Reinforcement clustering of multi-type interrelated data objects. In Proceedings of the ACM SIGIR Conference on Information Retrieval, 2003.
  • [18] D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proc. of the 33rd Annual Meeting of the Association for Comp. Linguistics, 1995.
  • [19] Y. Zhao and G. Karypis. Criterion functions for document clustering: Experiments and analysis. Technical Report TR 01-40, Department of Computer Science, University of Minnesota, Minneapolis, MN, 2001.
  • [20] S. Zhong and J. Ghosh. Generative model-based clustering of directional data. In SDM Workshop on Clustering High-Dimensional Data and Its Applications, 2003.