A generalization based hybrid algorithm for clustering semi-structured data

A generalization based hybrid algorithm for clustering semi-structured data (2004)

Abstract
Various clustering algorithms have been developed to group data into classes in diverse domains. These algorithms work effectively on structured data but perform poorly on semi-structured data, because semi-structured data are usually high-dimensional and less rigidly structured. Additionally, traditional clustering algorithms assume that there are no relationships among attributes and treat each attribute as an independent entity when calculating the similarity among objects. In this work, a generalization-based methodology that combines attribute hierarchy construction, object generalization, and data clustering is presented. The algorithm works well on semi-structured data and requires only a minimum of domain knowledge. Since the algorithm reduces the dimensionality of the semi-structured data, clustering the resulting generalized data often requires less execution time and memory. Experimental results show that the proposed methodology can significantly improve clustering quality in some cases. Moreover, when the number of data points is substantially larger than the number of attributes, the new approach is more efficient, producing results in less execution time.
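The three stages named in the abstract (attribute hierarchy, object generalization, clustering) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give its details, so everything here is an assumption made for illustration. The hierarchy `HIERARCHY` is hand-written rather than constructed from the data, objects are modeled as sets of attribute names, and the clustering step is a plain single-pass leader clustering on Jaccard similarity, swapped in as a simple stand-in. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical attribute hierarchy (child -> parent concept). The paper
# constructs its hierarchy; here it is hand-written for illustration only.
HIERARCHY: Dict[str, str] = {
    "apple": "fruit",
    "banana": "fruit",
    "carrot": "vegetable",
    "leek": "vegetable",
}


def generalize(obj: Iterable[str], hierarchy: Dict[str, str]) -> FrozenSet[str]:
    """Replace each attribute by its parent concept when one exists.

    Attributes sharing a parent collapse into one, which reduces the
    number of distinct dimensions.
    """
    return frozenset(hierarchy.get(attr, attr) for attr in obj)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity between two attribute sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(objects: List[FrozenSet[str]], threshold: float) -> List[List[int]]:
    """Single-pass leader clustering (an assumed stand-in for the
    clustering stage): each object joins the first cluster whose leader
    is at least `threshold` similar, otherwise it starts a new cluster.
    Returns clusters as lists of object indices.
    """
    leaders: List[FrozenSet[str]] = []
    clusters: List[List[int]] = []
    for idx, obj in enumerate(objects):
        for c, leader in enumerate(leaders):
            if jaccard(obj, leader) >= threshold:
                clusters[c].append(idx)
                break
        else:
            leaders.append(obj)
            clusters.append([idx])
    return clusters


def cluster_generalized(raw: List[Set[str]], hierarchy: Dict[str, str],
                        threshold: float = 0.5) -> List[List[int]]:
    """Generalize every object, then cluster the generalized objects."""
    generalized = [generalize(obj, hierarchy) for obj in raw]
    return leader_cluster(generalized, threshold)


raw_objects = [{"apple"}, {"banana"}, {"carrot"}, {"leek", "carrot"}]
print(cluster_generalized(raw_objects, HIERARCHY))
```

In this toy input, the first two objects share no raw attribute, so attribute-independent similarity keeps them apart; after generalization both become `{"fruit"}` and fall into the same cluster, while the four raw attributes collapse to two generalized ones.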
Keywords
traditional clustering algorithm, group data, hybrid algorithm, execution time, semi-structured data, data point, generalized data, structured data, data clustering, clustering algorithm