ITCH: Information-Theoretic Cluster Hierarchies

ECMLPKDD'10: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I (2010)

Abstract
Hierarchical clustering methods are widely used in scientific domains such as molecular biology, medicine, and economics. Despite the maturity of hierarchical clustering as a research field, we have identified four goals that previous methods do not yet fully satisfy: first, to guide the hierarchical clustering algorithm to identify only meaningful and valid clusters; second, to represent each cluster in the hierarchy by an intuitive description, e.g. a probability density function; third, to handle outliers consistently; and finally, to avoid difficult parameter settings. With ITCH, we propose a novel clustering method built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL), referred to as hMDL. When the hierarchical cluster structure is interpreted as a statistical model of the data set, it can be used for effective data compression by Huffman coding. Thus, the achievable compression rate induces a natural objective function for clustering, which automatically satisfies all four of the above-mentioned goals.
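The link between compression and clustering quality can be illustrated with a small sketch. It is not the hMDL objective of ITCH, only a flat, one-dimensional MDL-style cost under assumptions made here: each point is hard-assigned to the Gaussian cluster that encodes it most cheaply, a value with probability p costs the ideal code length of -log2(p) bits (which a Huffman code approximates), each cluster is charged three parameters (weight, mean, standard deviation) at 0.5 * log2(n) bits each, and values are discretized at a hypothetical `resolution`. All function and variable names are illustrative.

```python
import math


def gaussian_log2_pdf(x, mu, sigma):
    """log2 of the 1-D Gaussian density at x (log domain avoids underflow)."""
    z = (x - mu) / sigma
    ln_pdf = -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return ln_pdf / math.log(2.0)


def description_length_bits(points, clusters, resolution=0.01):
    """Total bits to encode `points` under `clusters`.

    clusters: list of (weight, mean, std) tuples.
    resolution: assumed discretization width of the data (hypothetical).
    """
    n = len(points)
    # Model cost: assumed 3 parameters per cluster, 0.5 * log2(n) bits each.
    model_bits = 3 * len(clusters) * 0.5 * math.log2(n)
    data_bits = 0.0
    for x in points:
        # Bits for the cluster ID plus bits for the value within that cluster;
        # each point goes to the cluster that encodes it most cheaply.
        data_bits += min(
            -math.log2(w) - gaussian_log2_pdf(x, mu, sigma) - math.log2(resolution)
            for (w, mu, sigma) in clusters
        )
    return model_bits + data_bits


# Two well-separated groups of five points each.
points = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]

one_cluster_bits = description_length_bits(points, [(1.0, 5.0, 5.0)])
two_clusters_bits = description_length_bits(
    points, [(0.5, 0.0, 0.15), (0.5, 10.0, 0.15)]
)

print(f"one cluster:  {one_cluster_bits:.1f} bits")
print(f"two clusters: {two_clusters_bits:.1f} bits")
```

On this toy data the two-cluster model yields the shorter description even though it pays for more parameters, which is the sense in which a compression rate can serve as an objective function without a user-supplied number of clusters.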
Keywords
hierarchical cluster structure, hierarchical clustering, hierarchical clustering algorithm, hierarchical clustering method, hierarchical variant, achievable compression rate, data set, effective data compression, natural objective function, probability density function, information-theoretic cluster hierarchy