SciCSM: novel contrast set mining over scientific datasets using bitmap indices

International Conference on Scientific and Statistical Database Management (2015)

Abstract
Contrast set mining is a broadly applicable exploratory technique that identifies interesting differences across contrast groups. Existing algorithms primarily target relational datasets with categorical attributes. There is a clear need to apply this method to scientific datasets, which feature arrays with numeric values. In this paper, we present a novel algorithm, SciCSM, for efficient contrast set mining over array-based datasets. We define how "interesting" contrast sets can be characterized for numeric and array data, handling the fact that subsets can involve value-based attributes, dimension-based attributes, or both. We use bitmap indices extensively to reduce computational complexity and enable processing of larger-scale data. We demonstrate both the high efficiency and the effectiveness of our algorithm on multiple real-life datasets.
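
As a rough illustration of why bitmap indices suit this task, the sketch below bins a numeric attribute, stores one bitmap per bin, and compares the support of a value range across two contrast groups using bitwise AND and bit counts. It is not the SciCSM algorithm itself: the function names, the toy `temperature` array, the single bin edge at 5.0, and the use of plain Python integers as bitmaps are all assumptions made for this example.

```python
from bisect import bisect_right


def build_bitmaps(values, bin_edges):
    """Return one bitmap (a Python int) per bin; bit i is set when values[i] is in that bin."""
    bitmaps = [0] * (len(bin_edges) + 1)
    for i, v in enumerate(values):
        # bisect_right maps v to the index of the bin it falls into.
        bitmaps[bisect_right(bin_edges, v)] |= 1 << i
    return bitmaps


def support(pattern_bitmap, group_bitmap):
    """Fraction of the group's elements that also match the pattern."""
    group_size = bin(group_bitmap).count("1")
    matched = bin(pattern_bitmap & group_bitmap).count("1")
    return matched / group_size if group_size else 0.0


# Hypothetical toy array of eight elements (assumption, not data from the paper).
temperature = [1.0, 7.5, 2.2, 2.1, 9.0, 6.3, 8.8, 1.5]

# Two contrast groups defined by position: elements 0-3 and elements 4-7.
group_a = 0b00001111
group_b = 0b11110000

# One bin edge at 5.0 gives two bins: values below 5.0 and values at or above 5.0.
bins = build_bitmaps(temperature, [5.0])
high = bins[1]

# Support difference of the "high temperature" pattern between the two groups.
diff = abs(support(high, group_a) - support(high, group_b))
print(diff)
```

In this sketch the groups are defined by element position, which loosely mirrors a dimension-based attribute, while the binned `temperature` pattern is value-based. Each support count needs only a bitwise AND and a bit count, so the raw array is not rescanned for every candidate.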
Keywords
Contrast Set Mining, Scientific Data Management, Bitmap Indexing