Mining Redescriptions with Siren.

TKDD(2018)

引用 5|浏览61
暂无评分
摘要
In many areas of science, scientists need to find distinct common characterizations of the same objects and, vice versa, to identify sets of objects that admit multiple shared descriptions. For example, in biology, an important task is to identify the bioclimatic constraints that allow some species to survive, that is, to describe geographical regions both in terms of the fauna that inhabits them and of their bioclimatic conditions. In data analysis, the task of automatically generating such alternative characterizations is called redescription mining. If a domain expert wants to use redescription mining in his research, merely being able to find redescriptions is not enough. He must also be able to understand the redescriptions found, adjust them to better match his domain knowledge, test alternative hypotheses with them, and guide the mining process toward results he considers interesting. To facilitate these goals, we introduce Siren, an interactive tool for mining and visualizing redescriptions. Siren allows to obtain redescriptions in an anytime fashion through efficient, distributed mining, to examine the results in various linked visualizations, to interact with the results either directly or via the visualizations, and to guide the mining algorithm toward specific redescriptions. In this article, we explain the features of Siren and why they are useful for redescription mining. We also propose two novel redescription mining algorithms that improve the generalizability of the results compared to the existing ones.
更多
查看译文
关键词
Redescription mining, interactive data mining, visual data mining
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要