Extended Min-Hash Focusing On Intersection Cardinality

INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2018, PT I (2018)

Abstract
Min-Hash is a well-established hashing technique for set similarity search. Min-Hash adopts the Jaccard similarity $|A \cap B| / |A \cup B|$ as the similarity measure between two sets $A$ and $B$. Consequently, Min-Hash is not optimal for applications that measure set similarity by the intersection cardinality $|A \cap B|$: as the gap between $|A|$ and $|B|$ grows, the Jaccard similarity decreases even when $|A \cap B|$ stays the same. This paper shows that a slight modification of Min-Hash effectively resolves this difficulty inherent to Min-Hash. The validity of the method is demonstrated both by theoretical analysis and by experiments.
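The limitation described in the abstract can be illustrated with a standard (unmodified) Min-Hash sketch. This is not the paper's modified method, which the abstract does not specify; it only shows baseline Min-Hash estimating the Jaccard similarity, and how that similarity falls as $|B|$ grows while $|A \cap B|$ stays fixed. The hash family $(a x + b) \bmod p$ (a common practical stand-in for random permutations), the helper names, the signature length of 256, and the example sets are all assumptions chosen for illustration.

```python
import random

# Assumption: a Mersenne prime larger than every element used below.
PRIME = (1 << 61) - 1


def make_hash_funcs(num_hashes, seed=0):
    """Hypothetical helper: draw random (a, b) pairs for h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(items, hash_funcs):
    """Min-Hash signature: the minimum hash value of the set under each function."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_funcs]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the minima agree; estimates |A ∩ B| / |A ∪ B|."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    funcs = make_hash_funcs(256, seed=42)
    A = set(range(100))
    sig_a = minhash_signature(A, funcs)
    for extra in (0, 100, 400):
        # |A ∩ B| stays at 50 while B gains `extra` elements that are not in A.
        B = set(range(50)) | set(range(1000, 1000 + extra))
        sig_b = minhash_signature(B, funcs)
        print(extra, len(A & B), exact_jaccard(A, B),
              round(estimate_jaccard(sig_a, sig_b), 3))
```

With $|A \cap B|$ held at 50, the exact Jaccard similarity drops from 0.5 to 0.25 to 0.1 as 0, 100, and 400 elements outside $A$ are added to $B$, and the Min-Hash estimates should track those values within sampling error. This is the behaviour the abstract identifies as a problem for applications that care about intersection cardinality.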
Keywords
Set similarity search, Min-Hash, Intersection