Streaming Algorithms for Estimating High Set Similarities in LogLog Space

IEEE Transactions on Knowledge and Data Engineering (2021)

Abstract
Estimating set similarity and detecting highly similar sets are fundamental problems in areas such as databases and machine learning. MinHash is a well-known technique for approximating the Jaccard similarity of sets and has been used successfully in many applications. Its two compressed versions, $b$…
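For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of plain MinHash, not the LogLog-space method the paper proposes. It relies on the standard fact that, for a random hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity. The function names, the choice of seeded BLAKE2b as the hash family, and the parameter `k` are assumptions made for illustration only.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Hypothetical seeded 64-bit hash: BLAKE2b over 'seed:item'."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, k=128):
    """Return k registers, each the minimum hash of the set under one seed."""
    items = list(items)
    if not items:
        raise ValueError("cannot build a MinHash signature of an empty set")
    return [min(_hash(seed, x) for x in items) for seed in range(k)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching registers."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Two highly similar sets: 400 shared elements out of 600 distinct ones.
    set_a = {str(i) for i in range(0, 500)}
    set_b = {str(i) for i in range(100, 600)}
    sig_a = minhash_signature(set_a)
    sig_b = minhash_signature(set_b)
    print("exact:   ", exact_jaccard(set_a, set_b))
    print("estimate:", estimate_jaccard(sig_a, sig_b))
```

Each register here holds a full 64-bit value. The compressed variants the abstract goes on to mention would store less per register, which this sketch does not attempt.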
Keywords
Registers, Estimation error, Time complexity, Trajectory, Databases, Machine learning