Streaming Algorithms for Estimating High Set Similarities in LogLog Space
IEEE Transactions on Knowledge and Data Engineering (2021)
Abstract
Estimating set similarity and detecting highly similar sets are fundamental problems in areas such as databases and machine learning. MinHash is a well-known technique for approximating Jaccard similarity of sets and has been successfully used for many applications. Its two compressed versions, $b$…
Keywords
Registers, Estimation error, Time complexity, Trajectory, Databases, Machine learning