Web scale photo hash clustering on a single machine

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

Abstract
This paper addresses the problem of clustering a very large stream of photos (hundreds of millions per day) into millions of clusters. The problem is increasingly important given the popularity of photo-sharing websites such as Facebook, Google, and Instagram: with so many photos available online, how to organize them efficiently remains an open problem. To address it, we propose clustering the binary hash codes of the photos into binary cluster centers. We present a fast binary k-means algorithm that works directly on similarity-preserving image hashes and clusters them into binary centers, on which hash indexes can be built to speed up computation. The proposed method can cluster millions of photos on a single machine in a few minutes. We show that this approach is usually several orders of magnitude faster than standard k-means while producing comparable clustering accuracy. In addition, we propose an online clustering method based on binary k-means that can cluster a large photo stream on a single machine, and we demonstrate applications to spam detection and trending-photo discovery.
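The central idea of clustering binary hash codes directly into binary centers can be illustrated with a Lloyd-style loop in Hamming space. What follows is a minimal sketch under stated assumptions, not the authors' implementation. Hash codes are stored as Python ints of fixed width `n_bits`. The arithmetic mean is replaced by a per-bit majority vote, which keeps centers binary and minimizes the total Hamming distance to a cluster's members. Seeding uses farthest-first traversal, which is our choice and not taken from the paper. The hash-index speedup and the online streaming variant are omitted. All function names are hypothetical.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary hash codes stored as ints."""
    return bin(a ^ b).count("1")


def majority_center(codes: List[int], n_bits: int) -> int:
    """Binary 'mean' (assumed): bit i is set iff more than half the codes set it."""
    half = len(codes) / 2
    center = 0
    for i in range(n_bits):
        ones = sum((c >> i) & 1 for c in codes)
        if ones > half:  # ties resolve to 0
            center |= 1 << i
    return center


def farthest_first_init(codes: List[int], k: int) -> List[int]:
    """Hypothetical deterministic seeding: start from codes[0], then repeatedly
    add the code whose distance to its nearest chosen center is largest."""
    centers = [codes[0]]
    while len(centers) < k:
        nxt = max(codes, key=lambda c: min(hamming(c, m) for m in centers))
        centers.append(nxt)
    return centers


def binary_kmeans(codes: List[int], k: int, n_bits: int,
                  n_iter: int = 20) -> Tuple[List[int], List[int]]:
    """Lloyd-style k-means in Hamming space whose centers stay binary codes."""
    centers = farthest_first_init(codes, k)
    labels = [-1] * len(codes)
    for _ in range(n_iter):
        # Assignment step: brute-force nearest center by Hamming distance.
        # (The paper accelerates this with hash indexes over the binary
        # centers; that indexing is not sketched here.)
        new_labels = [min(range(k), key=lambda j: hamming(c, centers[j]))
                      for c in codes]
        # Update step: a per-bit majority vote replaces the arithmetic mean.
        new_centers = []
        for j in range(k):
            members = [c for c, lab in zip(codes, new_labels) if lab == j]
            new_centers.append(
                majority_center(members, n_bits) if members else centers[j])
        if new_labels == labels and new_centers == centers:
            break  # converged: neither assignments nor centers changed
        labels, centers = new_labels, new_centers
    return centers, labels
```

For example, with 16-bit codes, `binary_kmeans([0x0000, 0x0001, 0x0002, 0x0004, 0xFFFF, 0xFFFE, 0xFFFD, 0xFFFB], k=2, n_bits=16)` separates the first four codes from the last four and returns the centers `0x0000` and `0xFFFF`. Because each center is itself a hash code, it can be stored in the same kind of index as the photo hashes. This is what makes the index-based speedup described in the abstract possible.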
Keywords
Web scale photo hash clustering,single machine,photo sharing Web site,Facebook,Google,Instagram,binary hash code,binary cluster center,binary k-means algorithm,similarity-preserving hashes,binary center,hash index,clustering accuracy,online clustering method,clustering large photo stream,spam detection,photo discovery