PFO: A Parallel Friendly High Performance System for Online Query and Update of Nearest Neighbors

arXiv (2016)

Abstract
Nearest neighbor search is a fundamental computational primitive for tackling massive datasets. Locality Sensitive Hashing (LSH) has been a popular tool for nearest neighbor search in high-dimensional spaces. However, traditional LSH systems cannot be applied in online big data systems to handle a large volume of query/update requests, because most of them optimize query efficiency under the assumption of infrequent updates and lack a parallel-friendly design. As a result, state-of-the-art LSH systems cannot adapt their responses to user behavior interactively. In this paper, we propose a new LSH system called PFO. It handles query/update requests in RAM and scales system capacity by using flash memory. To achieve high throughput on streaming data, PFO adopts a parallel-friendly indexing structure while preserving the distance between data points. Further, it accommodates inbound data in real time and dispatches update requests intelligently to eliminate cross-thread synchronization. We carried out extensive evaluations with large synthetic and standard benchmark datasets. Results demonstrate that PFO delivers shorter latency and offers scalable capacity compared with existing LSH systems. PFO achieves higher throughput than the state-of-the-art LSH indexing structure when handling online query/update requests for nearest neighbors. Meanwhile, PFO returns neighbors of much better quality, making it well suited to online big data applications such as streaming recommendation systems and interactive machine learning systems.
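The abstract does not spell out PFO's indexing structure, so the following is only a minimal sketch of the general idea it describes: an LSH index whose buckets are partitioned so that every update to a given bucket can be routed to a single owner. It assumes random-hyperplane (sign) hashing and a modulo rule for assigning buckets to shards; the class `ShardedLSH`, its parameters, and the routing rule are hypothetical illustrations, not taken from the paper.

```python
import random
from collections import defaultdict


class ShardedLSH:
    """Illustrative sketch only; not PFO's actual design."""

    def __init__(self, dim, num_bits=8, num_shards=4, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per signature bit (assumed hash family).
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.num_shards = num_shards
        # Each shard holds its own bucket table. In a threaded setting,
        # one worker could own one shard, so no bucket is shared.
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def signature(self, vec):
        """Pack the sign of each hyperplane projection into an integer."""
        bits = 0
        for plane in self.planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            bits = (bits << 1) | (1 if dot >= 0 else 0)
        return bits

    def shard_of(self, sig):
        """Hypothetical routing rule: a bucket always maps to one shard."""
        return sig % self.num_shards

    def insert(self, item_id, vec):
        sig = self.signature(vec)
        self.shards[self.shard_of(sig)][sig].append((item_id, vec))

    def query(self, vec, k=1):
        """Return up to k ids from the matching bucket, nearest first."""
        sig = self.signature(vec)
        candidates = self.shards[self.shard_of(sig)].get(sig, [])
        ranked = sorted(
            candidates,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(c[1], vec)),
        )
        return [item_id for item_id, _ in ranked[:k]]
```

Because the shard is a pure function of the bucket signature, two updates that touch the same bucket always land on the same shard. That is one plausible way to avoid locking between workers, though the paper's own dispatch strategy may differ.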