GigaHash: scalable minimal perfect hashing for billions of URLs

WWW '07: 16th International World Wide Web Conference, Banff, Alberta, Canada, May 2007

Abstract
A minimal perfect hash function maps a static set of n keys onto the range of integers {0, 1, 2, ..., n - 1}. We present a scalable, high-performance algorithm based on random graphs for constructing minimal perfect hash functions (MPHFs). For a set of n keys, our algorithm outputs a description of the MPHF h in expected time O(n). The evaluation of h(x) requires three memory accesses for any key x, and the description of h takes up 0.89n bytes (7.13n bits). This is the best (most space-efficient) known result to date. Using a simple heuristic and Huffman coding, the space requirement is further reduced to 0.79n bytes (6.86n bits). We present a high-performance architecture that is easy to parallelize and scales well to the very large data sets encountered in Internet search applications. Experimental results on a one-billion-URL dataset obtained from Live Search crawl data show that the proposed algorithm (a) finds an MPHF for one billion URLs in less than 4 minutes, and (b) requires only 6.86 bits/key for the description of h.
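To illustrate the general random-graph approach to minimal perfect hashing that the abstract refers to, the sketch below implements the classic two-hash CHM-style construction: each key is mapped to an edge of a random graph, and if the graph is acyclic, vertex labels g[] are assigned so that the sum of the two endpoint labels (mod n) yields each key's unique index. This is a minimal sketch for illustration only, not the paper's GigaHash construction (which reports three memory accesses per lookup and tighter space bounds); all helper names (`_h`, `build_mphf`, `mphf_lookup`) and the blake2b-based seeded hash are assumptions made for the example.

```python
import hashlib
import os

def _h(seed: bytes, key: str, m: int) -> int:
    """Seeded hash of `key` into 0..m-1 (blake2b chosen only for the demo)."""
    d = hashlib.blake2b(key.encode(), salt=seed).digest()
    return int.from_bytes(d[:8], "little") % m

def _assign(adj, m, n):
    """Label vertices so that (g[u] + g[v]) mod n equals the index of edge {u, v}.
    Returns None if the random graph turned out to be cyclic."""
    g = [0] * m
    visited = [False] * m
    for root in range(m):
        if visited[root] or not adj[root]:
            continue
        visited[root] = True
        stack = [(root, -1)]              # (vertex, index of edge used to reach it)
        while stack:
            u, via = stack.pop()
            for v, i in adj[u]:
                if i == via:
                    continue              # skip the tree edge we arrived on
                if visited[v]:
                    return None           # cycle: construction must retry
                visited[v] = True
                g[v] = (i - g[u]) % n
                stack.append((v, i))
    return g

def build_mphf(keys, c=2.5, max_tries=50):
    """Map each key to a random edge; retry with new seeds until the graph is acyclic."""
    n = len(keys)
    m = int(c * n) + 1                    # m > 2n keeps the graph acyclic with constant probability
    for _ in range(max_tries):
        s1, s2 = os.urandom(16), os.urandom(16)
        adj = [[] for _ in range(m)]
        ok = True
        for i, k in enumerate(keys):
            u, v = _h(s1, k, m), _h(s2, k, m)
            if u == v:                    # self-loop: treat as a cycle and retry
                ok = False
                break
            adj[u].append((v, i))
            adj[v].append((u, i))
        if not ok:
            continue
        g = _assign(adj, m, n)
        if g is not None:
            return (s1, s2, m, n, g)
    raise RuntimeError("no acyclic graph found; increase c or max_tries")

def mphf_lookup(mphf, key):
    """Evaluate h(key): two seeded hashes plus two reads of the g table."""
    s1, s2, m, n, g = mphf
    return (g[_h(s1, key, m)] + g[_h(s2, key, m)]) % n

if __name__ == "__main__":
    urls = [f"http://example.com/page/{i}" for i in range(1000)]
    f = build_mphf(urls)
    assert sorted(mphf_lookup(f, u) for u in urls) == list(range(1000))
```

In this two-hash variant the g table needs roughly 2n entries of log2(n) bits each, which is far from the ~7 bits/key the paper achieves; the space savings reported in the abstract come from using a denser random (hyper)graph together with compact encoding of the vertex labels, which the sketch above does not attempt.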