Why simple hash functions work: exploiting the entropy in a data stream

Symposium on Discrete Algorithms (2008)

Abstract
Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or O(1)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing. In practice, however, it is commonly observed that weak hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a "block source," whereby each new data item has some "entropy" given the previous ones. As long as the (Rényi) entropy per data item is sufficiently large, we note that the resulting behavior when choosing a hash function from a 2-universal family is essentially the same as for a truly random hash function. We describe results for several sample applications, including linear probing, balanced allocations, and Bloom filters.
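
To make the terms in the abstract concrete, the following is a minimal Python sketch, not taken from the paper. It assumes the textbook Carter-Wegman construction h(x) = ((a*x + b) mod P) mod m as the 2-universal family, and it uses the order-2 Rényi entropy, defined as the negative base-2 logarithm of the collision probability. The names `make_universal_hash`, `max_load`, and `renyi_entropy`, the prime `P`, and the demo parameters are all illustrative assumptions.

```python
import math
import random

# Assumption: keys are non-negative integers smaller than this Mersenne prime.
P = (1 << 61) - 1


def make_universal_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m from the Carter-Wegman family.

    For distinct keys below P, the collision probability over the random
    choice of (a, b) is at most 1/m, which is the 2-universal property.
    """
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


def max_load(keys, h, m):
    """Return the size of the fullest bucket when keys are hashed into m buckets."""
    counts = [0] * m
    for key in keys:
        counts[h(key)] += 1
    return max(counts)


def renyi_entropy(probs):
    """Order-2 Renyi entropy in bits: -log2 of the collision probability."""
    collision_probability = sum(p * p for p in probs)
    return -math.log2(collision_probability)


if __name__ == "__main__":
    rng = random.Random(2008)
    m = 1024
    # Illustrative high-entropy data: distinct keys drawn at random from 32 bits.
    keys = rng.sample(range(1 << 32), m)

    h = make_universal_hash(m, rng)
    print("max load, 2-universal hash:", max_load(keys, h, m))

    # Idealized model: every key gets an independent, uniform bucket.
    ideal = {key: rng.randrange(m) for key in keys}
    print("max load, truly random map:", max_load(keys, ideal.__getitem__, m))
```

Running the script prints one maximum load for each model on the same keys. It only illustrates the kind of comparison the abstract describes; it does not reproduce the paper's analysis, and a single run on synthetic keys is not evidence for the paper's bounds.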
Keywords
random source, balanced allocations, 2-universal hash function, hash function, universal hash function, data item, performance guarantee, data structure, hashing, linear probing, Bloom filters, pairwise independence, randomness extractors, new data item, data stream, random hash function, simple hash functions work, simple hash function, Rényi entropy, Bloom filter