Safe Haskell	None
Language	Haskell98

Data.HashTable.ST.Cuckoo

Description

A hash table using the cuckoo strategy. (See http://en.wikipedia.org/wiki/Cuckoo_hashing). Use this hash table if you...

want the fastest possible inserts, and very fast lookups.
are conscious of memory usage; this table has less space overhead than Data.HashTable.ST.Basic or Data.HashTable.ST.Linear.
don't care that a table resize might pause for a long time to rehash all of the key-value mappings.

Details:

The basic idea of cuckoo hashing, first introduced by Pagh and Rodler in 2001, is to use d hash functions instead of only one; in this implementation d=2 and the strategy we use is to split up a flat array of slots into k buckets, each cache-line-sized:

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+
|x0|x1|x2|x3|x4|x5|x6|x7|y0|y1|y2|y3|y4|y5|y6|y7|z0|z1|z2........|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+
[  ^^^  bucket 0  ^^^  ][  ^^^  bucket 1  ^^^  ]...

There are actually three parallel arrays: one unboxed array of Ints for hash codes, one boxed array for keys, and one boxed array for values. When looking up a key-value mapping, we hash the key using two hash functions and look in both buckets in the hash code array for the key. Each bucket is cache-line sized, with its keys in no particular order. Because the hash code array is unboxed, we can search it for the key using a highly-efficient branchless strategy in C code, using SSE instructions if available.

On insert, if both buckets are full, we knock out a randomly-selected entry from one of the buckets (using a random walk ensures that "key cycles" are broken with maximum probability) and try to repeat the insert procedure. This process may not succeed; if all items have not successfully found a home after some number of tries, we give up and rehash all of the elements into a larger table.

Space overhead: experimental results

The implementation of cuckoo hash given here is almost as fast for lookups as the basic open-addressing hash table using linear probing, and on average is more space-efficient: in randomized testing on my 64-bit machine (see test/compute-overhead/ComputeOverhead.hs in the source distribution), mean overhead is 0.77 machine words per key-value mapping, with a standard deviation of 0.29 words, and 1.23 words per mapping at the 95th percentile.

References:

A. Pagh and F. Rodler. Cuckoo hashing. In /Proceedings of the 9th Annual European Symposium on Algorithms/, pp. 121-133, 2001.

Synopsis

data HashTable s k v
new :: ST s (HashTable s k v)
newSized :: Int -> ST s (HashTable s k v)
delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s ()
lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v)
insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s ()
mutate :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> (Maybe v, a)) -> ST s a
mutateST :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> ST s (Maybe v, a)) -> ST s a
mapM_ :: ((k, v) -> ST s a) -> HashTable s k v -> ST s ()
foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a
lookupIndex :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s (Maybe Word)
nextByIndex :: HashTable s k v -> Word -> ST s (Maybe (Word, k, v))

Documentation

data HashTable s k v Source #

A cuckoo hash table.

Instances

HashTable HashTable Source #
Instance details Defined in Data.HashTable.ST.Cuckoo Methods new :: ST s (HashTable s k v) Source # newSized :: Int -> ST s (HashTable s k v) Source # mutate :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> (Maybe v, a)) -> ST s a Source # mutateST :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> ST s (Maybe v, a)) -> ST s a Source # insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s () Source # delete :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s () Source # lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v) Source # foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a Source # mapM_ :: ((k, v) -> ST s b) -> HashTable s k v -> ST s () Source # lookupIndex :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe Word) Source # nextByIndex :: HashTable s k v -> Word -> ST s (Maybe (Word, k, v)) Source # computeOverhead :: HashTable s k v -> ST s Double Source #
Show (HashTable s k v) Source #
Instance details Defined in Data.HashTable.ST.Cuckoo Methods showsPrec :: Int -> HashTable s k v -> ShowS # show :: HashTable s k v -> String # showList :: [HashTable s k v] -> ShowS #