hashtables-1.2.0.1: Mutable hash tables in the ST monad

Safe HaskellNone
LanguageHaskell98

Data.HashTable.ST.Cuckoo

Description

A hash table using the cuckoo strategy. (See http://en.wikipedia.org/wiki/Cuckoo_hashing). Use this hash table if you...

  • want the fastest possible inserts, and very fast lookups.
  • are conscious of memory usage; this table has less space overhead than Data.HashTable.ST.Basic or Data.HashTable.ST.Linear.
  • don't care that a table resize might pause for a long time to rehash all of the key-value mappings.

Details:

The basic idea of cuckoo hashing, first introduced by Pagh and Rodler in 2001, is to use d hash functions instead of only one; in this implementation d=2 and the strategy we use is to split up a flat array of slots into k buckets, each cache-line-sized:

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+
|x0|x1|x2|x3|x4|x5|x6|x7|y0|y1|y2|y3|y4|y5|y6|y7|z0|z1|z2........|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+
[  ^^^  bucket 0  ^^^  ][  ^^^  bucket 1  ^^^  ]...

There are actually three parallel arrays: one unboxed array of Ints for hash codes, one boxed array for keys, and one boxed array for values. When looking up a key-value mapping, we hash the key using two hash functions and look in both buckets in the hash code array for the key. Each bucket is cache-line sized, with its keys in no particular order. Because the hash code array is unboxed, we can search it for the key using a highly-efficient branchless strategy in C code, using SSE instructions if available.

On insert, if both buckets are full, we knock out a randomly-selected entry from one of the buckets (using a random walk ensures that "key cycles" are broken with maximum probability) and try to repeat the insert procedure. This process may not succeed; if all items have not successfully found a home after some number of tries, we give up and rehash all of the elements into a larger table.

Space overhead: experimental results

The implementation of cuckoo hash given here is almost as fast for lookups as the basic open-addressing hash table using linear probing, and on average is more space-efficient: in randomized testing on my 64-bit machine (see test/compute-overhead/ComputeOverhead.hs in the source distribution), mean overhead is 0.77 machine words per key-value mapping, with a standard deviation of 0.29 words, and 1.23 words per mapping at the 95th percentile.

References:

  • A. Pagh and F. Rodler. Cuckoo hashing. In /Proceedings of the 9th Annual European Symposium on Algorithms/, pp. 121-133, 2001.

Synopsis

Documentation

data HashTable s k v Source

A cuckoo hash table.

Instances

new :: ST s (HashTable s k v) Source

See the documentation for this function in Data.HashTable.Class.

newSized :: Int -> ST s (HashTable s k v) Source

See the documentation for this function in Data.HashTable.Class.

delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s () Source

See the documentation for this function in Data.HashTable.Class.

lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v) Source

See the documentation for this function in Data.HashTable.Class.

insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s () Source

See the documentation for this function in Data.HashTable.Class.

mapM_ :: ((k, v) -> ST s a) -> HashTable s k v -> ST s () Source

See the documentation for this function in Data.HashTable.Class.

foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a Source

See the documentation for this function in Data.HashTable.Class.