Copyright | Copyright © 2019 Lars Kuhtz <lakuhtz@gmail.com> |
---|---|
License | BSD3 |
Maintainer | Lars Kuhtz <lakuhtz@gmail.com> |
Stability | experimental |
Safe Haskell | None |
Language | Haskell2010 |
Haskell implementation of Cuckoo filters as described in
Cuckoo filters are a data structure for probabilistic set membership. They support insertion, deletion, and membership queries for set elements.
Membership queries may return false positive results. But queries don't return false negative results.
Unlike Bloom filters, Cuckoo filters maintain an upper bound on the false positive rate that is independent of the load of the filter. However, insertion of new elements in the filter can fail. For typical configurations this probability is very small for load factors smaller than 90 percent.
Synopsis
- newtype Salt = Salt Int
- class CuckooFilterHash a where
- cuckooHash :: Salt -> a -> Word64
- cuckooFingerprint :: Salt -> a -> Word64
- sip :: Storable a => Int -> a -> Word64
- sip_bytes :: ByteArrayAccess a => Int -> a -> Word64
- fnv1a :: Storable a => Int -> a -> Word64
- fnv1a_bytes :: ByteArrayAccess a => Int -> a -> Word64
- data CuckooFilter s (b :: Nat) (f :: Nat) (a :: Type)
- type CuckooFilterIO b f a = CuckooFilter RealWorld b f a
- newCuckooFilter :: forall m b f a. KnownNat b => KnownNat f => PrimMonad m => Salt -> Natural -> m (CuckooFilter (PrimState m) b f a)
- insert :: forall b f a m. KnownNat f => KnownNat b => PrimMonad m => CuckooFilterHash a => CuckooFilter (PrimState m) b f a -> a -> m Bool
- member :: CuckooFilterHash a => PrimMonad m => KnownNat f => KnownNat b => CuckooFilter (PrimState m) b f a -> a -> m Bool
- delete :: CuckooFilterHash a => PrimMonad m => KnownNat f => KnownNat b => CuckooFilter (PrimState m) b f a -> a -> m Bool
- sizeInAllocatedBytes :: forall b f a s. KnownNat f => KnownNat b => CuckooFilter s b f a -> Int
- capacityInItems :: forall b f a s. KnownNat b => CuckooFilter s b f a -> Int
- itemCount :: forall b f a m. PrimMonad m => KnownNat b => KnownNat f => CuckooFilter (PrimState m) b f a -> m Int
- loadFactor :: forall b f a m. PrimMonad m => KnownNat b => KnownNat f => CuckooFilter (PrimState m) b f a -> m Double
- showFilter :: forall b f a. KnownNat f => KnownNat b => CuckooFilter RealWorld b f a -> IO [[String]]
- itemHashes :: forall b f a s. KnownNat f => CuckooFilterHash a => CuckooFilter s b f a -> a -> (Int, Int, Word64)
Hash Functions
Salt for hash computations.
class CuckooFilterHash a where Source #
Choosing good hash functions is imperative for a good performance of a cuckoo filter. The hash functions must be
- independent and
- provide good uniformity on the lower bits of the output.
The default implementations use sip hash for cuckooHash
and fnv1a
(64
bit) for cuckooFingerprint
.
Nothing
cuckooHash :: Salt -> a -> Word64 Source #
This function must provide good entropy on the lower \(2^b - 1\) bits of the result, where \(b\) is the number of buckets.
cuckooFingerprint :: Salt -> a -> Word64 Source #
This function must provide good entropy on the lower bits of the size of a fingerprint.
cuckooHash :: Storable a => Salt -> a -> Word64 Source #
This function must provide good entropy on the lower \(2^b - 1\) bits of the result, where \(b\) is the number of buckets.
cuckooFingerprint :: Storable a => Salt -> a -> Word64 Source #
This function must provide good entropy on the lower bits of the size of a fingerprint.
Hash functions
Computes a Sip hash for a value that has an Storable
instance.
The first argument is a salt value that is used to derive the key for the hash computation.
:: ByteArrayAccess a | |
=> Int | Salt |
-> a | Value that is hashes |
-> Word64 |
Computes a Sip hash for a value that is an instance of
ByteArrayAccess
.
The first argument is a salt value that is used to derive the key for the hash computation.
Computes a 64 bit Fnv1a hash for a value that has an Storable
instance.
The first argument is use as a salt.
:: ByteArrayAccess a | |
=> Int | Salt |
-> a | Value that is hashes |
-> Word64 |
Computes a 64 bit Fnv1a hash for a value that is an instance of
ByteArrayAccess
.
The first argument is use as a salt.
Cuckoo Filter
data CuckooFilter s (b :: Nat) (f :: Nat) (a :: Type) Source #
Cuckoo Filter with
- State token
s :: Type
, - bucket size
b :: Nat
, - fingerprint size
f :: Nat
, and - content type
a :: Type
.
The following constraints apply
- \(0 < f \leq 32\)
- \(0 < b\)
The implementation is not thread safe. For concurrent use the filter must be wrapped in a read-write lock.
type CuckooFilterIO b f a = CuckooFilter RealWorld b f a Source #
Cuckoo filter that can be used in the IO
monad.
:: KnownNat b | |
=> KnownNat f | |
=> PrimMonad m | |
=> Salt | Salt for the hash functions |
-> Natural | Size (must be positive) |
-> m (CuckooFilter (PrimState m) b f a) |
Create a new Cuckoo filter that has at least the given capacity.
The type parameters are
- bucket size
b :: Nat
, - fingerprint size
f :: Nat
, - content type
a :: Type
, and - Monad
m :: Type -> Type
,
Enabling the TypeApplications
language extension provides a convenient way
for passing the type parameters to the function.
The following constraints apply:
- \(0 < f \leq 32\),
- \(0 < b\).
The false positive rate depends mostly on the value of f
. It is bounded
from above by \(\frac{2b}{2^f}\). In most cases 4
is a good choice for b
.
Actual performance depends on the choice of good hash functions that provide high uniformity on the lower bits.
The actual capacity may be much larger than what is requested, because the actual bucket count is a power of two.
Cuckoo Filter Operations
insert :: forall b f a m. KnownNat f => KnownNat b => PrimMonad m => CuckooFilterHash a => CuckooFilter (PrimState m) b f a -> a -> m Bool Source #
Insert an item into the filter and return whether the operation was successful. If insertion fails, the filter is unchanged.
This function is not thread safe. No concurrent writes or reads should occur while this function is executed. If this is needed a lock must be used.
This function is not exception safe. The filter must not be used any more after an asynchronous exception has been throw during the computation of this function. If this function is used in the presence of asynchronous exceptions it should be apprioriately masked.
member :: CuckooFilterHash a => PrimMonad m => KnownNat f => KnownNat b => CuckooFilter (PrimState m) b f a -> a -> m Bool Source #
Test whether an item is in the set that is represented by the Cuckoo filter.
A negative result means that the item is definitively not in the set. A
positive result means that the item is most likely in the set. The rate of
false positives is bounded from above by \(\frac{2b}{2^f}\) where b
is the number
of items per bucket and f
is the size of a fingerprint in bits.
delete :: CuckooFilterHash a => PrimMonad m => KnownNat f => KnownNat b => CuckooFilter (PrimState m) b f a -> a -> m Bool Source #
Delete an items from the filter.
IMPORTANT An item must only be deleted if it was successfully added to the filter before (and hasn't been deleted since then).
Deleting an item that isn't in the filter will result in the filter returning false negative results.
This function is not thread safe. No concurrent writes must occur while this function is executed. If this is needed a lock must be used. Concurrent reads are fine.
Utils
sizeInAllocatedBytes :: forall b f a s. KnownNat f => KnownNat b => CuckooFilter s b f a -> Int Source #
The total number of bytes allocated for storing items in the filter.
capacityInItems :: forall b f a s. KnownNat b => CuckooFilter s b f a -> Int Source #
Total number of items that the filter can hold. In practice a load factor of ~95% of this number can be reached.
itemCount :: forall b f a m. PrimMonad m => KnownNat b => KnownNat f => CuckooFilter (PrimState m) b f a -> m Int Source #
Number of items currently stored in the filter.
Note that computing this number is expensive \(\mathcal{O}(n)\).
loadFactor :: forall b f a m. PrimMonad m => KnownNat b => KnownNat f => CuckooFilter (PrimState m) b f a -> m Double Source #
The current load factor of the filter in percent.
loadFactor f = 100 * itemCount f / capacityInItems
Note that computing this number is expensive \(\mathcal{O}(n)\).
Debugging Utils
showFilter :: forall b f a. KnownNat f => KnownNat b => CuckooFilter RealWorld b f a -> IO [[String]] Source #
Show the contents of the filter as a list of buckets with values show in hex. Used for debugging purposes.
itemHashes :: forall b f a s. KnownNat f => CuckooFilterHash a => CuckooFilter s b f a -> a -> (Int, Int, Word64) Source #
Returns the different hashes that are associated with an item in the filter. Used for debugging purposes.