hyperloglogplus-0.1.0.0: Approximate cardinality estimation using constant space

Safe HaskellNone
LanguageHaskell2010

Data.HyperLogLogPlus

Synopsis

Documentation

data HyperLogLogPlus p k

HyperLogLog++ cardinality estimation paired with MinHash for intersection estimation

  • p - precision of HLL structure
  • k - precision of MinHash structure (max size)

Create new counter:

>>> :set -XDataKinds
>>> :load Data.HyperLogLogPlus
>>> type HLL = HyperLogLogPlus 12 8192
>>> mempty :: HLL
HyperLogLogPlus [ p = 12 k = 8192 ] [ minSet size = 0 ]

HyperLogLogPlus and MinHash precisions are specified in a type. HLL precision p should be between 4 and 18, starting from 10 for good accuracy.

MinHash precision k ideally should be greater or equal 8192 for decent intersection estimation.

Estimating number of unique items:

>>> size (foldr insert mempty [1 .. 75000] :: HLL)
75090

Combine multiple counters:

>>> size $ (foldr insert mempty [1 .. 5000] ::  HLL) <> (foldr insert mempty [3000 .. 10000] :: HLL)
10044

Compute estimated set intersection:

>>> intersection $ [(foldr insert mempty [1 .. 15000] ::  HLL), (foldr insert mempty [12000 .. 20000] :: HLL)]
3100

Instances

size :: forall p k. (KnownNat p, KnownNat k) => HyperLogLogPlus p k -> Word64

Compute estimated size of HyperLogLogPlus. If number of inserted values is smaller than MinHash precision this will return exact value

insert :: forall p k a. (KnownNat p, KnownNat k, Hashable64 a) => a -> HyperLogLogPlus p k -> HyperLogLogPlus p k

Insert hashable value

insertHash :: forall p k. (KnownNat p, KnownNat k) => Hash64 -> HyperLogLogPlus p k -> HyperLogLogPlus p k

Insert already hashed value

intersection :: forall p k. (KnownNat p, KnownNat k) => [HyperLogLogPlus p k] -> Word64

Returns an estimate of the size of the intersection of the given HyperLogLogPlus objects

cast :: forall p1 k1 p2 k2. (KnownNat p1, KnownNat k1, KnownNat p2, KnownNat k2, 4 <= p2, p2 <= 18) => HyperLogLogPlus p1 k1 -> Maybe (HyperLogLogPlus p2 k2)

Cast HyperLogLogPlus to new precision levels

  1. New HLL precision should less or equal to old one
  2. New MinHash precision has to be less or equal to old one, or it can be larger, but only if number of inserted hashes in old structure is smaller than old precision (size limit)