cuckoo-filter: Pure and impure Cuckoo Filter

[ data, library, mit, program ] [ Propose Tags ]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

  • No Candidates
Versions [RSS] 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.2.0.1, 0.2.0.2
Change log ChangeLog.md
Dependencies aeson, base (>=4.7 && <5), cereal, containers, criterion, cuckoo-filter, hashable, random [details]
License MIT
Copyright 2018 Chris Coffey
Author Chris Coffey
Maintainer chris@foldl.io
Category Data
Home page https://github.com/ChrisCoffey/cuckoo-filter#readme
Bug tracker https://github.com/ChrisCoffey/cuckoo-filter/issues
Source repo head: git clone https://github.com/ChrisCoffey/cuckoo-filter
Uploaded by ChrisCoffey at 2018-10-28T02:20:22Z
Distributions
Executables benchmarks
Downloads 2817 total (23 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2018-10-28 [all 1 reports]

Readme for cuckoo-filter-0.1.0.1

[back to package description]

cuckoo-filter

HackageBuild Status

Cuckoo filters are a probabilistic data structure used to answer questions like "Have I already seen this user" or "Is this word in the English language?". They're probabilistic because each membership operation has a false positive probability. It guarnatees that there will never be a false negative, but may have a low chance of false positives.

Bloom filters are the cannonical probabilistic filter structure, and cuckoo filters are a simlar but different tool. As a bloom filter's load factor increases, the chance of false positive trends towards 100%, but the inserts will never fail. On the other hand, a Cuckoo filter retains a relatively stable false positive probability under load, but as load approahes 95% inserts will begin to fail. In either case you probably want to resize your filter...

This implementation has the following properties:

  • Buckets of 4 elements
  • 8 bit fingerprints
  • Cycle termination during item kicking occurs after (0.1 * size) buckets have been checked.
  • Size may be any non-zero natural number (not limited to powers of 2)

For more details about how Cuckoo filters work, I recommend you read Fan et. al.'s 2016 paper https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf.

Usage

Cuckoo filters support three operations: insert, member, and delete. See the haddocks for details.

Performance

As you'll find in the criterion results, the pure version of the filter can handle ~1.6 million insertions/s. From memory profiles, the vast majority of the memory is taken up by the underlying implementation of Filter, so this is an obvious area for improvement.

The current implementation avoids pre-allocating memory for the filter, so the heap usage will incrase linearly with insert calls. This obviously helps keep heap usage low for sparse filters, but also means inserts are slower than they would be in a mutable implementation.

TODO

  • Benchmark against a Bloom filter implementation
  • Introduce a mutable version of Filter and a typeclass for the storage interactions