Safe Haskell | None |
---|---|
Language | Haskell2010 |
This is a rolling hash used to calculate the hashes of fixed-length sequences of characters within a given ByteString. BuzzHash is nice in that it makes this calculation very efficient.
- hashes :: WindowSize -> ByteString -> Vector Word64
- split :: WindowSize -> BlockShift -> ByteString -> (ByteString, ByteString)
- slowSplit :: WindowSize -> BlockShift -> ByteString -> (ByteString, ByteString)
- data Hash = Hash {
- windowLength :: !WindowSize
- currentVal :: !Word64
- init :: ByteString -> Hash
- roll :: WindowSize -> Word64 -> Word8 -> Word8 -> Word64
- h :: Word8 -> Word64
Documentation
:: WindowSize | How many bytes to put into each hash |
-> ByteString | The |
-> Vector Word64 |
Determine the hash of every len
sequence of bytes in the given
ByteString
. Because this uses BuzzHash, a rolling hash, this runs in O(n)
time dependent on the length of bs
, and independent of len
.
This will generate (length bs - len + 1)
64-bit hashes.
This function really doesn't serve any purpose anymore, and is just kept
around as a sanity check for the much faster split
function.
:: WindowSize | How many bytes to use when generating the pattern hash |
-> BlockShift | log2 of the desired block size |
-> ByteString | The ByteString to split |
-> (ByteString, ByteString) |
Split up a ByteString into a complete block and a remainder. If no break point is found in the given string, the first element of the resulting pair will be empty.
:: WindowSize | How many bytes to use for pattern search |
-> BlockShift | log2 of the desired block size |
-> ByteString | The ByteString to split |
-> (ByteString, ByteString) |
A representation of a hash that allows rolling hashes to be easily calculated.
Hash | |
|
init :: ByteString -> Hash Source
Create a Hash
instance using an entire ByteString
. This doesn't have
any sort of length argument to do partial ByteString
s because ByteString
supports efficient slices on its own.
roll :: WindowSize -> Word64 -> Word8 -> Word8 -> Word64 Source
Roll the Hash
to the next byte over in the ByteString
being hashed.
This doesn't do any sort of checking to ensure that old
and new
are
actually correct, so this is probably easy to mess up when using it manually.
The expected usage is that one would initialize a hash using init
on the
beginning of some ByteString
, and then to calculate the hash of each
sequence of bytes one would manually track the first and last byte of each
window on the ByteString
. hashes
does this for you...