Safe Haskell | None |
---|---|
Language | Haskell2010 |
Vector of (small) words which adapt their representation to make them more compact when the elements are small.
This is data structure engineered to store large amount of small vectors of small elements compactly on memory.
For example the list [1..14] :: [Int]
consumes 576 bytes (72 words) on
a 64 bit machine, while the corresponding WordVec
takes only
16 bytes (2 words), and the one corresponding to [101..115]
still only
24 bytes (3 words).
Unboxed arrays or unboxed vectors are better, as they only have a constant overhead, but those constants are big: 13 words (104 bytes on 64 bit) for unboxed arrays, and 6 words (48 bytes) for unboxed vectors. And you still have to select the number of bits per element in advance.
Some operations may be a bit slower, but hopefully the cache-friendlyness
will somewhat balance that (a simple microbenchmark with Map
-s
indexed by [Int]
vs. WordVec
showed a 2x improvement in speed and
20x improvement in memory usage). In any case the primary goal
here is optimized memory usage.
TODO: ability to add user-defined (fixed-length) header, it can be useful for some applications
Synopsis
- newtype WordVec = WordVec Blob
- data Shape = Shape {}
- vecShape :: WordVec -> Shape
- vecShape' :: WordVec -> (Bool, Shape)
- vecIsSmall :: WordVec -> Bool
- vecLen :: WordVec -> Int
- vecBits :: WordVec -> Int
- showWordVec :: WordVec -> String
- showsPrecWordVec :: Int -> WordVec -> ShowS
- empty :: WordVec
- null :: WordVec -> Bool
- unsafeIndex :: Int -> WordVec -> Word
- safeIndex :: Int -> WordVec -> Maybe Word
- head :: WordVec -> Word
- toList :: WordVec -> [Word]
- toList_naive :: WordVec -> [Word]
- fromList :: [Word] -> WordVec
- fromList' :: Shape -> [Word] -> WordVec
- naiveMap :: (Word -> Word) -> WordVec -> WordVec
- boundedMap :: Word -> (Word -> Word) -> WordVec -> WordVec
- concat :: WordVec -> WordVec -> WordVec
- naiveZipWith :: (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec
- boundedZipWith :: Word -> (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec
- bitsNeededFor :: Word -> Int
The dynamic Word vector type
Dynamic word vectors are internally Blob
-s, which the first few bits
encoding their shape, and after that their content.
- small vectors has 2 bits of "resolution" and 5 bits of length
- big vectors has 4 bits of "resolution" and 27 bits of length
Resolution encodes the number of bits per element. The latter is always a multiple of 4 (that is: 4 bits per element, or 8, or 12, etc. up to 64 bits per element).
We use the very first bit to decide which of these two encoding we use. (if we would make a sum type instead, it would take 2 extra words...)
The "shape" of a dynamic word vector
vecIsSmall :: WordVec -> Bool Source #
Instances
showWordVec :: WordVec -> String Source #
Empty vectors
Indexing
Conversion to/from lists
toList_naive :: WordVec -> [Word] Source #
Another implementation of toList
, for testing purposes only
Some more operations
boundedMap :: Word -> (Word -> Word) -> WordVec -> WordVec Source #
If you have a (nearly sharp) upper bound to the result of your of function on your vector, mapping can be more efficient
boundedZipWith :: Word -> (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec Source #
If you have a (nearly sharp) upper bound to the result of your of function on your vector, zipping can be more efficient
Misc helpers
bitsNeededFor :: Word -> Int Source #