Safe Haskell	None
Language	Haskell2010

Data.Vector.Compact.WordVec

Contents

The dynamic Word vector type
Show instance
Empty vector, singleton
Conversion to/from lists
Indexing
Head, tail, etc
Specialized operations
Generic operations
Number of bits needed

Description

Vector of (small) words which adapt their representation to make them more compact when the elements are small.

This is data structure engineered to store large amount of small vectors of small elements compactly on memory.

For example the list [1..14] :: [Int] consumes 560 bytes (14x5=70 words) on a 64 bit machine, while the corresponding WordVec takes only 16 bytes (2 words), and the one corresponding to [101..115] still only 24 bytes (3 words).

Unboxed arrays or unboxed vectors are better, as they only have a constant overhead, but those constants are big: 13 words (104 bytes on 64 bit) for unboxed arrays, and 6 words (48 bytes) for unboxed vectors. And you still have to select the number of bits per element in advance.

Some operations may be a bit slower, but hopefully the cache-friendlyness will somewhat balance that (a simple microbenchmark with Map-s indexed by [Int] vs. WordVec showed a 2x improvement in speed and 20x improvement in memory usage). In any case the primary goal here is optimized memory usage.

This module should be imported qualified (to avoid name clashes with Prelude).

TODO: ability to add user-defined (fixed-length) header, it can be potentially useful for some applications

Synopsis

newtype WordVec = WordVec Blob
data Shape = Shape {
- shapeLen :: !Int
- shapeBits :: !Int
}
vecShape :: WordVec -> Shape
vecShape' :: WordVec -> (Bool, Shape)
vecLen :: WordVec -> Int
vecBits :: WordVec -> Int
vecIsSmall :: WordVec -> Bool
showWordVec :: WordVec -> String
showsPrecWordVec :: Int -> WordVec -> ShowS
null :: WordVec -> Bool
empty :: WordVec
singleton :: Word -> WordVec
isSingleton :: WordVec -> Maybe Word
fromList :: [Word] -> WordVec
fromListN :: Int -> Word -> [Word] -> WordVec
fromList' :: Shape -> [Word] -> WordVec
toList :: WordVec -> [Word]
toRevList :: WordVec -> [Word]
unsafeIndex :: Int -> WordVec -> Word
safeIndex :: Int -> WordVec -> Maybe Word
head :: WordVec -> Word
tail :: WordVec -> WordVec
cons :: Word -> WordVec -> WordVec
uncons :: WordVec -> Maybe (Word, WordVec)
last :: WordVec -> Word
snoc :: WordVec -> Word -> WordVec
concat :: WordVec -> WordVec -> WordVec
sum :: WordVec -> Word
maximum :: WordVec -> Word
eqStrict :: WordVec -> WordVec -> Bool
eqExtZero :: WordVec -> WordVec -> Bool
cmpStrict :: WordVec -> WordVec -> Ordering
cmpExtZero :: WordVec -> WordVec -> Ordering
lessOrEqual :: WordVec -> WordVec -> Bool
partialSumsLessOrEqual :: WordVec -> WordVec -> Bool
add :: WordVec -> WordVec -> WordVec
subtract :: WordVec -> WordVec -> Maybe WordVec
scale :: Word -> WordVec -> WordVec
partialSums :: WordVec -> WordVec
fold :: (a -> Word -> a) -> a -> WordVec -> a
naiveMap :: (Word -> Word) -> WordVec -> WordVec
boundedMap :: Word -> (Word -> Word) -> WordVec -> WordVec
naiveZipWith :: (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec
boundedZipWith :: Word -> (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec
listZipWith :: (Word -> Word -> a) -> WordVec -> WordVec -> [a]
bitsNeededFor :: Word -> Int
bitsNeededFor' :: Word -> Int
roundBits :: Int -> Int

The dynamic Word vector type

newtype WordVec Source #

Dynamic word vectors are internally Blob-s, which the first few bits encoding their shape, and after that their content.

small vectors has 2 bits of "resolution" and 5 bits of length
big vectors has 4 bits of "resolution" and 27 bits of length

Resolution encodes the number of bits per element. The latter is always a multiple of 4 (that is: 4 bits per element, or 8, or 12, etc. up to 64 bits per element).

We use the very first bit to decide which of these two encoding we use. (if we would make a sum type instead, it would take 2 extra words...)

About the instances:

the Eq instance is strict: x == y iff toList x == toList y. For an equality which disregards trailing zeros, see eqExtZero
the Ord instance first compares the length, then if the lengths are equal, compares the content lexicographically. For a comparison which disregards the length, and lexicographically compares the sequences extended with zeros, see cmpExtZero

Constructors

WordVec Blob

Instances

Instances details

Eq WordVec Source #	The Eq instance is strict: `x == y` iff `toList x == toList y`. For an equality which disregards trailing zeros, see `eqExtZero`.
Instance details Defined in Data.Vector.Compact.WordVec Methods (==) :: WordVec -> WordVec -> Bool # (/=) :: WordVec -> WordVec -> Bool #
Ord WordVec Source #	The Ord instance first compares the length, then if the lengths are equal, compares the content lexicographically. For a different ordering, see `cmpExtZero`.
Instance details Defined in Data.Vector.Compact.WordVec Methods compare :: WordVec -> WordVec -> Ordering # (<) :: WordVec -> WordVec -> Bool # (<=) :: WordVec -> WordVec -> Bool # (>) :: WordVec -> WordVec -> Bool # (>=) :: WordVec -> WordVec -> Bool # max :: WordVec -> WordVec -> WordVec # min :: WordVec -> WordVec -> WordVec #
Show WordVec Source #
Instance details Defined in Data.Vector.Compact.WordVec Methods showsPrec :: Int -> WordVec -> ShowS # show :: WordVec -> String # showList :: [WordVec] -> ShowS #

data Shape Source #

The "shape" of a dynamic word vector

Constructors

Shape
Fields shapeLen :: !Int length of the vector shapeBits :: !Int bits per element (quantized to multiples of 4)

Instances

Instances details

Eq Shape Source #
Instance details Defined in Data.Vector.Compact.WordVec Methods (==) :: Shape -> Shape -> Bool # (/=) :: Shape -> Shape -> Bool #
Show Shape Source #
Instance details Defined in Data.Vector.Compact.WordVec Methods showsPrec :: Int -> Shape -> ShowS # show :: Shape -> String # showList :: [Shape] -> ShowS #

vecShape :: WordVec -> Shape Source #

vecShape' :: WordVec -> (Bool, Shape) Source #

vecShape' vec == (vecIsSmall vec , vecShape vec)

vecLen :: WordVec -> Int Source #

The length of the vector

vecBits :: WordVec -> Int Source #

The number of bits per element used to encode the vector

vecIsSmall :: WordVec -> Bool Source #

True if the internal representation is the "small" one

Show instance

showWordVec :: WordVec -> String Source #

showsPrecWordVec :: Int -> WordVec -> ShowS Source #

Empty vector, singleton

null :: WordVec -> Bool Source #

empty :: WordVec Source #

singleton :: Word -> WordVec Source #

isSingleton :: WordVec -> Maybe Word Source #

Conversion to/from lists

fromList :: [Word] -> WordVec Source #

fromListN Source #

Arguments

:: Int	length
-> Word	maximum (or just an upper bound)
-> [Word]	elements
-> WordVec

This is faster than fromList

fromList' :: Shape -> [Word] -> WordVec Source #

If you know the shape in advance, it's faster to use this function

toList :: WordVec -> [Word] Source #

toRevList :: WordVec -> [Word] Source #

toRevList vec == reverse (toList vec), but should be faster (?)

Indexing

unsafeIndex :: Int -> WordVec -> Word Source #

No boundary check is done. Indexing starts from 0.

safeIndex :: Int -> WordVec -> Maybe Word Source #

Head, tail, etc

head :: WordVec -> Word Source #

Note: For the empty vector, head returns 0

tail :: WordVec -> WordVec Source #

Note: For the empty vector, tail returns (another) empty vector

cons :: Word -> WordVec -> WordVec Source #

Prepends an element

uncons :: WordVec -> Maybe (Word, WordVec) Source #

last :: WordVec -> Word Source #

Note: For the empty vector, last returns 0

snoc :: WordVec -> Word -> WordVec Source #

Appends an element

concat :: WordVec -> WordVec -> WordVec Source #

Specialized operations

These are are faster than the generic operations below, and should be preferred to those.

sum :: WordVec -> Word Source #

Sum of the elements of the vector

maximum :: WordVec -> Word Source #

Maximum of the elements of the vector

Specialized "zipping folds"

eqStrict :: WordVec -> WordVec -> Bool Source #

Strict equality of vectors (same length, same content)

eqExtZero :: WordVec -> WordVec -> Bool Source #

Equality of vectors extended with zeros to infinity

cmpStrict :: WordVec -> WordVec -> Ordering Source #

Strict comparison of vectors (first compare the lengths; if the lengths are the same then compare lexicographically)

cmpExtZero :: WordVec -> WordVec -> Ordering Source #

Lexicographic ordering of vectors extended with zeros to infinity

lessOrEqual :: WordVec -> WordVec -> Bool Source #

Pointwise comparison of vectors extended with zeros to infinity

partialSumsLessOrEqual :: WordVec -> WordVec -> Bool Source #

Pointwise comparison of partial sums of vectors extended with zeros to infinity

For example [x1,x2,x3] <= [y1,y2,y3] iff (x1 <=y1 && x1+x2 <= y1+y2 && x1+x2+x3 <= y1+y2+y3).

Specialized zips

add :: WordVec -> WordVec -> WordVec Source #

Pointwise addition of vectors. The shorter one is extended by zeros.

subtract :: WordVec -> WordVec -> Maybe WordVec Source #

Pointwise subtraction of vectors. The shorter one is extended by zeros. If any element would become negative, we return Nothing

Specialized maps

scale :: Word -> WordVec -> WordVec Source #

Pointwise multiplication by a constant.

Specialized scans

partialSums :: WordVec -> WordVec Source #

toList (partialSums vec) == tail (scanl (+) 0 $ toList vec)

Generic operations

fold :: (a -> Word -> a) -> a -> WordVec -> a Source #

Left fold

naiveMap :: (Word -> Word) -> WordVec -> WordVec Source #

boundedMap :: Word -> (Word -> Word) -> WordVec -> WordVec Source #

If you have a (nearly sharp) upper bound to the result of your of function on your vector, mapping can be more efficient

naiveZipWith :: (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec Source #

boundedZipWith :: Word -> (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec Source #

If you have a (nearly sharp) upper bound to the result of your of function on your vector, zipping can be more efficient

listZipWith :: (Word -> Word -> a) -> WordVec -> WordVec -> [a] Source #

Number of bits needed

bitsNeededFor :: Word -> Int Source #

Number of bits needed to encode a given number, rounded up to multiples of four

bitsNeededFor' :: Word -> Int Source #

Number of bits needed to encode a given number

roundBits :: Int -> Int Source #

We only allow multiples of 4.