hw-rankselect-base
Rank and select operations.
This library will use support for some BMI2 CPU instructions on some x86 based
CPUs if compiled with the appropriate flags on ghc-8.4.1
or later.
Rank and select
This library provides the following functions on various types:
rank1
rank0
select1
select0
Type class instances are provided for the following primitive types:
Bool
Word8
Word16
Word32
Word64
Moreover additional type class instances are provided for []
, Vector
from both Data.Vector
, and Data.Vector.Storable
of these primitive
types.
Examples
Check the convenience imports in the project's .ghci
file.
Run the repl in convenience script (uses stack).
$ ./run-stack.sh repl
Then create a rank-select bit-string of the desired type:
λ> let bs = fromJust $ bitRead "0001001001100001000001000110101000101000" :: Word64
"00010010 01100001 00000100 01101010 00101000 00000000 00000000 00000000"
Call the rank-select operations on the bit-string
λ> rank1 bs 20
1
λ> select1 bs 4
11
Vector indexing conventions
This library follows standard 1-based counting conventions typically found in
Computer Science literature where select1 10 2 = 4
as illustrated here:
8 7 6 5 [4]3 2 1
0 0 0 0 1 0 1 0
The standard convention for the bmi2
implementation, comes at a small cost.
An internal function select1Word64Bmi2Base0
demonstrates 0-based counting
that is slightly faster when implemented with the bmi2
instruction set where
select1 10 1 = 3
as illustrated here:
7 6 5 4 [3]2 1 0
0 0 0 0 1 0 1 0
The word-vector-based type classes instances are not intended to be used
in high-performance code because where random-access on large bit-vectors
are needed because they have poor performance due to having to do a linear
scan.
For smaller bit-vectors that fit on one page of memory, they do quite well.
In fact, the hw-dsv library
uses them for small vectors.
Bit-vectors larger than say 4096-bits need indexing to achieve reasonable
random-access performance.
An indexed bit-vector implementation can found in the
hw-rankselect package.
Architecture notes
This library has only been tested on little-endian CPU architectures.
Anyone wishing to use this on big-endian CPU architectures will need to
confirm that this works properly.
Compilation
It is sufficient to build, test and benchmark the library as follows
for emulated behaviour:
stack build
stack test
stack bench
To target the BMI2 instruction set, add the bmi2
flag:
stack build --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2
stack test --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2
stack bench --flag bits-extra:bmi2 --flag hw-rankselect-base:bmi2
Benchmark results
The following benchmark shows the kinds of performance gain that can
be expected from enabling the BMI2 instruction set for CPU targets
that support them:
benchmarking 64-bit/Once: Select1 Broadword
time 14.75 ns (14.63 ns .. 14.90 ns)
0.996 R² (0.987 R² .. 0.999 R²)
mean 15.35 ns (14.92 ns .. 16.70 ns)
std dev 2.355 ns (607.2 ps .. 4.849 ns)
variance introduced by outliers: 96% (severely inflated)
benchmarking 64-bit/Once: Select1 Bmi2
time 6.026 ns (5.933 ns .. 6.134 ns)
0.999 R² (0.998 R² .. 0.999 R²)
mean 6.024 ns (5.966 ns .. 6.096 ns)
std dev 224.4 ps (176.9 ps .. 318.6 ps)
variance introduced by outliers: 62% (severely inflated)
benchmarking 32-bit/Once: Select1 Broadword
time 26.09 ns (25.84 ns .. 26.40 ns)
0.999 R² (0.998 R² .. 0.999 R²)
mean 26.67 ns (26.37 ns .. 27.01 ns)
std dev 1.017 ns (848.4 ps .. 1.291 ns)
variance introduced by outliers: 61% (severely inflated)
benchmarking 32-bit/Once: Select1 Bmi2
time 8.613 ns (8.543 ns .. 8.687 ns)
0.999 R² (0.999 R² .. 1.000 R²)
mean 8.592 ns (8.515 ns .. 8.671 ns)
std dev 248.3 ps (216.2 ps .. 294.8 ps)
variance introduced by outliers: 48% (moderately inflated)