futhark-0.25.15: An optimising compiler for a functional, array-oriented language.
Safe Haskell: Safe-Inferred
Language: GHC2021

Futhark.CodeGen.ImpGen.GPU.SegHist

Description

Our compilation strategy for SegHist is based around avoiding bin conflicts. We do this by splitting the input into chunks, and for each chunk computing a single subhistogram. Then we combine the subhistograms using an ordinary segmented reduction (SegRed).
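The strategy above can be illustrated with a minimal sequential sketch. This is not the code the compiler generates. It only models the idea of one subhistogram per chunk followed by a bin-wise reduction. The names `chunksOf`, `subHist`, `combine` and `histogram` are hypothetical, the operator is assumed to be integer addition (counting), and the sketch chooses to ignore out-of-range bin indices.

```haskell
import Data.List (foldl')

-- A histogram with a fixed number of bins, stored as a list of counts.
type Hist = [Int]

-- Split a list into chunks of at most n elements.
-- A non-positive n is treated as 1 so that the recursion always terminates.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = chunk : chunksOf n rest
  where
    (chunk, rest) = splitAt (max 1 n) xs

-- Compute a single subhistogram for one chunk of bin indices.
-- Only this chunk's private histogram is updated, so chunks never
-- conflict with each other on a bin.
subHist :: Int -> [Int] -> Hist
subHist bins = foldl' bump (replicate bins 0)
  where
    bump h i
      | i < 0 || i >= bins = h -- assumption of this sketch: skip bad indices
      | otherwise = zipWith (\j c -> if j == i then c + 1 else c) [0 ..] h

-- Combine the subhistograms bin by bin. This plays the role of the
-- segmented reduction (SegRed) mentioned in the text.
combine :: Int -> [Hist] -> Hist
combine bins = foldl' (zipWith (+)) (replicate bins 0)

-- Full pipeline: chunk the input, build subhistograms, reduce them.
histogram :: Int -> Int -> [Int] -> Hist
histogram bins chunkSize =
  combine bins . map (subHist bins) . chunksOf chunkSize
```

Because addition is associative and commutative, the result does not depend on the chunk size; only the number of subhistograms that must be reduced changes.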

Some branches efficiently handle the case where we use only a single subhistogram (because it is large), so that we respect the asymptotics and do not copy the destination array.

We also use a heuristic strategy for computing subhistograms in shared memory when possible. Given:

H: total size of histograms in bytes, including any lock arrays.

G: block size

T: number of bytes of shared memory each thread can be given without impacting occupancy (determined experimentally, e.g. 32).

LMAX: maximum amount of shared memory per threadblock (hard limit).

We wish to compute:

COOP: cooperation level (number of threads per subhistogram)

LH: number of shared memory subhistograms

We do this as:

    COOP = ceil(H / T)
    LH = ceil((G*T)/H)
    if COOP <= G && H <= LMAX then
      use shared memory
    else
      use global memory
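The heuristic above can be written out as a small pure function. This is a hedged sketch of the decision as described in the text, not the module's actual implementation. The names `Strategy`, `ceilDiv` and `chooseStrategy` are hypothetical, and all four inputs are assumed to be positive integers.

```haskell
-- Where the subhistograms are placed.
data Strategy
  = SharedMemory Int Int -- cooperation level COOP, subhistogram count LH
  | GlobalMemory
  deriving (Eq, Show)

-- Integer ceiling division; assumes a >= 0 and b > 0.
ceilDiv :: Int -> Int -> Int
ceilDiv a b = (a + b - 1) `div` b

-- Apply the heuristic from the description.
-- Arguments, in order: H (histogram bytes), G (block size),
-- T (shared-memory bytes per thread), LMAX (shared-memory limit per block).
chooseStrategy :: Int -> Int -> Int -> Int -> Strategy
chooseStrategy h g t lmax
  | coop <= g && h <= lmax = SharedMemory coop lh
  | otherwise = GlobalMemory
  where
    coop = h `ceilDiv` t -- COOP = ceil(H / T)
    lh = (g * t) `ceilDiv` h -- LH = ceil((G*T)/H)
```

For instance, with the purely illustrative values H = 1024, G = 256, T = 32 and LMAX = 49152, the sketch yields COOP = 32 and LH = 8, so shared memory is used. Raising H to 16384 gives COOP = 512, which exceeds G, so the sketch falls back to global memory.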

Documentation

compileSegHist :: Pat LetDecMem -> SegLevel -> SegSpace -> [HistOp GPUMem] -> KernelBody GPUMem -> CallKernelGen ()

Generate code for a segmented histogram called from the host.