futhark-0.21.1: An optimising compiler for a functional, array-oriented language.
Safe HaskellNone
LanguageHaskell2010

Futhark.CodeGen.ImpGen.GPU.Base

Synopsis

Documentation

data Locks Source #

Information about the locks available for accumulators.

Constructors

Locks 

data Target Source #

Which target are we ultimately generating code for? While most of the kernels code is the same, there are some cases where we generate special code based on the ultimate low-level API we are targeting.

Constructors

CUDA 
OpenCL 

sReplicate :: VName -> SubExp -> CallKernelGen () Source #

Perform a Replicate with a kernel.

sIota :: VName -> TExp Int64 -> Exp -> Exp -> IntType -> CallKernelGen () Source #

Perform an Iota with a kernel.

virtualiseGroups :: SegVirt -> TExp Int32 -> (TExp Int32 -> InKernelGen ()) -> InKernelGen () Source #

For many kernels, we may not have enough physical groups to cover the logical iteration space. Some groups thus have to perform double duty; we put an outer loop to accomplish this. The advantage over just launching a bazillion threads is that the cost of memory expansion should be proportional to the number of *physical* threads (hardware parallelism), not the amount of application parallelism.

groupLoop :: TExp Int64 -> (TExp Int64 -> InKernelGen ()) -> InKernelGen () Source #

Assign iterations of a for-loop to threads in the workgroup. The passed-in function is invoked with the (symbolic) iteration. For multidimensional loops, use groupCoverSpace.

kernelLoop :: IntExp t => TExp t -> TExp t -> TExp t -> (TExp t -> InKernelGen ()) -> InKernelGen () Source #

Assign iterations of a for-loop to all threads in the kernel. The passed-in function is invoked with the (symbolic) iteration. threadOperations will be in effect in the body. For multidimensional loops, use groupCoverSpace.

groupCoverSpace :: [TExp Int64] -> ([TExp Int64] -> InKernelGen ()) -> InKernelGen () Source #

Iterate collectively though a multidimensional space, such that all threads in the group participate. The passed-in function is invoked with a (symbolic) point in the index space.

atomicUpdateLocking :: AtomicBinOp -> Lambda GPUMem -> AtomicUpdate GPUMem KernelEnv Source #

Do an atomic update corresponding to a binary operator lambda.

type AtomicBinOp = BinOp -> Maybe (VName -> VName -> Count Elements (TExp Int64) -> Exp -> AtomicOp) Source #

Is there an atomic BinOp corresponding to this BinOp?

data Locking Source #

Locking strategy used for an atomic update.

Constructors

Locking 

Fields

data AtomicUpdate rep r Source #

The mechanism that will be used for performing the atomic update. Approximates how efficient it will be. Ordered from most to least efficient.

Constructors

AtomicPrim (DoAtomicUpdate rep r)

Supported directly by primitive.

AtomicCAS (DoAtomicUpdate rep r)

Can be done by efficient swaps.

AtomicLocking (Locking -> DoAtomicUpdate rep r)

Requires explicit locking.

type DoAtomicUpdate rep r = Space -> [VName] -> [TExp Int64] -> ImpM rep r KernelOp () Source #

A function for generating code for an atomic update. Assumes that the bucket is in-bounds.