Copyright | [2009..2017] Trevor L. McDonell |
---|---|
License | BSD |
Safe Haskell | None |
Language | Haskell98 |
Kernel execution control for low-level driver interface
- newtype Fun = Fun (Ptr ())
- data FunParam where
- data FunAttribute
- data SharedMem
- requires :: Fun -> FunAttribute -> IO Int
- setCacheConfigFun :: Fun -> Cache -> IO ()
- setSharedMemConfigFun :: Fun -> SharedMem -> IO ()
- launchKernel :: Fun -> (Int, Int, Int) -> (Int, Int, Int) -> Int -> Maybe Stream -> [FunParam] -> IO ()
- launchKernel' :: Fun -> (Int, Int, Int) -> (Int, Int, Int) -> Int -> Maybe Stream -> [FunParam] -> IO ()
- setBlockShape :: Fun -> (Int, Int, Int) -> IO ()
- setSharedSize :: Fun -> Integer -> IO ()
- setParams :: Fun -> [FunParam] -> IO ()
- launch :: Fun -> (Int, Int) -> Maybe Stream -> IO ()
Kernel Execution
data FunAttribute Source #
Function attributes
MaxKernelThreadsPerBlock | |
SharedSizeBytes | |
ConstSizeBytes | |
LocalSizeBytes | |
NumRegs | |
PtxVersion | |
BinaryVersion | |
CacheModeCa | |
CU_FUNC_ATTRIBUTE_MAX | |
Enum FunAttribute Source # | |
Eq FunAttribute Source # | |
Show FunAttribute Source # | |
data FunParam Source #
Kernel function parameters
data SharedMem Source #
Device shared memory configuration preference
requires :: Fun -> FunAttribute -> IO Int Source #
Returns the value of the selected attribute requirement for the given kernel.
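As a minimal sketch of how `requires` might be used, the helpers below check whether a proposed thread block shape stays within a kernel's `MaxKernelThreadsPerBlock` limit and print two other attributes. The module name `Foreign.CUDA.Driver.Exec` and the helper names `blockFits` and `reportKernel` are assumptions made for illustration; the `Fun` is assumed to have been obtained elsewhere, for example from a loaded module.

```haskell
module KernelReport where

-- Assumption: the functions documented here are exported from
-- Foreign.CUDA.Driver.Exec.
import Foreign.CUDA.Driver.Exec (Fun, FunAttribute (..), requires)

-- | Hypothetical helper: does a block of (tx, ty, tz) threads fit within
-- the per-block thread limit reported for this kernel?
blockFits :: Fun -> (Int, Int, Int) -> IO Bool
blockFits fn (tx, ty, tz) = do
  limit <- requires fn MaxKernelThreadsPerBlock
  return (tx * ty * tz <= limit)

-- | Hypothetical helper: print some resource requirements of a kernel.
reportKernel :: Fun -> IO ()
reportKernel fn = do
  regs   <- requires fn NumRegs
  shared <- requires fn SharedSizeBytes
  putStrLn ("registers per thread:       " ++ show regs)
  putStrLn ("static shared memory bytes: " ++ show shared)
```

Checking the block shape before launching can turn an opaque launch failure into an explicit decision in the host code.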
setCacheConfigFun :: Fun -> Cache -> IO () Source #
On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the given device function. This is only a preference; the driver is free to choose a different configuration as required to execute the function.
Switching between configuration modes may insert a device-side synchronisation point for streamed kernel launches.
Requires CUDA-3.0.
setSharedMemConfigFun :: Fun -> SharedMem -> IO () Source #
Set the shared memory configuration of a device function.
On devices with configurable shared memory banks, this will force all subsequent launches of the given device function to use the specified shared memory bank size configuration. On launch of the function, the shared memory configuration of the device will be temporarily changed if needed to suit the function configuration. Changes in shared memory configuration may introduce a device-side synchronisation between kernel launches.
Any per-function configuration specified by setSharedMemConfigFun will override the context-wide configuration set with setSharedMem.
Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential bandwidth to shared memory, but will change what kinds of accesses to shared memory will result in bank conflicts.
This function will do nothing on devices with fixed shared memory bank size.
Requires CUDA-5.0.
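A minimal sketch of requesting a bank size for one kernel follows. The constructor name `EightByteBankSize`, the module name `Foreign.CUDA.Driver.Exec`, and the helper `preferWideBanks` are assumptions that do not appear in the text above; check the constructors of `SharedMem` in the installed version before relying on them.

```haskell
module BankConfig where

-- Assumption: SharedMem and its constructors are exported from
-- Foreign.CUDA.Driver.Exec alongside setSharedMemConfigFun.
import Foreign.CUDA.Driver.Exec (Fun, SharedMem (..), setSharedMemConfigFun)

-- | Hypothetical helper: request eight-byte shared memory banks for a
-- kernel that mostly reads and writes 64-bit values in shared memory.
-- Assumption: EightByteBankSize is a constructor of SharedMem.
-- On devices with a fixed bank size this call has no effect.
preferWideBanks :: Fun -> IO ()
preferWideBanks fn = setSharedMemConfigFun fn EightByteBankSize
```

Because the setting applies to all subsequent launches of the function, it would typically be applied once, after the function handle is obtained, rather than before every launch.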
launchKernel Source #
:: Fun | function to execute |
-> (Int, Int, Int) | block grid dimension |
-> (Int, Int, Int) | thread block shape |
-> Int | shared memory (bytes) |
-> Maybe Stream | (optional) stream to execute in |
-> [FunParam] | list of function parameters |
-> IO () | |
Invoke a kernel on a (gx * gy * gz) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific Stream.
In launchKernel, the number of kernel parameters and their offsets and sizes do not need to be specified, as this information is retrieved directly from the kernel's image. This requires the kernel to have been compiled with toolchain version 3.2 or later.
The alternative launchKernel' will pass the arguments in directly, requiring the application to know the size and alignment/padding of each kernel parameter.
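The grid and block arguments determine how many threads run in total. The sketch below is plain Haskell with no CUDA dependency; it shows one common way to pick a one-dimensional grid that covers `n` elements, and how the total thread count follows from the two tuples. The names `blocksFor` and `totalThreads` are hypothetical helpers, not part of this module.

```haskell
module Main where

-- | Hypothetical helper: number of blocks of size 'b' needed to cover
-- 'n' elements (ceiling division). Assumes b > 0 and n >= 0.
blocksFor :: Int -> Int -> Int
blocksFor n b = (n + b - 1) `div` b

-- | Hypothetical helper: total number of threads launched for a grid of
-- (gx, gy, gz) blocks with (tx, ty, tz) threads per block.
totalThreads :: (Int, Int, Int) -> (Int, Int, Int) -> Int
totalThreads (gx, gy, gz) (tx, ty, tz) = gx * gy * gz * tx * ty * tz

main :: IO ()
main = do
  let n     = 1000                      -- elements to process
      block = (256, 1, 1)               -- thread block shape
      grid  = (blocksFor n 256, 1, 1)   -- block grid dimension
  print grid                            -- prints (4,1,1)
  print (totalThreads grid block)       -- prints 1024
```

Here 1024 threads are launched for 1000 elements, so a kernel using this geometry would need to guard against indices at or beyond `n`. The two tuples would be passed as the second and third arguments of `launchKernel`.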
launchKernel' Source #
:: Fun | function to execute |
-> (Int, Int, Int) | block grid dimension |
-> (Int, Int, Int) | thread block shape |
-> Int | shared memory (bytes) |
-> Maybe Stream | (optional) stream to execute in |
-> [FunParam] | list of function parameters |
-> IO () | |
Invoke a kernel on a (gx * gy * gz) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific Stream.
In launchKernel, the number of kernel parameters and their offsets and sizes do not need to be specified, as this information is retrieved directly from the kernel's image. This requires the kernel to have been compiled with toolchain version 3.2 or later.
The alternative launchKernel' will pass the arguments in directly, requiring the application to know the size and alignment/padding of each kernel parameter.
setBlockShape :: Fun -> (Int, Int, Int) -> IO () Source #
Deprecated: use launchKernel instead
Specify the (x,y,z) dimensions of the thread blocks that are created when the given kernel function is launched.
setSharedSize :: Fun -> Integer -> IO () Source #
Deprecated: use launchKernel instead
Set the number of bytes of dynamic shared memory to be available to each thread block when the function is launched.
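Since the deprecated setters all point to `launchKernel`, the following sketch shows how a launch written against the older functions might be migrated. It uses only the signatures listed in the synopsis; the module names and the wrappers `launchOld` and `launchNew` are assumptions for illustration, and the two are intended to be roughly equivalent rather than guaranteed identical.

```haskell
module LaunchMigration where

-- Assumptions: the functions documented here are exported from
-- Foreign.CUDA.Driver.Exec, and Stream from Foreign.CUDA.Driver.Stream.
import Foreign.CUDA.Driver.Exec
import Foreign.CUDA.Driver.Stream (Stream)

-- | Hypothetical wrapper around the deprecated, stateful interface:
-- configure the function step by step, then launch on a 2D grid.
launchOld
  :: Fun -> (Int, Int) -> (Int, Int, Int) -> Integer
  -> Maybe Stream -> [FunParam] -> IO ()
launchOld fn grid block smem stream params = do
  setBlockShape fn block
  setSharedSize fn smem
  setParams     fn params
  launch fn grid stream

-- | Hypothetical wrapper with the same arguments, expressed as a single
-- launchKernel call. The 2D grid becomes a 3D grid of depth 1, and the
-- Integer shared memory size is converted to Int.
launchNew
  :: Fun -> (Int, Int) -> (Int, Int, Int) -> Integer
  -> Maybe Stream -> [FunParam] -> IO ()
launchNew fn (gx, gy) block smem stream params =
  launchKernel fn (gx, gy, 1) block (fromIntegral smem) stream params
```

Passing everything in one call avoids leaving launch configuration attached to the `Fun` between launches.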