Copyright	[2014..2017] Trevor L. McDonell
License	BSD3
Maintainer	Trevor L. McDonell <tmcdonell@cse.unsw.edu.au>
Stability	experimental
Portability	non-portable (GHC extensions)
Safe Haskell	None
Language	Haskell2010

Foreign.CUDA.BLAS

Contents

Control
Operations

Description

The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) for NVIDIA GPUs.

To use operations from the cuBLAS library, the user must allocate the required matrices and vectors in the GPU memory space, fill them with data, call the desired sequence of cuBLAS functions, then copy the results from the GPU memory space back to the host.

The cuda package can be used for writing to and retrieving data from the GPU.

Data layout

Unlike modern BLAS libraries, cuBLAS only provides Fortran-style implementations of the subprograms, using column-major storage and 1-based indexing.

The ?geam operation can be used to perform matrix transposition.

Example

At a short example, we show how to compute the following matrix-matrix product with dgemm:

\[ \left(\begin{matrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ \end{matrix}\right) \cdot \left(\begin{matrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \end{matrix}\right) = \left(\begin{matrix} 9 & 12 & 15 \\ 19 & 26 & 33 \\ 29 & 40 & 51 \\ \end{matrix}\right) \]

I assume you know how to initialise the CUDA environment, as described in the Foreign.CUDA.Driver module:

>>> import Foreign.CUDA.Driver as CUDA
>>> import Foreign.CUDA.BLAS as BLAS
>>> CUDA.initialise []
>>> dev <- CUDA.device 0
>>> ctx <- CUDA.create dev []

Just as we must create a CUDA execution context with create before interacting with the GPU, we must create a BLAS context handle before executing any cuBLAS library operations, which will be associated with the current device context:

>>> hdl <- BLAS.create

Now, let us generate the matrix data on the GPU. (For simplicity in this example we will just marshal the data via lists, but in a real application with a large amount of data we should of course use some kind of unboxed array):

>>> let rowsA = 3; colsA = 2; sizeA = rowsA * colsA
>>> let rowsB = 2; colsB = 3; sizeB = rowsB * colsB
>>> let sizeC = rowsA * colsB
>>> matA <- CUDA.newListArray (take sizeA [1..])
>>> matB <- CUDA.newListArray (take sizeB [1..])
>>> matC <- CUDA.mallocArray sizeC

Note in the above that we store data in row-major order, as is the convention in C. However, the cuBLAS library assumes a column-major representation, as is the style of Fortran. However, we can make use of the following equivalency:

\[ B^T \cdot A^T = (A \cdot B)^T \]

and, since the transposed matrix in column-major representation is equivalent to our matrix in row-major representation, we can avoid any actual data manipulation to get things into a form suitable for cuBLAS (phew!).

The final thing to take care of are the scaling parameters to the dgemm operation, \(\alpha\) and \(\beta\). By default, it is assumed that these values reside in host memory, but this setting can be changed with setPointerMode; When set to Device mode, the function withDevicePtr can be used to treat the device memory pointer as a plain pointer to pass to the function.

Now, we are ready to piece it all together:

>>> import Foreign.Marshal
>>> with 1.0 $ \alpha ->
>>> with 0.0 $ \beta ->
>>> dgemm hdl N N colsB rowsA colsA alpha matB colsB matA colsA beta matC colsB

And retrieve the result:

>>> print =<< CUDA.peekListArray sizeC matC
[9.0,12.0,15.0,19.0,26.0,33.0,29.0,40.0,51.0]

Finally, we should free the device memory we allocated, and release the BLAS context handle:

>>> BLAS.destroy hdl

Additional information

For more information, see the NVIDIA cuBLAS documentation:

http://docs.nvidia.com/cuda/cublas/index.html

Synopsis

data MathMode
- = DefaultMath
- | TensorMath
data AtomicsMode
- = NotAllowed
- | Allowed
data PointerMode
- = Host
- | Device
newtype Handle = Handle (Ptr ())
create :: IO Handle
destroy :: Handle -> IO ()
setPointerMode :: Handle -> PointerMode -> IO ()
getPointerMode :: Handle -> IO PointerMode
setAtomicsMode :: Handle -> AtomicsMode -> IO ()
getAtomicsMode :: Handle -> IO AtomicsMode
setMathMode :: Handle -> MathMode -> IO ()
getMathMode :: Handle -> IO MathMode
module Foreign.CUDA.BLAS.Stream
data CUBLASException
- = ExitCode Status
- | UserError String
data Status
- = Success
- | NotInitialized
- | AllocFailed
- | InvalidValue
- | ArchMismatch
- | MappingError
- | ExecutionFailed
- | InternalError
- | NotSupported
- | LicenseError
describe :: Status -> String
cublasError :: String -> IO a
requireSDK :: Name -> Double -> IO a
checkStatus :: CInt -> IO ()
module Foreign.CUDA.BLAS.Level1
module Foreign.CUDA.BLAS.Level2
module Foreign.CUDA.BLAS.Level3

Control

data MathMode Source #

Enum for default math mode / tensor math mode

Constructors

DefaultMath
TensorMath

Instances

Instances details

Enum MathMode Source #
Instance details Defined in Foreign.CUDA.BLAS.Internal.Types Methods succ :: MathMode -> MathMode # pred :: MathMode -> MathMode # toEnum :: Int -> MathMode # fromEnum :: MathMode -> Int # enumFrom :: MathMode -> [MathMode] # enumFromThen :: MathMode -> MathMode -> [MathMode] # enumFromTo :: MathMode -> MathMode -> [MathMode] # enumFromThenTo :: MathMode -> MathMode -> MathMode -> [MathMode] #
Eq MathMode Source #
Instance details Defined in Foreign.CUDA.BLAS.Internal.Types Methods (==) :: MathMode -> MathMode -> Bool # (/=) :: MathMode -> MathMode -> Bool #
Show MathMode Source #
Instance details Defined in Foreign.CUDA.BLAS.Internal.Types Methods showsPrec :: Int -> MathMode -> ShowS # show :: MathMode -> String # showList :: [MathMode] -> ShowS #

data AtomicsMode Source #

Determines whether cuBLAS routines can alternate implementations which make use of atomic instructions.

http://docs.nvidia.com/cuda/cublas/index.html#cublasatomicsmode_t

Constructors

NotAllowed
Allowed

Instances

Instances details

Enum AtomicsMode Source #
Instance details Defined in Foreign.CUDA.BLAS.Internal.Types Methods succ :: AtomicsMode -> AtomicsMode # pred :: AtomicsMode -> AtomicsMode # toEnum :: Int -> AtomicsMode # fromEnum :: AtomicsMode -> Int # enumFrom :: AtomicsMode -> [AtomicsMode] # enumFromThen :: AtomicsMode -> AtomicsMode -> [AtomicsMode] # enumFromTo :: AtomicsMode -> AtomicsMode -> [AtomicsMode] # enumFromThenTo :: AtomicsMode -> AtomicsMode -> AtomicsMode -> [AtomicsMode] #
Eq AtomicsMode Source #
Instance details Defined in Foreign.CUDA.BLAS.Internal.Types Methods (==) :: AtomicsMode -> AtomicsMode -> Bool # (/=) :: AtomicsMode -> AtomicsMode -> Bool #
Show AtomicsMode Source #
Instance details Defined in Foreign.CUDA.BLAS.Internal.Types Methods showsPrec :: Int -> AtomicsMode -> ShowS # show :: AtomicsMode -> String # showList :: [AtomicsMode] -> ShowS #

data PointerMode Source #

For functions which take scalar value arguments, determines whether those values are passed by reference on the host or device.

http://docs.nvidia.com/cuda/cublas/index.html#cublaspointermode_t

Constructors

Host
Device

Instances

Instances details

Enum PointerMode Source #
Instance details Defined in Foreign.CUDA.BLAS.Internal.Types Methods succ :: PointerMode -> PointerMode # pred :: PointerMode -> PointerMode # toEnum :: Int -> PointerMode # fromEnum :: PointerMode -> Int # enumFrom :: PointerMode -> [PointerMode] # enumFromThen :: PointerMode -> PointerMode -> [PointerMode] # enumFromTo :: PointerMode -> PointerMode -> [PointerMode] # enumFromThenTo :: PointerMode -> PointerMode -> PointerMode -> [PointerMode] #
Eq PointerMode Source #
Instance details Defined in Foreign.CUDA.BLAS.Internal.Types Methods (==) :: PointerMode -> PointerMode -> Bool # (/=) :: PointerMode -> PointerMode -> Bool #
Show PointerMode Source #
Instance details Defined in Foreign.CUDA.BLAS.Internal.Types Methods showsPrec :: Int -> PointerMode -> ShowS # show :: PointerMode -> String # showList :: [PointerMode] -> ShowS #

newtype Handle Source #

An opaque handle to the cuBLAS library context, which is passed to all library function calls.

http://docs.nvidia.com/cuda/cublas/index.html#cublashandle_t

Constructors

Handle (Ptr ())

create :: IO Handle Source #

This function initializes the CUBLAS library and creates a handle to an opaque structure holding the CUBLAS library context. It allocates hardware resources on the host and device and must be called prior to making any other CUBLAS library calls.

http://docs.nvidia.com/cuda/cublas/index.html#cublascreate

destroy :: Handle -> IO () Source #

This function releases hardware resources used by the CUBLAS library. The release of GPU resources may be deferred until the application exits. This function is usually the last call with a particular handle to the CUBLAS library.

http://docs.nvidia.com/cuda/cublas/index.html#cublasdestroy

setPointerMode :: Handle -> PointerMode -> IO () Source #

Set the pointer mode used by cuBLAS library functions. For example, this controls whether the scaling parameters \(\alpha\) and \(\beta\) of the ?gemm operation are treated as residing in host or device memory.

The default mode is for values to be passed by reference from the host.

http://docs.nvidia.com/cuda/cublas/index.html#cublassetpointermode

getPointerMode :: Handle -> IO PointerMode Source #

Get the pointer mode used by cuBLAS library functions to pass scalar arguments.

http://docs.nvidia.com/cuda/cublas/index.html#cublasgetpointermode

setAtomicsMode :: Handle -> AtomicsMode -> IO () Source #

Set whether cuBLAS library functions are allowed to use atomic functions, when available. The implementations are generally faster, but can generate results which are not strictly identical from one run to another.

http://docs.nvidia.com/cuda/cublas/index.html#cublassetatomicsmode

getAtomicsMode :: Handle -> IO AtomicsMode Source #

Determine whether cuBLAS library functions are allowed to use atomic operations.

http://docs.nvidia.com/cuda/cublas/index.html#cublasgetatomicsmode

setMathMode :: Handle -> MathMode -> IO () Source #

Set whether cuBLAS library functions are allowed to use Tensor Core operations where available.

http://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode

since 0.4.0.0

getMathMode :: Handle -> IO MathMode Source #

Determine whether cuBLAS library functions are allowed to use Tensor Core operations where available.

http://docs.nvidia.com/cuda/cublas/index.html#cublasgetmathmode

since 0.4.0.0

module Foreign.CUDA.BLAS.Stream

data CUBLASException Source #

Constructors

ExitCode Status
UserError String

Instances

Instances details

Show CUBLASException Source #
Instance details Defined in Foreign.CUDA.BLAS.Error Methods showsPrec :: Int -> CUBLASException -> ShowS # show :: CUBLASException -> String # showList :: [CUBLASException] -> ShowS #
Exception CUBLASException Source #
Instance details Defined in Foreign.CUDA.BLAS.Error Methods toException :: CUBLASException -> SomeException # fromException :: SomeException -> Maybe CUBLASException # displayException :: CUBLASException -> String #

data Status Source #

Error codes used by cuBLAS library functions

http://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t

Constructors

Success
NotInitialized
AllocFailed
InvalidValue
ArchMismatch
MappingError
ExecutionFailed
InternalError
NotSupported
LicenseError

Instances

Instances details

Enum Status Source #
Instance details Defined in Foreign.CUDA.BLAS.Error Methods succ :: Status -> Status # pred :: Status -> Status # toEnum :: Int -> Status # fromEnum :: Status -> Int # enumFrom :: Status -> [Status] # enumFromThen :: Status -> Status -> [Status] # enumFromTo :: Status -> Status -> [Status] # enumFromThenTo :: Status -> Status -> Status -> [Status] #
Eq Status Source #
Instance details Defined in Foreign.CUDA.BLAS.Error Methods (==) :: Status -> Status -> Bool # (/=) :: Status -> Status -> Bool #
Show Status Source #
Instance details Defined in Foreign.CUDA.BLAS.Error Methods showsPrec :: Int -> Status -> ShowS # show :: Status -> String # showList :: [Status] -> ShowS #

describe :: Status -> String Source #

cublasError :: String -> IO a Source #

Raise a CUBLASException in the IO Monad

requireSDK :: Name -> Double -> IO a Source #

A specially formatted error message

checkStatus :: CInt -> IO () Source #

Operations

module Foreign.CUDA.BLAS.Level1

module Foreign.CUDA.BLAS.Level2

module Foreign.CUDA.BLAS.Level3