Copyright | [2014..2017] Trevor L. McDonell |
---|---|
License | BSD3 |
Maintainer | Trevor L. McDonell <tmcdonell@cse.unsw.edu.au> |
Stability | experimental |
Portability | non-portable (GHC extensions) |
Safe Haskell | None |
Language | Haskell2010 |
The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) for NVIDIA GPUs.
To use operations from the cuBLAS library, the user must allocate the required matrices and vectors in the GPU memory space, fill them with data, call the desired sequence of cuBLAS functions, then copy the results from the GPU memory space back to the host.
The cuda package can be used for writing to and retrieving data from the GPU.
- Data layout
Unlike modern BLAS libraries, cuBLAS only provides Fortran-style implementations of the subprograms, using column-major storage and 1-based indexing.
The ?geam
operation can be used to perform matrix transposition.
- Example
At a short example, we show how to compute the following matrix-matrix
product with dgemm
:
\[ \left(\begin{matrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ \end{matrix}\right) \cdot \left(\begin{matrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \end{matrix}\right) = \left(\begin{matrix} 9 & 12 & 15 \\ 19 & 26 & 33 \\ 29 & 40 & 51 \\ \end{matrix}\right) \]
I assume you know how to initialise the CUDA environment, as described in the Foreign.CUDA.Driver module:
>>>
import Foreign.CUDA.Driver as CUDA
>>>
import Foreign.CUDA.BLAS as BLAS
>>>
CUDA.initialise []
>>>
dev <- CUDA.device 0
>>>
ctx <- CUDA.create dev []
Just as we must create a CUDA execution context with
create
before interacting with the GPU, we must create
a BLAS context handle before executing any cuBLAS library operations, which
will be associated with the current device context:
>>>
hdl <- BLAS.create
Now, let us generate the matrix data on the GPU. (For simplicity in this example we will just marshal the data via lists, but in a real application with a large amount of data we should of course use some kind of unboxed array):
>>>
let rowsA = 3; colsA = 2; sizeA = rowsA * colsA
>>>
let rowsB = 2; colsB = 3; sizeB = rowsB * colsB
>>>
let sizeC = rowsA * colsB
>>>
matA <- CUDA.newListArray (take sizeA [1..])
>>>
matB <- CUDA.newListArray (take sizeB [1..])
>>>
matC <- CUDA.mallocArray sizeC
Note in the above that we store data in row-major order, as is the convention in C. However, the cuBLAS library assumes a column-major representation, as is the style of Fortran. However, we can make use of the following equivalency:
\[ B^T \cdot A^T = (A \cdot B)^T \]
and, since the transposed matrix in column-major representation is equivalent to our matrix in row-major representation, we can avoid any actual data manipulation to get things into a form suitable for cuBLAS (phew!).
The final thing to take care of are the scaling parameters to the dgemm
operation, \(\alpha\) and \(\beta\). By default, it is assumed that these
values reside in host memory, but this setting can be changed with
setPointerMode
; When set to Device
mode, the function
withDevicePtr
can be used to treat the device memory
pointer as a plain pointer to pass to the function.
Now, we are ready to piece it all together:
>>>
import Foreign.Marshal
>>>
with 1.0 $ \alpha ->
>>>
with 0.0 $ \beta ->
>>>
dgemm hdl N N colsB rowsA colsA alpha matB colsB matA colsA beta matC colsB
And retrieve the result:
>>>
print =<< CUDA.peekListArray sizeC matC
[9.0,12.0,15.0,19.0,26.0,33.0,29.0,40.0,51.0]
Finally, we should free
the device memory we allocated,
and release the BLAS context handle:
>>>
BLAS.destroy hdl
- Additional information
For more information, see the NVIDIA cuBLAS documentation:
Synopsis
- data MathMode
- data AtomicsMode
- data PointerMode
- newtype Handle = Handle (Ptr ())
- create :: IO Handle
- destroy :: Handle -> IO ()
- setPointerMode :: Handle -> PointerMode -> IO ()
- getPointerMode :: Handle -> IO PointerMode
- setAtomicsMode :: Handle -> AtomicsMode -> IO ()
- getAtomicsMode :: Handle -> IO AtomicsMode
- setMathMode :: Handle -> MathMode -> IO ()
- getMathMode :: Handle -> IO MathMode
- module Foreign.CUDA.BLAS.Stream
- data CUBLASException
- data Status
- describe :: Status -> String
- cublasError :: String -> IO a
- requireSDK :: Name -> Double -> IO a
- checkStatus :: CInt -> IO ()
- module Foreign.CUDA.BLAS.Level1
- module Foreign.CUDA.BLAS.Level2
- module Foreign.CUDA.BLAS.Level3
Control
Enum for default math mode / tensor math mode
Instances
Enum MathMode Source # | |
Defined in Foreign.CUDA.BLAS.Internal.Types | |
Eq MathMode Source # | |
Show MathMode Source # | |
data AtomicsMode Source #
Determines whether cuBLAS routines can alternate implementations which make use of atomic instructions.
http://docs.nvidia.com/cuda/cublas/index.html#cublasatomicsmode_t
Instances
Enum AtomicsMode Source # | |
Defined in Foreign.CUDA.BLAS.Internal.Types succ :: AtomicsMode -> AtomicsMode # pred :: AtomicsMode -> AtomicsMode # toEnum :: Int -> AtomicsMode # fromEnum :: AtomicsMode -> Int # enumFrom :: AtomicsMode -> [AtomicsMode] # enumFromThen :: AtomicsMode -> AtomicsMode -> [AtomicsMode] # enumFromTo :: AtomicsMode -> AtomicsMode -> [AtomicsMode] # enumFromThenTo :: AtomicsMode -> AtomicsMode -> AtomicsMode -> [AtomicsMode] # | |
Eq AtomicsMode Source # | |
Defined in Foreign.CUDA.BLAS.Internal.Types (==) :: AtomicsMode -> AtomicsMode -> Bool # (/=) :: AtomicsMode -> AtomicsMode -> Bool # | |
Show AtomicsMode Source # | |
Defined in Foreign.CUDA.BLAS.Internal.Types showsPrec :: Int -> AtomicsMode -> ShowS # show :: AtomicsMode -> String # showList :: [AtomicsMode] -> ShowS # |
data PointerMode Source #
For functions which take scalar value arguments, determines whether those values are passed by reference on the host or device.
http://docs.nvidia.com/cuda/cublas/index.html#cublaspointermode_t
Instances
Enum PointerMode Source # | |
Defined in Foreign.CUDA.BLAS.Internal.Types succ :: PointerMode -> PointerMode # pred :: PointerMode -> PointerMode # toEnum :: Int -> PointerMode # fromEnum :: PointerMode -> Int # enumFrom :: PointerMode -> [PointerMode] # enumFromThen :: PointerMode -> PointerMode -> [PointerMode] # enumFromTo :: PointerMode -> PointerMode -> [PointerMode] # enumFromThenTo :: PointerMode -> PointerMode -> PointerMode -> [PointerMode] # | |
Eq PointerMode Source # | |
Defined in Foreign.CUDA.BLAS.Internal.Types (==) :: PointerMode -> PointerMode -> Bool # (/=) :: PointerMode -> PointerMode -> Bool # | |
Show PointerMode Source # | |
Defined in Foreign.CUDA.BLAS.Internal.Types showsPrec :: Int -> PointerMode -> ShowS # show :: PointerMode -> String # showList :: [PointerMode] -> ShowS # |
An opaque handle to the cuBLAS library context, which is passed to all library function calls.
http://docs.nvidia.com/cuda/cublas/index.html#cublashandle_t
This function initializes the CUBLAS library and creates a handle to an opaque structure holding the CUBLAS library context. It allocates hardware resources on the host and device and must be called prior to making any other CUBLAS library calls.
destroy :: Handle -> IO () Source #
This function releases hardware resources used by the CUBLAS library. The release of GPU resources may be deferred until the application exits. This function is usually the last call with a particular handle to the CUBLAS library.
setPointerMode :: Handle -> PointerMode -> IO () Source #
Set the pointer mode used by cuBLAS library functions. For example, this controls whether the scaling parameters \(\alpha\) and \(\beta\) of the ?gemm operation are treated as residing in host or device memory.
The default mode is for values to be passed by reference from the host.
http://docs.nvidia.com/cuda/cublas/index.html#cublassetpointermode
getPointerMode :: Handle -> IO PointerMode Source #
Get the pointer mode used by cuBLAS library functions to pass scalar arguments.
http://docs.nvidia.com/cuda/cublas/index.html#cublasgetpointermode
setAtomicsMode :: Handle -> AtomicsMode -> IO () Source #
Set whether cuBLAS library functions are allowed to use atomic functions, when available. The implementations are generally faster, but can generate results which are not strictly identical from one run to another.
http://docs.nvidia.com/cuda/cublas/index.html#cublassetatomicsmode
getAtomicsMode :: Handle -> IO AtomicsMode Source #
Determine whether cuBLAS library functions are allowed to use atomic operations.
http://docs.nvidia.com/cuda/cublas/index.html#cublasgetatomicsmode
setMathMode :: Handle -> MathMode -> IO () Source #
Set whether cuBLAS library functions are allowed to use Tensor Core operations where available.
http://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode
since 0.4.0.0
getMathMode :: Handle -> IO MathMode Source #
Determine whether cuBLAS library functions are allowed to use Tensor Core operations where available.
http://docs.nvidia.com/cuda/cublas/index.html#cublasgetmathmode
since 0.4.0.0
module Foreign.CUDA.BLAS.Stream
data CUBLASException Source #
Instances
Show CUBLASException Source # | |
Defined in Foreign.CUDA.BLAS.Error showsPrec :: Int -> CUBLASException -> ShowS # show :: CUBLASException -> String # showList :: [CUBLASException] -> ShowS # | |
Exception CUBLASException Source # | |
Defined in Foreign.CUDA.BLAS.Error |
Error codes used by cuBLAS library functions
http://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t
Success | |
NotInitialized | |
AllocFailed | |
InvalidValue | |
ArchMismatch | |
MappingError | |
ExecutionFailed | |
InternalError | |
NotSupported | |
LicenseError |
cublasError :: String -> IO a Source #
Raise a CUBLASException in the IO Monad
checkStatus :: CInt -> IO () Source #
Operations
module Foreign.CUDA.BLAS.Level1
module Foreign.CUDA.BLAS.Level2
module Foreign.CUDA.BLAS.Level3