halide-haskell
Halide is a programming language designed to make
it easier to write high-performance image and array processing code on modern
machines. Rather than being a standalone programming language, Halide is
embedded in C++. This means you write C++ code that builds an in-memory
representation of a Halide pipeline using Halide's C++ API. You can then
compile this representation to an object file, or JIT-compile it and run it in
the same process.
|
This package provides Haskell bindings that allow to write Halide embedded in
Haskell without C++ π.
π Example usage
As a simple example, here's how you could implement array addition with halide-haskell:
{-# LANGUAGE AllowAmbiguousTypes, DataKinds, OverloadedStrings, ViewPatterns #-}
import Language.Halide
-- The algorithm
mkArrayPlus = compile $ \(buffer "a" -> a) (buffer "b" -> b) -> do
-- Create an index variable
i <- mkVar "i"
-- Define the resulting function. We call it "out".
-- In pseudocode it's equivalent to the following: out[i] = a[i] + b[i]
out <- define "out" i $ a ! i + b ! i
-- Perform a fancy optimization and use SIMD: we split the loop over i into
-- an inner and an outer loop and then vectorize the inner loop
inner <- mkVar "inner"
split TailAuto i (i, inner) 4 out >>= vectorize inner
-- Example usage of our Halide pipeline
main :: IO ()
main = do
let a, b :: [Float]
a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
-- Compile the code
arrayPlus <- mkArrayPlus
-- We tell Halide to treat our list as a one-dimensional buffer
withHalideBuffer @1 @Float a $ \a' ->
withHalideBuffer b $ \b' ->
-- allocate a temporary buffer for the output
allocaCpuBuffer [length a] $ \out' -> do
-- execute the kernel -- it is a normal function call!
arrayPlus a' b' out'
-- print the result
print =<< peekToList out'
For more examples, have a look at the tutorials.
π» Installing
Currently, the easiest way to install the library is using
Nix. It is not a fundamental limitation, because the
library itself is just a normal Cabal-based Haskell project, but installing &
patching (not all our bug fixes have been upstreamed yet) the system
dependencies is just too much work without Nix.
So, once you have Nix installed, you can add halide-haskell to your flake
inputs like this project
demonstrates
and then include it in your build-depends
section in the Cabal file.
If you just want to try building the library, type
nix build
and to run an example, try
nix run
nix run .#ghc927-intel-ocl.halide-haskell # for Intel OpenCL support
nix run .#ghc927-cuda.halide-haskell # for CUDA support
nix run .#ghc944.halide-haskell # to build with GHC 9.4.4 instead
(for OpenCL and CUDA, you may need to set NIXPKGS_ALLOW_UNFREE=1
)
π€© Motivation
The availability of Deep Learning frameworks such as
PyTorch or JAX has
revolutionized array processing, independently of whether one works on Machine
Learning tasks or other numerical algorithms. The ecosystem in Haskell has been
catching up as well, and there are now multiple good array
libraries (hmatrix,
massiv,
Accelerate,
arrayfire-haskell,
Hasktorch, are all high-quality
libraries). To accommodate multiple domains, such libraries
have to support hundreds if not thousands of operations (e.g. there are more
than 3.5 thousand of so called βnativeβ functions in PyTorch),
and this count does not include specializations for different device and data
types).
To overcome this difficulty, we propose to build a common extension mechanism
for Haskell array libraries. The mechanism is based on embedding the
Halide language into Haskell that allows to
just-in-time (JIT) compile computational kernels for various hardware.
π€¨ Why not Accelerate?
One might wonder, why write another package instead of relying on
Accelerate for the JIT compilation of the
kernels. Accelerate is a Haskell eDSL (embedded Domain Specific Language) for
collective operations on dense multi-dimensional arrays. It relies on
LLVM to JIT compile the computational kernels for the
target architecture including multicore CPUs and GPUs. Users have to rely on
Accelerate to generate high-performance kernels and have no way to force some
low-level optimizations. For example, Trevor L. McDonell et
al. explain that the reason why
hand-written CUDA
implementation of the N-body
problem outperforms Accelerate
is the use of on-chip shared memory. Another example would be the matrix-matrix
product where achieving maximal performance requires writing no fewer than six
nested loops instead of the naive three (ACM Trans. Math. Softw. 34, 3,
Article 12 (May 2008)).
Accelerate has no way of knowing that such optimizations have to be applied and
cannot perform them automatically, and this is precisely the gap that we are
trying to fill by embedding Halide into Haskell.
Halide is a C++ eDSL for high-performance image and array processing. Its core
idea is to decouple the algorithm (i.e. what is computed) from the schedule
(i.e. where and when it is computed). The eDSL allows to quickly prototype and
test the algorithm and then move on to the optimization. Optimizations such as
fusion, tiling, parallelism and vectorization can be freely explored without
the risk of breaking the original algorithm definition. Schedulers can also be
generated automatically by advanced optimization
algorithms
Halide provides a lower level interface than Accelerate and thus does not aim
to replace it. Instead, Halide can be used to extend Accelerate, and later on
one might even think about using Halide as a backend for Accelerate.
π¨ Contributing
Currently, the best way to get started is to use Nix:
nix develop
This will drop you into a shell with all the necessary tools to build the code
such that you can do
cabal build
and
cabal test