Accelerate is an embedded language of array-based computations for high-performance computing in Haskell. Programs in Accelerate are expressed in the form of parameterised collective operations on regular multidimensional arrays, such as maps, reductions, and permutations. Accelerate then takes these computations and optimises, compiles, and executes them for the chosen target architecture.
Accelerate is an embedded language, which distinguishes between vanilla Haskell arrays and embedded language arrays, as well as the computations on each of these. Programs written in Accelerate are not compiled by the regular Haskell compiler (GHC). Rather, each Accelerate backend is a runtime compiler which generates and executes [parallel SIMD] code of the target architecture at application runtime.
Accelerate distinguishes the types of collective operations Acc
from the type of scalar operations Exp
to achieve a stratified language. Collective operations comprise many scalar computations which are executed in parallel, but scalar computations can not initiate new collective operations. This distinction excludes nested, irregular data-parallelism statically; instead, Accelerate is limited to flat data-parallelism involving only regular, multi-dimensional arrays.
The type constructor Acc
represents embedded collective array operations. A term of type Acc a
is an Accelerate program which, once executed, will produce a value of type a
(consisting of one or more arrays). Collective operations of type Acc a
comprise many scalar expressions, represented by the type Exp
, which will be executed in parallel. Although collective operations comprise many scalar operations executed in parallel, scalar operations can not initiate new collective operations. This stratification between scalar operations in Exp
and collective operations in Acc
helps to statically exclude nested data-parallelism, which is difficult to execute efficiently on constrained hardware such as GPUs.
For example, to compute a vector dot product we can write:
import Data.Array.Accelerate as A
import qualified Prelude as P
dotp :: Num a => Vector a -> Vector a -> Acc (Scalar a)
dotp xs ys =
let
xs' = use xs
ys' = use ys
in
fold (+) 0 ( zipWith (*) xs' ys' )
The function dotp
consumes two one-dimensional arrays (Vector
s) of values, and produces a single (Scalar
) result as output. As the return type is wrapped in Acc
, we see that it is an embedded Accelerate computation---it will be evaluated in the object language of dynamically generated parallel code, rather than the meta language of vanilla Haskell.
The arguments to dotp
are plain Haskell arrays (not wrapped in Acc
). To make these arrays accessible to Accelerate computations, they must first be embedded with the use
function. This turns a regular, vanilla array, or tuple of arrays, into an Accelerate array:
use :: Arrays a => a -> Acc a
An Accelerate backed is use to evaluate the embedded computation and return the result back to vanilla Haskell. Calling the run
function of a backend will generate code for the target architecture, compile, and execute it. Currently the following backends are available:
See the Getting Started section for instructions on installing these backends.
Tip! Since Acc
represents embedded computations that will only be executed when evaluated by a backend, we can programmatically generate computations using the meta language (Haskell): for example, unrolling loops or embedding input values directly into the generated code. See the fluid simulation program for an example.
Heads up! It is usually best to keep all intermediate computations in Acc
, and only run
the computation at the very end to produce the final result. This enables optimisations between intermediate computations (e.g. array fusion) and, if the target architecture has a separate memory space, as is the case of GPUs, to prevent excessive data transfers.
The type constructor Exp
represents embedded scalar expressions. The collective operations in Accelerate of type Acc
consist of many scalar operations of type Exp
executed in parallel.
Accelerate implements (or redefines) instances for the familiar Haskell type classes for scalar expressions. In the dotp
program above, this allows us to make use of the usual numeric operations (+)
and (*)
from the Num
typeclass, applied to embedded scalar expressions.
Analogously to use
, to make scalar values accessible to Accelerate computations they must first be embedded with the constant
function.
use :: Elt t => t -> Exp t
Tip! For constant numeric values this is often performed automatically. Notice in the dotp
program we did not need to use the constant
function to inject the initial value 0. This is because GHC applies the fromInteger
function of the Num
typeclass (or the fromRational
function of the Fractional
typeclass for floating point values) to the constant value, which implements the lifting operation for us.
The Array
is the core computational unit of Accelerate. Computations in Accelerate take the form of collective operations over arrays of the type Array sh e
. All programs in Accelerate take zero or more arrays as input and produce one or more arrays as output. The Array
type has two parameters:
sh: is the shape of the array, tracking the dimensionality and extent of each dimension of the array. For example, DIM1
is the type of the shape of a one dimensional array (Vector
), DIM2
for two-dimensional arrays, and so on. See the section array shapes for more information.
e: represents the type each array element, such as Int
or Float
. See the section on allowable array element types for more information.
Array data is stored unboxed in an unzipped struct-of-array representation, and elements are laid out in row-major order (the right-most index of the shape is the fastest varying).
If the object code is executed in a separate memory space, for example on a GPU, arrays will be transferred to the target device as necessary (asynchronously and in parallel with other tasks) and cached on the device as long as sufficient memory is available.
The Elt
class characterises the allowable array element types, and hence the types which can appear in scalar Accelerate expressions. It roughly consists of:
Char
Bool
()
Z
and (:.)
Note that Array
itself is not an allowable array element type; there are no nested arrays in Accelerate, only regular multi-dimensional arrays.
Accelerate arrays consist of these simple atomic types stored efficiently in memory, as consecutive unpacked elements without pointers in an unzipped struct-of-array format.
Adding new instances to Elt
consists of explaining to Accelerate how to map between your data type and a (tuple of) primitive values. For examples see:
Operations in Accelerate consist of collective operations over arrays of type Array sh e
. Much like the repa library, the shape of an array, as well as array indices used to access individual elements, are built inductively using Z
and (:.)
(analogously to a list).
data Z = Z
data tail :. head = tail :. head
The constructor Z
corresponds to a shape with zero dimensions (or a Scalar
array consisting of a single element) and us used to mark the end of the list. The constructor (:.)
adds additional dimensions to the right of an index. For example:
Z :. Int
is the type of the shape of a one-dimensional array (Vector
) indexed by an Int
, while:
Z :. Int :. Int
is the type of the shape of a two-dimensional array indexed by an Int
in each dimension.
This style is used to construct both the type (as shown above) as well as the value of a shape. For example, a vector of ten elements has the following shape:
sh :: Z :. Int
sh = Z :. 10
Heads up! The right-most index corresponds to the innermost dimension. This is the fastest-varying index, and corresponds to the elements of the array which are adjacent in memory.
The common shape and array types that we have seen above are simply type synonyms:
type DIM0 = Z
type DIM1 = DIM0 :. Int
type DIM2 = DIM1 :. Int
-- and so on...
type Scalar e = Array DIM0 e
type Vector e = Array DIM1 e