Re: Overlapped tiling and split tiling

20 views
Skip to first unread message

ol...@okmij.org

unread,
Oct 16, 2012, 11:32:58 PM10/16/12
to mura...@gmail.com, Albert...@inria.fr, stag...@googlegroups.com

Hello!

I have recently had time to improve the previous solution to
Takayuki's stencil challenge. It is now statically assured that the
generated code never accesses an array out of bounds. Whereas the
lower bound of all arrays is assumed 0 for now, the upper bound
becomes known only at run-time. Needless to say the assurance is
provided without generating run-time array-bounds check. The generator
therefore has to symbolically reason about indices, loop variables and
array bounds. The new version has (much) simpler interface and is
hopefully easier to use. There is only one tunable parameter -- and
even that is chosen in a regular way: start from 0 and keep increasing
until the generator stops complaining.

Here is the generator for the two-time--step fused code with no
optimizations. It takes three parameters: the names for the input and
output arrays and the code for the upper array bound. The operations
+@, -@, *@, <-@ are element-wise computations on arrays and (ashift n)
shifts an array by n (to the left, if n is positive). The generator
(looph n) generates a loop and unfolds the first and the last n
iterations.

let t21 max_bounded a b =
forlooph 2 min_bounded max_bounded (
let a = gref0 a in
let w1 = a *@ ashift 1 a in
let wm = a -@ w1 +@ ashift (-1) w1 in
let w2 = wm *@ ashift 1 wm in
b <-@ wm -@ w2 +@ ashift (-1) w2)
;;

That value n (halo) is the tunable parameter. If we set it
to 1 or 0, the generator raises an exception

Exception: Failure "Can't decide if [L-2,H-2) is within [L,H)".

Indeed, the interval [L-2,H-2) is not within the arrays bounds [L,H)
(here, L means the lower bound and H stands for the statically unknown
upper bound). In fact, when the the loop index is 0, the code would
try to access a[-2], which is out-of-bounds. This generator assumes
that out-of-bound accesses to the input (but not the output) array
return a `halo filler'. However, the generator must statically see
that the access is out-of-bounds. If the halo value (loop unrolling)
is 0, it is not possible to know if the access is in- or out-of-bounds
without a run-time check. The generator complains. When the halo value
is 2 or higher, the unrolled iterations provide just enough
information about array access.

To generate the optimal code with B/F of 16/6, we merely add stencil
annotations

let t25 max_bounded a b =
let p = new_prompt () and q = new_prompt () in
push_prompt q (fun () ->
forlooph 2 min_bounded max_bounded (with_body_prompt p (
let a = stencil_memo q p (gref0 a) in
let w1 = stencil_memo q p (a *@ ashift 1 a) in
let wm = stencil_memo q p (a -@ w1 +@ ashift (-1) w1) in
let w2 = stencil_memo q p (wm *@ ashift 1 wm) in
b <-@ wm -@ w2 +@ ashift (-1) w2)))
;;

The stencil memo table needs to know the scope of the loop and the
scope outside of the loop. Stencil variables that persist across
iterations are placed in the out-of-loop scope. The two `prompts' p
and q denote the two scopes.

The full code with description is available at
http://okmij.org/ftp/meta-programming/HPC.html#stencil

I have also written the hand-optimized code that properly deals with
the edge cases, and verified it. The code and the unit tests are in
the file
http://okmij.org/ftp/meta-programming/shift/stencil_hand.ml
If someone could commit this reference code to the Shonan github
repository, it'll be great.

Cheers,
Oleg

Albert Cohen

unread,
Oct 21, 2012, 7:13:05 PM10/21/12
to ol...@okmij.org, mura...@gmail.com, stag...@googlegroups.com
Hello Oleg,

This looks really great. I still have to understand how you deal with
scope extrusion, but I lacked the time so far...
In the mean time, your explanations and examples have been pushed on the
repository, in the "tiling/oleg" directory.

Best,
Albert

Le 17/10/2012 05:32, ol...@okmij.org a �crit :

ol...@okmij.org

unread,
Oct 23, 2012, 4:07:12 AM10/23/12
to Albert...@inria.fr, mura...@gmail.com, stag...@googlegroups.com

Hello!

> In the mean time, your explanations and examples have been pushed on the
> repository, in the "tiling/oleg" directory.

Thank you!

> I still have to understand how you deal with scope extrusion
When you use the developed DSL there is no scope extrusion, I
think. But as one develops it and makes mistakes during the
implementation, one can certainly come across the scope extrusion,
which can indeed be difficult to trace to a cause, when dozens of
bindings get created. I haven't tried the Haskell-based approach yet.
So, although the posted code answers -- to *some* extent -- the challenge,
and although the code could be practically useful, it is, to me,
theoretically unsatisfactory. OTH, perhaps that's the value of the
challenge, to point out what's missing. I got a better idea what kind
of binding-moving facilities and assurances I would like to have to
develop the DSL and be confident it preserves invariants.

Cheers,
Oleg

Reply all
Reply to author
Forward
0 new messages