How to analyse Reduce?


Ruizhe Zhao

May 7, 2019, 10:10:03 AM
to Spatial Users
Hi there,

In the transformer I'm working on, graphs that involve Reduce appear to be quite difficult to analyse.

I intend to traverse from a register through its inputs and, when a Reduce (i.e., OpReduce in spatial.nodes) is met, perform a specific transform.

To be specific, suppose we have a spatial design that contains the following line of code:

val z = Reduce(Reg[T](0))(N by 1) { i => s1(i) * s2(i) } {_ + _} // T is a FixPt alias

As far as I've discovered, a graph like the following will be constructed (after the dead-code-elimination passes):

[Attached image: graphviz.png — the IR graph for this Reduce]



Here:
  1. RegNew(0) creates the Reg to be accumulated into. Its symbol is x19.
  2. x19 is read by RegRead(x19) to produce z.
  3. x19 is also referenced by OpReduce as the accum argument.
  4. x19 is updated with the result of the implicit function FixAdd(b22, b23) through RegWrite(x19, x29).

Based on this topology, I cannot trace to OpReduce simply by following the inputs of a node (i.e., a Sym): reversing the direction to consumers is necessary.

I can definitely do some extra work in my transformer to trace to OpReduce by looking at the consumers (implicit) field as well, but I'm just wondering if there is a more elegant way of doing so. I feel like "inputs" is not the proper way to look up the producers of an operator; for instance, x19 is an input to RegWrite, but it is actually the write destination.
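
For reference, what I have in mind looks roughly like the following sketch (the `consumers` accessor here stands in for the implicit field mentioned above; the names may not match the actual API):

```scala
// Hypothetical sketch: walk from the register to the OpReduce that
// accumulates into it by looking at consumers instead of inputs.
def findReduce(reg: Sym[_]): Option[Sym[_]] =
  reg.consumers.collectFirst {
    case s @ Op(_: OpReduce[_]) => s
  }
```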

Please let me know if there is anything that is not clear to you. Thanks!

Best regards
Ruizhe

David Koeplinger

May 7, 2019, 1:57:12 PM
to Spatial Users
Agreed that it looks weird for RegWrite to have its register as an input. Just note that this is the canonical(ish) way of describing memory writes in static single assignment (SSA) form.
Reg defines one logical data structure, much like an Array or SRAM, so, as with other data structures, writes to registers in the IR take the form Update(structure, address, value). 

There are multiple ways to detect that a symbol is written in an operation, including looking at the Effects of the node (these include a Set of written symbols), looking at the writers metadata of the original symbol, and just pattern matching on the node.
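
Sketches of each approach might look like this (the exact accessor names here are from memory and may differ):

```scala
// All three sketches assume a candidate symbol `sym` and the register `reg`.

// 1. Effects: check the node's written-symbols set
val viaEffects = sym.effects.writes.contains(reg)

// 2. Metadata: ask the register which accesses write to it
val viaMetadata = reg.writers.contains(sym)

// 3. Pattern match on the node itself
val viaMatching = sym match {
  case Op(RegWrite(mem, _, _)) => mem == reg
  case _                       => false
}
```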


You're probably already aware of what Reduce is describing here, but just in case:
The first ("map") function (s1(i) * s2(i)) defines how to produce a single element per iteration.
The second ("reduce") function (_ + _) defines how to combine any two elements together. This includes both operation(s) used for how an element might be combined into an accumulator AND how elements might get combined together independent of the accumulator, e.g. in a hardware reduction tree.

Reduce gives us a nice abstraction which isolates the combination function, and therefore the "hard" part of parallelizing reductions (the accumulation cycle) in an isolated way. 
This allows us to easily generate parallel implementations of the Reduce without mucking around with isolating the cycle, identifying whether it's a valid candidate for parallelization, and figuring out how to duplicate it on other inputs.
This is particularly important because, in general, it's impossible to tell whether a series of operations is associative and, furthermore, floating point operations are NOT associative by nature. Using them in a Reduce explicitly tells the compiler that it may override the "strict" evaluation order.
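
As a plain-Scala analogy (not the Spatial API), the two functions compose like this:

```scala
// Plain-Scala analogy of Reduce(Reg[T](0))(N by 1){ i => s1(i) * s2(i) }{ _ + _ }
val s1 = Array(1, 2, 3, 4)
val s2 = Array(5, 6, 7, 8)
// "map" stage: one element per iteration
val elems = (0 until 4).map(i => s1(i) * s2(i))   // Vector(5, 12, 21, 32)
// "reduce" stage: combine any two elements; associativity is what lets
// the compiler arrange these combines as a hardware reduction tree
val z = elems.reduce(_ + _)                       // 70
```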

However, you're right: these separate functions do make analyzing the dataflow of the Reduce a pain. There are only a couple of things you can do here:
1. Don't try to run the transformation here. 
2. Wait until the graph is unrolled to transform - this is what the transformer that fuses multiplies and adds (RewriteTransformer) does.
3. Keep a table of what symbols will alias with the bound symbols (b22 and b23) and visit the reduce function while checking this table. 
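
A rough sketch of (3), with every name here hypothetical:

```scala
// Map each bound symbol of the reduce function to the symbol it will
// alias with, then consult this table while visiting the function body.
val aliasOf = scala.collection.mutable.Map.empty[Sym[_], Sym[_]]
aliasOf(b22) = mapResult   // what the map function's result becomes
aliasOf(b23) = accumValue  // the current accumulator value
def resolve(s: Sym[_]): Sym[_] = aliasOf.getOrElse(s, s)
```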

I would discourage 3 if you can avoid it. There is little you can transform this Reduce into while keeping the operation semantically the same, especially because the way the reduction function will be used varies depending on how the unrolling transformer runs.

Best,
David

Ruizhe Zhao

May 8, 2019, 8:43:46 AM
to Spatial Users
Hi David,

Thank you so much for your explanation.

Just note that this is the canonical(ish) way of describing memory writes in single static assignment (SSA) form. 

Thanks for pointing that out. I didn't realise that the Spatial IR is in SSA form (is it?).

There's only a couple things you can do here:

My objective is to construct a new Reduce operation based on an existing one. I roughly know how to derive the new "map" and "reduce" functions from the ones in the given OpReduce statement. I'm thinking of creating new Blocks via Lambda2 (or stageLambda2, stageBlock) and using them to build an OpReduce. Is that a good way to go?
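
Concretely, I'm imagining something like this sketch (the staging calls and names here are my guesses, not verified against the actual API):

```scala
// Stage new map/reduce blocks and rebuild the OpReduce from them.
val newMap: Block[T]  = stageBlock { transformedMapBody() }
val newReduce         = stageLambda2(a, b) { transformedReduceBody(a, b) }
// ...then stage a fresh OpReduce using newMap and newReduce in place
// of the original blocks.
```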

Best regards
Ruizhe

David Koeplinger

May 8, 2019, 2:40:57 PM
to Spatial Users
Spatial is... sort of SSA. I think it would be most accurate to say that Spatial's IR is a hierarchical control-dataflow graph (CDFG), which shares some features in common with SSA.
That said, Spatial doesn't handle mutation through the standard variable renaming + PHI scheme. 
Instead, variables, registers, arrays, and the like are allocated as a single symbol with a "mutable" property. This symbol represents the memory allocation itself. Writes/updates to that symbol are separate effectful operations.

e.g. if you write:
var x: Int = 3
if (c) x = x + 5 else x = x - 1
println(x)

This is represented as 
x1 = Var(3)
x2 = IfThenElse(x0) {  
  x3 = VarRead(x1)
  x4 = Add(x3, 5)
  x5 = VarWrite(x1, x4)
} {
  x6 = VarRead(x1)
  x7 = Sub(x6, 1)
  x8 = VarWrite(x1, x7)
}
x9 = VarRead(x1)
x10 = Println(x9)

Rather than: 
x1 = 3
x0 ? {
  x2 = x1 + 5  
} {
  x3 = x1 - 1
}
x4 = PHI(x2, x3)
x5 = Println(x4)

As for transforming Reduce, it sounds like you're on the right track.