Consecutive calls of Map and Reduce operations, fusion needed

37 views
Skip to first unread message

Alexander Filippov

unread,
Jan 20, 2014, 7:42:14 AM1/20/14
to opt...@googlegroups.com
Good afternnon!
 
I am running simple example using SimpleIntVector  DSL provided with forge sources. The example contains consecutive calls of map and reduce operations:
 
val v1 = Vector(10)
val vc = v1.map( e => e+3)
val vc2 = vc.reduce((a,b) => a+b )
 
I run forge in order to generate scala kernels and DEG graph. And what I see as a generated result is separate kernel for Map and one more for Reduce operation. But what I expect to see is" fused" kernel which performs map and reduce together  and do not need any intermediate structures.
 
So, my question is: is Delite able to perform such optimization? If the answer is "yes", could you kindly point me on how to enable this optimization?
 
Best regards,
Alexander Filippov

Arvind Sujeeth

unread,
Jan 20, 2014, 10:55:39 PM1/20/14
to opt...@googlegroups.com
hi Alexander,

Delite is able to perform fusion optimizations. Most likely, the issue
in your SimpleVector example is that Vector(10) allocates a mutable
vector, which can obstruct fusion in the subsequent operations.

If you have OptiML running, a simple test case that demonstrates fusion
is very similar, but starts with an immutable Vector instead:

val v1 = (0::10) { i => i } // constructs an immutable vector of length
10, with values 0,1,2,...9
val vc = v1.map(e => e + 3) // or just v1+3
val vc2 = vc.reduce((a,b) => a+b) // or just vc.sum

If you inspect the generated code that results, you should see only one
multiloop generated, which is the fused kernel performing all of the
operations (v1 construction, map, and reduce) inline as you expect.

cheers,
Arvind
> --
> You received this message because you are subscribed to the Google
> Groups "OptiML" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to optiml+un...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Alexander Filippov

unread,
Jan 23, 2014, 2:51:09 AM1/23/14
to opt...@googlegroups.com
Hi Arvind,

Thank you for your help! I tried your example and got expected result. 

Playing with OptiML simple test cases, I had some more questions about fusion in Delite. Namely, consider the following example

val buf = SomeDenseVector

val v1 = (0::buf.length) {i:Rep[Int] => buf(i) }
val v2 = (0::buf.length) {i: Rep[Int] => buf(i) + 1}

// buf (2) = 0             // commented yet

val vc1 = v1.map( /* some sunc*/ )
val vc2 = v2.map( /* some func */)

val vc3 = (vc1 + vc2).reduce( (a,b) => a+b)

This example runs fine and produces fully-fused kernel for all operations. 

Now let's uncomment buf(2) = 0.  The programm runs succesfully but we see such a message at compile-time:

error: write to non-mutable Sym(...)

Observing result, we still see all computations fused into one kernel (which is not correct in general). It looks like despite the fact Delite considered fusion to be impossible, it still performed it and, as a result, the program produced incorrect output. At the same time, output produced by interpreted (not staged) version of program is correct. 

So, my questions are: Does Delite always fuse computations (even in case when it is inapplicable)? Is it user who is responsible for program correctness in this case? Is there any way to force Delite not to perform fusion in this case? Is it OK that interpreted and staged version produces different outputs in this case?

Thank you in advance!

best regards,
Alexander.

вторник, 21 января 2014 г., 7:55:39 UTC+4 пользователь asujeeth написал:

Arvind Sujeeth

unread,
Jan 23, 2014, 1:30:05 PM1/23/14
to opt...@googlegroups.com
hi Alex,

No problem. Delite currently runs in what I would call "expert mode", where mutability errors are not fatal, but when you see one, the generated code is not guaranteed to be correct (though sometimes it is anyways). In a production setting, errors like:


error: write to non-mutable Sym(...)

should be fatal errors, as they are a sign that the user is doing something wrong.

What that error means is that you tried to mutate an immutable object. In order to get a mutable copy in OptiML, you can do:

val buf = SomeDenseVector.mutable
buf(2) = 0 // ok


For more information, see http://stanford-ppl.github.io/Delite/optiml/examples.html#mutability

In theory, the interpreter and compiler should always return the same result in a properly designed DSL, but this is not true in practice yet. The main issue is that we don't track effects in the interpreter (and it runs sequentially, so you always get sequential consistency), so you only see those errors in the compiler. With more effort, we could make the interpreter track effects the same way and report the same errors according to the semantic mutability rules in the compiler.

Let me know if any of that didn't make sense or if you have any other questions!

cheers,
Arvind

Alexander Filippov

unread,
Jan 27, 2014, 8:53:29 AM1/27/14
to opt...@googlegroups.com
Hi Arvind,

Thank you, I think I got everything. Just one more question about fusion.  

val buf = SomeDenseVector
val v1 = (0::buf.length) {i:Rep[Int] => buf(i) }
val v2 = (0::buf.length) {i: Rep[Int] => buf(i) + 1}
val vc1 = v1.map( /* some sunc*/ )
val vc2 = v2.map( /* some func */)
val vc3 = (vc1 + vc2).reduce( (a,b) => a+b)

buf (2) = 0             // mutating

val vc4 = (vc1 - vc2).reduce( (a,b) => a+b)  // we expect this is not fused

Looking on the program fragment listed above, one can expect that all computations "before" mutating are fused succesfully. At the same time the calculations in the last row could not be fused and should be placed in separate kernel. On practice, I see  that Delite had fused all the computations into one kernel and, as a result, produced wrong value not only for vc4 but also for vc3. 
Is there any way to force Delite to perform this "partial fusion" without creating additional intermediate variables? I think that this could be useful in some real cases.

best regards,
Alexander

четверг, 23 января 2014 г., 22:30:05 UTC+4 пользователь asujeeth написал:

Arvind Sujeeth

unread,
Jan 28, 2014, 2:22:26 PM1/28/14
to opt...@googlegroups.com
hi Alex,

Is 'buf' in your example explicitly marked mutable (i.e., are you receiving any errors about mutable writes)? If it is not marked mutable, then the compiler will believe all of the loops are safe to fuse. If it is marked mutable, I believe that v1 and v2 should not fuse, but vc1, vc2, vc3 and vc4 should all fuse, since those subsequent operations are immutable.

I am a little bit confused by your example, though. Mutating buf in program order at that point should not have any effect on the value of v1, v2, vc1, vc2, vc3 or vc4, even sequentially, since the mutation happens after the read to buf (when computing v1 and v2). Or am I missing something?

Arvind

Alexander Filippov

unread,
Jan 29, 2014, 8:17:44 AM1/29/14
to opt...@googlegroups.com
Hi Arvind,

That was my incarrucy: 'buf' was not marked mutable. When it is marked, I observe expected behaviour. v1 and v2 are computed explicitly while all other computations are fused. Now I believe I have no misunderstandings here. Thank you for your help!

best regards,
Alexander

вторник, 28 января 2014 г., 23:22:26 UTC+4 пользователь asujeeth написал:
Reply all
Reply to author
Forward
0 new messages