Guidance on Undef Operation


Mahesh Ravishankar

Oct 7, 2019, 5:58:53 PM
to MLIR
I wanted some guidance on how to handle undef values in the SPIR-V dialect. Please see below for details. Some of the details below require some basic understanding of the SPIR-V dialect.

Issue:

The aim here is to model the OpUndef instruction in the SPIR-V dialect. The semantics of this, according to the spec, is that the instruction creates an SSA value and each use of the SSA value produces a different bit pattern (similar to LLVM's undef value, with one difference highlighted below). We can model this directly in the SPIR-V dialect, i.e., a spirv::UndefOp that produces an SSA value with the semantics that every use produces a different bit pattern.
I find that hard to reason about, since this makes an SSA value produced by spirv::UndefOp different from other SSA values.

Current solution I have in mind:

An alternative I was considering is to define the semantics of spirv::UndefOp as producing an SSA value that represents the same bit pattern on every use. With this, some special handling is needed during deserialization of a SPIR-V binary into the SPIR-V dialect in MLIR: to stay consistent with the SPIR-V spec, a new UndefOp is created for every use of the result <id> generated by an OpUndef instruction. This remains consistent with the SPIR-V spec, and we don't need to treat every use of the result of a spirv::UndefOp as a different bit pattern.

Questions
1) I was looking at what LLVM does for undef. The description seems to be consistent with SPIR-V, but there is a slight difference AFAICS. There is no "undef instruction". The undef value is directly embedded into the IR where needed, so this issue does not arise. Could someone confirm this?
2) The LLVM dialect has a llvm.mlir.undef operation whose specification is similar to spirv::UndefOp's. I think (haven't verified) that when lowering to LLVM IR, an undef value is created at every use of the result SSA value. This effectively means the LLVM dialect models an undef value as having a different bit pattern on every use. Is there any guidance on, or are there opinions about, how undef values should be modeled in MLIR?

Finally, though, I am not really sure there is any observable difference between the two approaches, so this might all be a moot point. Some input here would be very useful to cover any blind spots on my side.
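To make the distinction between the two candidate semantics concrete, here is a toy Python model (purely illustrative; the class names are made up and this is not MLIR/SPIR-V API):

```python
import random

class FreshUndef:
    """Each use may observe a different bit pattern (the spec's OpUndef)."""
    def read(self):
        return random.getrandbits(32)

class FixedUndef:
    """Every use observes the same, arbitrary but fixed, bit pattern."""
    def __init__(self):
        self._value = random.getrandbits(32)
    def read(self):
        return self._value

def xor_self(v):
    # Two reads of the same SSA value, xor'ed together.
    return v.read() ^ v.read()

# Under the "fixed" semantics, xor of a value with itself is always 0;
# under the "fresh" semantics it can be any bit pattern.
assert xor_self(FixedUndef()) == 0
```

The observable difference shows up exactly in expressions like `xor` of a value with itself, which is what the rest of this thread ends up discussing.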

Nuno Lopes

Oct 7, 2019, 6:28:43 PM
to Mahesh Ravishankar, MLIR
Hi,

Regarding your 1st question, it's true that there's no undef instruction in
LLVM. Undef is a value, hence it *has* to be able to produce a different
result on each use (due to how RAUW, replace-all-uses-with, works).
If undef was an instruction, it could have the semantics that all uses
observe the same value.

There are performance considerations to be had when you impose that all uses
see the same value. In particular, you may need to materialize a value and
keep it in a register so that all uses observe the same value.
Having undef take different values on different uses introduces a lot of
complexity in the optimizers. In particular, optimizers that propagate
information based on branches become hard to implement correctly (e.g., GVN,
range propagation, and LICM).

We wrote a paper on the topic a while back:
http://web.ist.utl.pt/nuno.lopes/pubs/undef-pldi17.pdf
TL;DR: avoid an undef where each use can observe a different value. If
possible, replicate LLVM's poison instead. If not, use an undef instruction
that makes all uses observe the same value. That will make the life of
optimizer writers easier and avoid a bunch of correctness bugs.

Nuno

Sanjoy Das

Oct 7, 2019, 6:56:25 PM
to Mahesh Ravishankar, MLIR
On Mon, Oct 7, 2019 at 2:58 PM 'Mahesh Ravishankar' via MLIR
<ml...@tensorflow.org> wrote:
>
> I wanted some guidance on how to handle undef values in the SPIR-V dialect. Please see below for details. Some of the details below requires some basic understanding of SPIR-V dialect.
>
> Issue:
>
> The aim here is to model the OpUndef instruction in the SPIR-V dialect. The semantics of this according to the spec is that the instruction creates an SSA value and each use of the SSA value produces a different bit-pattern (similar to LLVM's undef operation with one difference highlighted below). We can model this directly in the SPIR-V dialect, i.e. an spirv::UndefOp that produces an SSA value with the semantics that every use produces a different bit-pattern.
> I find that hard to reason about, since this makes an SSA value produced by the spirv::UndefOp different compared to other SSA values.
>
> Current solution I have in mind :
>
> An alternative way I was thinking of doing this is to define the semantics of spirv::UndefOp as producing an SSA value which represents the same bit-pattern on every use. With this, some special handling is needed during deserialization of a SPIR-V binary into the SPIR-V dialect in MLIR.

This isn't needed for correctness though right? Since every use of
OpUndef _could have_ had the same value? It would even be correct to
"deserialize" each OpUndef into a constant zero.

However, I think we'll have a problem going the other way, from the
SPIR-V dialect to OpUndef. If spirv::UndefOp is defined to produce a
consistent value then it cannot be lowered into OpUndef which will
produce a different value on each use.

-- Sanjoy

> --
> You received this message because you are subscribed to the Google Groups "MLIR" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to mlir+uns...@tensorflow.org.
> To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/mlir/067202df-7e32-409f-9418-7ccc82942b9b%40tensorflow.org.

Mehdi AMINI

Oct 7, 2019, 7:00:57 PM
to Nuno Lopes, Mahesh Ravishankar, MLIR
On Mon, Oct 7, 2019 at 3:28 PM Nuno Lopes <nuno....@ist.utl.pt> wrote:
Hi,

Regarding your 1st question, it's true that there's no undef instruction in
LLVM. Undef is a value, hence it *has* to be able to produce a different
result on each use (due to how RAUW works).
If undef was an instruction, it could have the semantics that all uses
observe the same value.

There are performance considerations to be had when you impose that all uses
see the same value. In particular, you may need to materialize a value and
keep it in a register so that all uses observe the same value.
Having undef take different values on different uses introduces a lot of
complexity in the optimizers.

But it also simplifies other optimizers, right? Not trying to downplay the other problems; your paper did a good job at it :)
It seemed to me that many peepholes can make a local decision and propagate undef without having to "remember" or encode the value they based their earlier decision on.
It means that `if x == 0 && x == 1` is allowed to evaluate to true when x is undef in the IR, because the two conditions can be evaluated in isolation.
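A toy Python simulation of this per-use semantics (illustrative only; `x` here is a function so each use is an independent read, restricted to {0, 1} so both comparisons can succeed):

```python
import random

def x():
    # Per-use undef: each read yields an independent bit pattern.
    return random.choice([0, 1])

# With per-use semantics, `x == 0 && x == 1` can evaluate to true,
# because each comparison reads x independently.
seen_true = any(x() == 0 and x() == 1 for _ in range(10_000))
assert seen_true
```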

-- 
Mehdi



 

Mahesh Ravishankar

Oct 7, 2019, 7:27:35 PM
to Nuno Lopes, MLIR
@nuno.lopes 
Thanks for the pointer to the paper (had seen this before, but it was out of my cache and is a great read!).
1) I agree with your assessment about complexity of optimization passes if every use represents a different bit pattern. My goal is to avoid that as well.
2) To provide more context, SPIR-V is a binary format designed to be a portable IR for GPU programs. In addition, any hardware that has a Vulkan-capable driver must support execution of kernels specified in the SPIR-V format. The spec does not specify poison values, but it does specify an undef value.
3) The SPIR-V dialect in MLIR is designed with two objectives: 1) to be able to generate SPIR-V binaries from MLIR, and 2) to allow for compiler analysis/optimizations at this level. MLIR also has a way to deserialize a SPIR-V binary blob (generated by a third-party compiler) into the SPIR-V dialect. Given this, we have to support the OpUndef instruction as specified by the spec, i.e., the SPIR-V dialect has to support the spec's semantics exactly. Even if we use poison values in the SPIR-V dialect, other compilers might not, which means undef needs to be supported in the SPIR-V dialect.




--
Mahesh

Mahesh Ravishankar

Oct 7, 2019, 7:34:51 PM
to Sanjoy Das, MLIR
On Mon, Oct 7, 2019 at 3:56 PM Sanjoy Das <san...@google.com> wrote:
On Mon, Oct 7, 2019 at 2:58 PM 'Mahesh Ravishankar' via MLIR
<ml...@tensorflow.org> wrote:
>
> I wanted some guidance on how to handle undef values in the SPIR-V dialect. Please see below for details. Some of the details below requires some basic understanding of SPIR-V dialect.
>
> Issue:
>
> The aim here is to model the OpUndef instruction in the SPIR-V dialect. The semantics of this according to the spec is that the instruction creates an SSA value and each use of the SSA value produces a different bit-pattern (similar to LLVM's undef operation with one difference highlighted below). We can model this directly in the SPIR-V dialect, i.e. an spirv::UndefOp that produces an SSA value with the semantics that every use produces a different bit-pattern.
> I find that hard to reason about, since this makes an SSA value produced by the spirv::UndefOp different compared to other SSA values.
>
> Current solution I have in mind :
>
> An alternative way I was thinking of doing this is to define the semantics of spirv::UndefOp as producing an SSA value which represents the same bit-pattern on every use. With this, some special handling is needed during deserialization of a SPIR-V binary into the SPIR-V dialect in MLIR.

This isn't needed for correctness though right?  Since every use of
OpUndef _could have_ had the same value?  It would even be correct to
"deserialize" each OpUndef into a constant zero.

I think for correctness we need to model it as "need not have the same value". The paper that Nuno pointed to (and the LLVM Language Reference) describes cases where saying they have the same value might lead to wrong results. For example,
  xor undef, undef
is not 0 (which would be the case if undef were the same value).
 

However, I think we'll have a problem going the other way, from the
SPIR-V dialect to OpUndef.  If spirv::UndefOp is defined to produce a
consistent value then it cannot be lowered into OpUndef which will
produce a different value on each use.

I can't say for sure, but I don't think that will be an issue. If, in the SPIR-V dialect, the semantics is that every use of the value produced by spirv::UndefOp is the same, then when you serialize it, the Vulkan driver that consumes the SPIR-V binary will treat it as a different value at every use. It might result in suboptimal machine code (in which case it is up to the SPIR-V dialect optimization passes to mitigate that), but it shouldn't produce wrong results.
 



--
Mahesh

Sanjoy Das

Oct 7, 2019, 7:44:03 PM
to Mahesh Ravishankar, MLIR
On Mon, Oct 7, 2019 at 4:34 PM Mahesh Ravishankar
<ravish...@google.com> wrote:
>
>
>
> On Mon, Oct 7, 2019 at 3:56 PM Sanjoy Das <san...@google.com> wrote:
>>
>> On Mon, Oct 7, 2019 at 2:58 PM 'Mahesh Ravishankar' via MLIR
>> <ml...@tensorflow.org> wrote:
>> >
>> > I wanted some guidance on how to handle undef values in the SPIR-V dialect. Please see below for details. Some of the details below requires some basic understanding of SPIR-V dialect.
>> >
>> > Issue:
>> >
>> > The aim here is to model the OpUndef instruction in the SPIR-V dialect. The semantics of this according to the spec is that the instruction creates an SSA value and each use of the SSA value produces a different bit-pattern (similar to LLVM's undef operation with one difference highlighted below). We can model this directly in the SPIR-V dialect, i.e. an spirv::UndefOp that produces an SSA value with the semantics that every use produces a different bit-pattern.
>> > I find that hard to reason about, since this makes an SSA value produced by the spirv::UndefOp different compared to other SSA values.
>> >
>> > Current solution I have in mind :
>> >
>> > An alternative way I was thinking of doing this is to define the semantics of spirv::UndefOp as producing an SSA value which represents the same bit-pattern on every use. With this, some special handling is needed during deserialization of a SPIR-V binary into the SPIR-V dialect in MLIR.
>>
>> This isn't needed for correctness though right? Since every use of
>> OpUndef _could have_ had the same value? It would even be correct to
>> "deserialize" each OpUndef into a constant zero.
>
>
> I think for correctness we need to model it has it "need not" have the same value. The paper that Nuno pointed to (and the LLVM Reference guide) talks about some cases where saying that they could have the same value might lead to wrong results. For example
> xor undef, undef
> is not 0 (which would be the case if undef is the same value.

It is still legal to fold `xor undef, undef` to 0; it is just not
_necessarily_ 0. That is, the LLVM program `print(xor undef, undef)`
non-deterministically picks a value to print, and the compiler is
allowed to "refine" this program to print a specific value. This is
like folding `undef == 0` to true.
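This notion of refinement can be sketched in toy Python over 4-bit values (illustrative only, not an LLVM API): the nondeterministic program may print any value in a set, and the compiler may pick any single member of that set.

```python
def possible_xor_results(bits=4):
    # `xor undef, undef` with per-use semantics: every pair of
    # independently chosen bit patterns is an allowed behavior.
    patterns = range(2 ** bits)
    return {a ^ b for a in patterns for b in patterns}

allowed = possible_xor_results()
assert 0 in allowed       # folding to 0 picks one allowed behavior
assert len(allowed) > 1   # but 0 is not the *only* allowed behavior
```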

>> However, I think we'll have a problem going the other way, from the
>> SPIR-V dialect to OpUndef. If spirv::UndefOp is defined to produce a
>> consistent value then it cannot be lowered into OpUndef which will
>> produce a different value on each use.
>
> I can't say for sure, but I don't think that will be an issue. In SPIR-V dialect if the semantics is that every use of the value produced by spirv::UndefOp is the same, when you serialize it, the Vulkan driver that consumes the SPIR-V binary will treat it as different value at every use. It might result in suboptimal machine code (in which case it is upto the SPIR-V dialect optimization passes to mitigate that), but it shouldn't produce wrong results.

Maybe we're talking past each other, but let's say a program in the
SPIR-V dialect does:

x = spirv::UndefOp()
y = xor x, x
print(y)

Then in the SPIR-V dialect, where spirv::UndefOp produces a
well-defined fixed value, the program always prints 0. However, if
you lower this to SPIR-V

x = OpUndef
y = xor x, x
print(y)

then the program might print a non-zero value which is an incorrect
compilation of the source program.

-- Sanjoy

Mahesh Ravishankar

Oct 7, 2019, 7:58:59 PM
to Sanjoy Das, MLIR
Thanks for that. I think you are right. For this program,

x = spirv::UndefOp()
y = xor x, x
print(y)  

the SPIR-V dialect says you should have y == 0. If you just serialize this as is, the serialized binary has different semantics. I hadn't concretized it this way in my mind, but I suspected there might be an issue I hadn't fully worked through. One thought I had, on top of what I sent out in my first email, was to add a restriction in the SPIR-V dialect that a value produced by a spirv::UndefOp can have only one use. That would avoid situations like this. I just wasn't sure it was needed, but your example seems to show that this restriction is required for correctness.

FYI, on the reverse path, if you take this SPIR-V binary :

x = OpUndef
y = xor x, x
print(y)  

and deserialize it to the SPIR-V dialect (and this happens right now), the correct representation of the above would be:

x = spirv::UndefOp()
z = spirv::UndefOp()
y = xor x, z
print(y)    

The original SPIR-V binary's semantics are that it can print 0, but need not. The same holds for the program in the SPIR-V dialect: x could be equal to z, but doesn't need to be.
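In toy Python terms (illustrative only, small 2-bit patterns so both outcomes show up quickly), duplicating the undef per use reproduces the binary's behavior:

```python
import random

def undef(bits=2):
    # One spirv::UndefOp per use: each call is an independent value.
    return random.getrandbits(bits)

results = {undef() ^ undef() for _ in range(1_000)}
# The xor *can* be 0 (when the two independent values coincide) but
# need not be: both outcomes occur, matching the OpUndef semantics
# of the original binary.
assert 0 in results and len(results) > 1
```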

 



--
Mahesh

Sanjoy Das

Oct 7, 2019, 8:47:54 PM
to Mahesh Ravishankar, MLIR
On Mon, Oct 7, 2019 at 4:58 PM Mahesh Ravishankar
<ravish...@google.com> wrote:
> Thanks for that. I think you are right. For this program,
>
> x = spirv::UndefOp()
> y = xor x, x
> print(y)
>
> the SPIR-V dialect says you should have y == 0. If you just serialize this as is, the serialized binary does have different semantics. I hadn't concretized it this way in my mind, but thought there might be an issue I haven't fully worked through. So one thought I had on top of what I sent out in my first email, was to have an additional restriction in the SPIR-V dialect that there can be only one use of a value from a spirv::UndefOp.

This might not be sufficient if SPIR-V has control flow and/or
function calls since those might duplicate uses.

For instance this program (if expressible) in the SPIR-V dialect
always prints the same value in each iteration:

x = spirv::UndefOp()
for (...) {
print(x);
}

but directly translating this to SPIR-V will cause us to (potentially)
print a different value on each iteration. Similarly the following
program, if expressible, always prints 0 in the SPIR-V dialect but
might print non-zero when converted to SPIR-V:

void f(int x) {
print(xor x x)
}

...
f(spirv::UndefOp())
...

Something that might be sufficient: can we implement a "freeze"
operation in SPIR-V (like in the paper Nuno linked) that forces SPIR-V
to pick a concrete value for undef? Then we could lower
spirv::UndefOp() to Freeze(OpUndef()).
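A toy Python sketch of what such a freeze would buy us (the `freeze`/`fresh_undef` names are illustrative, not SPIR-V ops):

```python
import random

def fresh_undef():
    # Models OpUndef: a new bit pattern on every use.
    return random.getrandbits(32)

def freeze(producer):
    # Models "freeze": force one nondeterministic choice up front;
    # every later use observes that same concrete value.
    v = producer()
    return lambda: v

x = freeze(fresh_undef)
assert x() ^ x() == 0  # all uses of the frozen value agree
```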

> That would avoid situations like this. I just wasn't sure if it was needed, but your example seems to show that this restriction is required for correctness.
>
> FYI, on the reverse path, if you take this SPIR-V binary :
>
> x = OpUndef
> y = xor x, x
> print(y)
>
> and deserialize it to the SPIR-V dialect (and this happens right now), the correct representation of the above would be.
>
> x = spirv::UndefOp()
> z = spirv::UndefOp()
> y = xor x, z
> print(y)
>
> The original SPIR-V binary semantics is that it can print 0, but not necessarily. Same holds true for the program in SPIR-V dialect. x could be equal to z, but doesn't need to be.

Agreed. Duplicating spirv::UndefOp() on every use is correct but not necessary.

-- Sanjoy

Lei Zhang

Oct 7, 2019, 9:37:22 PM
to Sanjoy Das, Mahesh Ravishankar, MLIR



On Mon, Oct 7, 2019 at 6:56 PM 'Sanjoy Das' via MLIR <ml...@tensorflow.org> wrote:
On Mon, Oct 7, 2019 at 2:58 PM 'Mahesh Ravishankar' via MLIR
<ml...@tensorflow.org> wrote:
>
> I wanted some guidance on how to handle undef values in the SPIR-V dialect. Please see below for details. Some of the details below requires some basic understanding of SPIR-V dialect.
>
> Issue:
>
> The aim here is to model the OpUndef instruction in the SPIR-V dialect. The semantics of this according to the spec is that the instruction creates an SSA value and each use of the SSA value produces a different bit-pattern (similar to LLVM's undef operation with one difference highlighted below). We can model this directly in the SPIR-V dialect, i.e. an spirv::UndefOp that produces an SSA value with the semantics that every use produces a different bit-pattern.
> I find that hard to reason about, since this makes an SSA value produced by the spirv::UndefOp different compared to other SSA values.
>
> Current solution I have in mind :
>
> An alternative way I was thinking of doing this is to define the semantics of spirv::UndefOp as producing an SSA value which represents the same bit-pattern on every use. With this, some special handling is needed during deserialization of a SPIR-V binary into the SPIR-V dialect in MLIR.

This isn't needed for correctness though right?  Since every use of
OpUndef _could have_ had the same value?  It would even be correct to
"deserialize" each OpUndef into a constant zero.

Yes, technically we can, but by doing this at the very beginning we are limiting the optimization opportunities, right? My take on undef values is that they are meant to provide flexibility to compilers, so compilers can choose a suitable value interpretation as they see fit.


However, I think we'll have a problem going the other way, from the
SPIR-V dialect to OpUndef.  If spirv::UndefOp is defined to produce a
consistent value then it cannot be lowered into OpUndef which will
produce a different value on each use.

To provide some background here: the SPIR-V dialect acts as the in-memory representation of the binary format, and ops defined in the SPIR-V dialect follow the semantics defined in the SPIR-V spec. We change how ops are represented syntactically to leverage MLIR mechanisms, but we don't adjust op semantics for ops defined in SPIR-V. Serialization/deserialization is just for importing/exporting and format conversion; it is not a lowering/raising step.
 

-- Sanjoy

> To stay consistent with the SPIR-V spec, a new UndefOp is created on every use of the result <id> generated by an OpUndef instruction. This is consistent with SPIR-V spec and we don't need to treat every use of the result of a spirv::UndefOp as a different bit-pattern.

I view this as still keeping the existing SPIR-V OpUndef semantics. We just require that each such op has one user, thus eliminating the complexity of different uses having different bit patterns.


Lei Zhang

Oct 7, 2019, 9:44:24 PM
to Mahesh Ravishankar, Sanjoy Das, MLIR



I'm not sure that's the case? The SPIR-V dialect's spv.undef does not define different semantics as of today; it follows the SPIR-V spec. The spec says different uses can see different bit patterns.
  

Lei Zhang

Oct 7, 2019, 9:55:40 PM
to Sanjoy Das, Mahesh Ravishankar, MLIR
Adding SPIR-V experts +John Kessenich +David Neto +Alan Baker +Steven Perron for input. :)

Based on my understanding, SPIR-V's undef semantics relates to LLVM's, and because many SPIR-V consumers are LLVM-based, interop with LLVM is important here. I was told that LLVM is rethinking its undef semantics, and I think there is fuzziness on the SPIR-V undef semantics side too. So it is a can of worms. :)

In MLIR right now we don't have a way to define "immediate values", given that everything is defined as ops and "linked" with SSA values. I agree with Mahesh that having an SSA value that changes its bit pattern is hard to reason about, so it would be nice to have a better way to represent it if possible, ideally consistent with the LLVM side.

Thanks,
Lei



Lei Zhang

Oct 7, 2019, 10:02:49 PM
to Sanjoy Das, John Kessenich, David Neto, Alan Baker, Steven Perron, Mahesh Ravishankar, MLIR

Sanjoy Das

Oct 7, 2019, 10:05:32 PM
to Lei Zhang, MLIR, Mahesh Ravishankar
On Mon, Oct 7, 2019 at 6:37 PM Lei Zhang <antia...@google.com> wrote:
Yes technically we can, but by doing this at the very beginning we are limiting the optimization opportunity there right? My take on undef values is that they are meant to provide flexibility to compilers so compilers can choose a suitable value interpretation for it as fit.

Agreed. I was just giving an extreme example, not suggesting we actually do it. 

Sanjoy Das

Oct 7, 2019, 10:23:04 PM
to Lei Zhang, Mahesh Ravishankar, MLIR
The "every use has a different bit pattern" semantic does not
obviously follow from SSA semantics. Implementing it requires
teaching the entire optimizer about it. The paper Nuno linked has
some detailed examples on the implications of having an "undef" value.

-- Sanjoy

Mehdi AMINI

Oct 7, 2019, 10:57:40 PM
to Lei Zhang, Mahesh Ravishankar, Sanjoy Das, MLIR
It all depends on whether you stay inside the SPIR-V dialect: if you generate your own types, then you can probably bake this semantics in; otherwise this falls apart.
If you use your UndefOp to generate a standard i32, and the xor is in the standard dialect, then any pattern in MLIR can use the SSA property on the i32 to fold the xor to 0.

-- 
Mehdi


 

Mehdi AMINI

Oct 7, 2019, 11:00:36 PM
to Lei Zhang, Mahesh Ravishankar, Sanjoy Das, MLIR
To expand, the example of llvm.mlir.undef op is an illustration: it can have this semantic only because it stays in the LLVM dialect type system.

Mahesh Ravishankar

Oct 8, 2019, 12:58:56 AM
to Mehdi AMINI, Lei Zhang, Sanjoy Das, MLIR
@Sanjoy Das: Going back to this example.

x = spirv::UndefOp()
for (...) {
  print(x);
}
This example seems to suggest that every dynamic instance of the undef op *needs* to generate a different bit pattern. Wouldn't it be OK to still generate the same value at every dynamic instance here? I read the semantics as: every static use is a different value, so statically you cannot say two undefs have the same value. I didn't think it meant every dynamic instance needs a different value.

 
--
Mahesh

Mahesh Ravishankar

Oct 8, 2019, 1:29:37 AM
to Mehdi AMINI, Lei Zhang, Sanjoy Das, MLIR
This is true for the SPIR-V dialect too. The SPIR-V dialect only uses FuncOp from the standard dialect, so it can implement any semantics needed to help with optimizations and SPIR-V (de)serialization. But on this point, what would happen in the LLVM dialect with the example above, i.e.

%0 = llvm.mlir.undef : i32
%1 = llvm.xor %0, %0 : i32

If no analysis is done here in the LLVM dialect and it is just lowered to LLVM IR, I understand that it implements the LLVM semantics. But if I had to write a peephole optimizer for the LLVM dialect, should I handle llvm.xor differently based on whether %0 is an undef value or not?



--
Mahesh

Nuno Lopes

Oct 8, 2019, 4:46:19 AM
to Mehdi AMINI, Mahesh Ravishankar, MLIR

On Mon, Oct 7, 2019 at 3:28 PM Nuno Lopes <nuno....@ist.utl.pt> wrote:

Hi,

Regarding your 1st question, it's true that there's no undef instruction in
LLVM. Undef is a value, hence it *has* to be able to produce a different
result on each use (due to how RAUW works).
If undef was an instruction, it could have the semantics that all uses
observe the same value.

There are performance considerations to be had when you impose that all uses
see the same value. In particular, you may need to materialize a value and
keep it in a register so that all uses observe the same value.
Having undef take different values on different uses introduces a lot of
complexity in the optimizers.

 

But it also simplifies other optimizations, right? Not trying to downplay the other problems; your paper did a good job at it :)

It seemed to me that many peepholes can take a local decision and propagate undef without having to "remember" or encode the value they chose to base their earlier decision on.

It means that `if x == 0 && x == 1` is allowed to evaluate to true when x is undef in the IR, because the two conditions can be evaluated in isolation.

 

True. Though there aren't that many peephole optimizations that require this. In the past we've experimented with disabling undef altogether and checking which optimizations became wrong. And it's a very small % (I don't have the concrete number, sorry).

 

Nuno

Nuno Lopes

unread,
Oct 8, 2019, 5:02:12 AM10/8/19
to Mahesh Ravishankar, Mehdi AMINI, MLIR, Lei Zhang, Sanjoy Das

I have no clue about SPIR-V, but I just had a quick look to the spec here: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpUndef

 

The semantics of their undef op seems identical to LLVM's: each use *may* observe a different value.

Some examples:

 

x = spirv::UndefOp()

y = xor x, x   // can yield any value

 

for (...) {

  print(x);  // can print any value in each iteration, including all equal or all different

  print(y);  // likewise

}

 

 

BTW, there's another twist in LLVM's semantics, which I'm not sure SPIR-V follows: any expression based on undef may yield a different value each time it's used, i.e. undefs can flip values even within sub-expressions. In the example above, xor x, x *may* yield a different value in each iteration of the loop.

 

Nuno

Sanjoy Das

unread,
Oct 8, 2019, 9:57:32 AM10/8/19
to Mahesh Ravishankar, Mehdi AMINI, Lei Zhang, MLIR
On Mon, Oct 7, 2019 at 9:58 PM Mahesh Ravishankar
<ravish...@google.com> wrote:
>
> @Sanjoy Das : Going back to this example.
>
> x = spirv::UndefOp()
> for (...) {
> print(x);
> }
>
> This example seems to suggest that every dynamic instance of the undef op *needs* to generate a different bit pattern. Wouldn't it be OK to still generate the same value at every dynamic instance here? I read the semantics as every static use is a different value, so statically you cannot say two undefs have the same value. I didn't think it meant every dynamic instance needs a different value.

I think we might be talking past each other again. Let me try to
rewrite the whole argument as I see it:

1. The SPIR-V dialect has an mlir::UndefOp operation that produces a
fixed but arbitrary bit pattern. We can consider making mlir::UndefOp
produce a different bit pattern per use, but that's an invasive and
problematic change for reasons outlined in the paper.

2. It is easy to lower the SPIR-V OpUndef into mlir::UndefOp since
OpUndef's semantics says it *may* produce a different value on each
use and mlir::UndefOp satisfies that contract.

3. However going from mlir::UndefOp to OpUndef is tricky because
OpUndef does not satisfy mlir::UndefOp's contract of producing a fixed
(but arbitrary) value per definition (it may produce a different value
per *use*).

As an example for (3) I wrote this program that picks an arbitrary
value and prints it on every iteration:

// MLIR SPIR-V dialect
x = spirv::UndefOp()
for (...) {
print(x);
}

Naively lowering this to:

// SPIR-V
x = OpUndef()
for (...) {
print(x);
}

will produce a program that *may* produce a different value on each
iteration. This exhibits a behavior that is not exhibited in the
original program which means this is a miscompile.

One way to think about this is: can I write an assert() that is
guaranteed to pass in the source program but that *may fail* in the
target program? For the loop example such an assertion would be:

// MLIR SPIR-V dialect
x = spirv::UndefOp()
total = 0
for (i = 0; i < 5; i++) {
total += x;
}
assert(total % 5 == 0);
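The assertion example can be replayed in a small Python model (hypothetical code, purely illustrative; the draws stand for the bit patterns each dynamic use may observe):

```python
def total_with_per_def_undef(x, trips=5):
    """spirv::UndefOp semantics as proposed: x is fixed at the
    definition, and every use in the loop observes the same value."""
    total = 0
    for _ in range(trips):
        total += x
    return total

def total_with_per_use_undef(draws):
    """OpUndef semantics: each dynamic use may observe a different
    bit pattern."""
    return sum(draws)

# The source-program assert is guaranteed under per-definition semantics:
assert total_with_per_def_undef(12345) % 5 == 0

# The naive lowering permits executions that violate it:
assert total_with_per_use_undef([1, 2, 3, 4, 6]) % 5 != 0
```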

-- Sanjoy

Mahesh Ravishankar

unread,
Oct 8, 2019, 12:51:53 PM10/8/19
to Sanjoy Das, Mehdi AMINI, Lei Zhang, MLIR
Agree with (1) and (2). Good example again about (3). I don't really know how to fix the issue though. You suggested using freeze (from the paper). I think the way I was thinking of spirv::UndefOp already incorporates the freeze semantics: the value returned by a spirv::UndefOp is a freeze of an "undefined" value, with every spirv::UndefOp generating a (potentially) different bit pattern. That still has the same issue that the semantics of the code as expressed in the SPIR-V dialect is different from the semantics of the serialized SPIR-V binary. Is the suggestion that we mimic "freeze" in the SPIR-V binary as well? We can do this with the following transformation at the time of serialization, turning the example above into:

x = spirv::UndefOp
y  = spirv::AndOp x, 0
total = 0
for (i = 0; i < 5; i++) {
   total += y
}
assert(total % 5 == 0)


--
Mahesh

Mahesh Ravishankar

unread,
Oct 8, 2019, 12:59:58 PM10/8/19
to Mehdi AMINI, Sanjoy Das, Lei Zhang, MLIR


On Tue, Oct 8, 2019 at 7:33 AM Mehdi AMINI <joke...@gmail.com> wrote:


On Mon, Oct 7, 2019 at 10:27 PM Mahesh Ravishankar <ravish...@google.com> wrote:


This is true for SPIR-V too. The SPIR-V dialect only uses FuncOp from standard dialect.

As far as I can tell, SPIR-V dialect is using the standard types for its arithmetic: https://github.com/tensorflow/mlir/blob/master/test/Dialect/SPIRV/arithmetic-ops.mlir

I'm not sure I'd agree with considering a standard dialect i32 SSA value as having a different bit pattern on every use.

I am trying to avoid that as well :) . It could be done in theory, I think. It comes down to what the semantics of SPIR-V ops are, right? An optimization pass that is dealing with only SPIR-V ops *can* do it. I am not sure how it would work when some other dialect is mixed with the SPIR-V dialect, etc. Definitely that would open a can of worms.

 

 
So it can implement any semantics needed to help with optimizations/SPIR-V (de)serialization. But on this point, what would happen in LLVMIR dialect with the example above.

%0 = llvm.mlir.undef : i32
%1 = llvm.xor %0, %0 : i32

If no analysis is done here in the LLVM dialect, and it is just lowered to LLVM IR, I understand that it implements the LLVM semantics. But if I had to write a peephole optimizer for the LLVM dialect, then I should handle llvm.xor differently based on whether %0 is undef value or not?

If you export this, you would have the undef being folded as a constant, not an SSA value I believe.

I agree that if you export this it will get folded. I am talking about semantics before exporting it. I can write an optimization pass in MLIR for the LLVM dialect. Then, to be consistent with LLVM's undef semantics, the optimization pass will have to treat every use of %0 as a different bit pattern. Basically, optimizing llvm.xor %0, %0 to 0 would be illegal.
 

-- 
Mehdi

 
--
Mahesh


--
Mahesh

Sanjoy Das

unread,
Oct 8, 2019, 4:49:51 PM10/8/19
to Mahesh Ravishankar, Mehdi AMINI, Lei Zhang, MLIR
On Tue, Oct 8, 2019 at 9:51 AM Mahesh Ravishankar
<ravish...@google.com> wrote:
> Agree with (1) and (2). Good example again about (3). I don't really know how to fix the issue though. You suggested using freeze (from the paper). I think the way I was thinking of spirv::UndefOp already incorporates the freeze semantics. The value returned by a spirv::UndefOp is a freeze of an "undefined" value with every spirv::UndefOp generating a (potentially) different bit pattern. That still has the same issue that the semantics of the code as expressed in the SPIR-V dialect is different from what would be the semantics of the serialized SPIR-V binary. Is the suggestion that we mimic "freeze" in the SPIR-V binary as well?

Yes, that is what I was suggesting. spirv::UndefOp() can be lowered
into Freeze(OpUndef()) in the SPIR-V binary.

> We can do this by doing the following transformation at the time of serialization to make the example above:
>
> x = spirv::UndefOp
> y = spirv::AndOp x, 0

I assume you meant `spirv::OrOp x, 0`.

In any case, I'm not sure if this is sufficient, `Or undef, 0` needs
to be `undef` as otherwise the optimizer cannot replace `Or x, 0`
with `x` without proving that x is not undef (which can be difficult).
However, if SPIR-V binaries do not need to be optimized further and
the Or x, 0 => x transform was not important then we could define `Or
0, x` to be freezing `x` and get this property.
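To illustrate why a dummy Or is fragile as a freeze: any downstream optimizer that knows the identity `Or x, 0 => x` will erase it. A toy Python peephole over a made-up mini-IR (not real SPIR-V tooling) shows the marker disappearing:

```python
# Each instruction is (result, opcode, operands...); this is a made-up
# mini-IR for illustration, not the SPIR-V or MLIR representation.
def peephole_or_identity(instrs):
    """The standard rewrite Or x, 0 => x, as a downstream optimizer
    might apply it."""
    env = {}  # result id -> replacement id
    out = []
    for res, op, *args in instrs:
        args = [env.get(a, a) for a in args]
        if op == "Or" and args[1] == 0:
            env[res] = args[0]   # forward uses of res to x; drop the Or
            continue
        out.append((res, op, *args))
    return out

program = [
    ("x", "Undef"),
    ("y", "Or", "x", 0),      # intended as a freeze of x
    ("t", "Add", "t0", "y"),
]

optimized = peephole_or_identity(program)
# The "freeze" is gone and the Add now consumes the raw undef directly:
assert optimized == [("x", "Undef"), ("t", "Add", "t0", "x")]
```

After the rewrite, the consumer of `y` reads the raw undef directly, so the freezing effect is lost; the transform would only be safe if SPIR-V consumers agreed never to apply this identity, or a genuinely opaque intrinsic were used instead.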

I was thinking about using some special intrinsic that SPIR-V
optimizers do not optimize. Perhaps there is a way to inject a "no
op" inline assembly that is opaque to optimizers?

-- Sanjoy

Mahesh Ravishankar

unread,
Oct 8, 2019, 8:00:06 PM10/8/19
to Sanjoy Das, Mehdi AMINI, Lei Zhang, MLIR
On Tue, Oct 8, 2019 at 1:49 PM Sanjoy Das <san...@google.com> wrote:
On Tue, Oct 8, 2019 at 9:51 AM Mahesh Ravishankar
<ravish...@google.com> wrote:
> Agree with (1) and (2). Good example again about (3). I don't really know how to fix the issue though. You suggested using freeze (from the paper). I think the way I was thinking of spirv::UndefOp already incorporates the freeze semantics. The value returned by a spirv::UndefOp is a freeze of an "undefined" value with every spirv::UndefOp generating a (potentially) different bit pattern. That still has the same issue that the semantics of the code as expressed in the SPIR-V dialect is different from what would be the semantics of the serialized SPIR-V binary. Is the suggestion that we mimic "freeze" in the SPIR-V binary as well?

Yes, that is what I was suggesting.  spirv::UndefOp() can be lowered
into Freeze(OpUndef()) in the SPIR-V binary.

> We can do this by doing the following transformation at the time of serialization to make the example above:
>
> x = spirv::UndefOp
> y  = spirv::AndOp x, 0

I assume you meant `spirv::OrOp x, 0`.

In any case, I'm not sure if this is sufficient, `Or undef, 0` needs
to be `undef` as otherwise the optimizer cannot replace `Or x, 0`
with `x` without proving that x is not undef (which can be difficult).
However, if SPIR-V binaries do not need to be optimized further and
the Or x, 0 => x transform was not important then we could define `Or
0, x` to be freezing `x` and get this property.

I was thinking about using some special intrinsic that SPIR-V
optimizers do not optimize.  Perhaps there is a way to inject a "no
op" inline assembly that is opaque to optimizers?

I am not aware of any such special intrinsic. Maybe @Lei Zhang knows about some way of freezing it.
Since we cannot assume that the compilers within Vulkan drivers (that convert a SPIR-V binary to machine code) implement freeze semantics, there might be an issue supporting spirv::UndefOp. The only thing we can be sure of is that converting a SPIR-V binary to the SPIR-V dialect preserves semantics. I am not sure that, going SPIR-V binary -> SPIR-V dialect -> SPIR-V binary, we can guarantee that the semantics of the initial and final binaries are the same if the original binary had an OpUndef instruction, unless undef is supported the way LLVM does it, either within MLIR or within the SPIR-V dialect in MLIR.


--
Mahesh

Mehdi AMINI

unread,
Oct 8, 2019, 8:54:57 PM10/8/19
to Mahesh Ravishankar, Sanjoy Das, Lei Zhang, MLIR
On Tue, Oct 8, 2019 at 5:00 PM Mahesh Ravishankar <ravish...@google.com> wrote:

On Tue, Oct 8, 2019 at 1:49 PM Sanjoy Das <san...@google.com> wrote:
On Tue, Oct 8, 2019 at 9:51 AM Mahesh Ravishankar
<ravish...@google.com> wrote:
> Agree with (1) and (2). Good example again about (3). I don't really know how to fix the issue though. You suggested using freeze (from the paper). I think the way I was thinking of spirv::UndefOp already incorporates the freeze semantics. The value returned by a spirv::UndefOp is a freeze of an "undefined" value with every spirv::UndefOp generating a (potentially) different bit pattern. That still has the same issue that the semantics of the code as expressed in the SPIR-V dialect is different from what would be the semantics of the serialized SPIR-V binary. Is the suggestion that we mimic "freeze" in the SPIR-V binary as well?

Yes, that is what I was suggesting.  spirv::UndefOp() can be lowered
into Freeze(OpUndef()) in the SPIR-V binary.

> We can do this by doing the following transformation at the time of serialization to make the example above:
>
> x = spirv::UndefOp
> y  = spirv::AndOp x, 0

I assume you meant `spirv::OrOp x, 0`.

In any case, I'm not sure if this is sufficient, `Or undef, 0` needs
to be `undef` as otherwise the optimizer cannot replace `Or x, 0`
with `x` without proving that x is not undef (which can be difficult).
However, if SPIR-V binaries do not need to be optimized further and
the Or x, 0 => x transform was not important then we could define `Or
0, x` to be freezing `x` and get this property.

I was thinking about using some special intrinsic that SPIR-V
optimizers do not optimize.  Perhaps there is a way to inject a "no
op" inline assembly that is opaque to optimizers?

I am not aware of any such special intrinsic. Maybe @Lei Zhang knows about some way of freezing it.
Since we cannot assume that the compilers within Vulkan drivers (that convert a SPIR-V binary to machine code) implement freeze semantics, there might be an issue supporting spirv::UndefOp. The only thing we can be sure of is that converting a SPIR-V binary to the SPIR-V dialect preserves semantics. I am not sure that, going SPIR-V binary -> SPIR-V dialect -> SPIR-V binary, we can guarantee that the semantics of the initial and final binaries are the same if the original binary had an OpUndef instruction, unless undef is supported the way LLVM does it, either within MLIR or within the SPIR-V dialect in MLIR.

I can understand that *lowering* to the SPIR-V dialect with undef implies that you can't emit a SPIR-V binary with undef without some sort of freezing (assuming the SPIR-V dialect undef does not match the spec).
However, to invalidate the binary -> SPIR-V dialect -> binary flow, you need to perform a transformation on the SPIR-V dialect which would be invalid under the binary undef semantics; what would that be? Isn't "every use is the same" strictly more restrictive than "any value is possible all the time"?

Maybe something like:

%x = spv.Undef()
if (%x > 0) {
  if (%y == 1) {
    %z = %y / %x 
  }
}

We can speculate the division ahead of the test on %y because, even though it could trap if %x == 0, we're covered by the test on %x > 0. But this test only protects the division if all the uses produce the same value, which wouldn't be true in a SPIR-V binary.
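A toy Python rendering of this example (hypothetical; each `draw()` stands for one dynamic use of the serialized OpUndef) shows how the guard stops protecting the division once the two uses may disagree:

```python
def guarded_div(y, x_draws):
    """Each use of the undef %x draws a bit pattern, as OpUndef allows."""
    it = iter(x_draws)
    draw = lambda: next(it)
    if draw() > 0:              # use 1: the guard sees one bit pattern...
        if y == 1:
            return y // draw()  # use 2: ...the divisor may see another
    return None

# Per-definition semantics corresponds to both uses drawing the same value,
# so the guard really does protect the division:
assert guarded_div(1, [3, 3]) == 0          # 1 // 3

# Per-use semantics permits guard=1, divisor=0: the division traps even
# though it sits behind the %x > 0 test.
raised = False
try:
    guarded_div(1, [1, 0])
except ZeroDivisionError:
    raised = True
assert raised
```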

-- 
Mehdi

Sanjoy Das

unread,
Oct 8, 2019, 10:56:45 PM10/8/19
to Mehdi AMINI, Lei Zhang, MLIR, Mahesh Ravishankar
+1

More generally we can’t analyze values based on control flow: %x under “if (predicate(%x))” does not necessarily satisfy “predicate” if it is undef. 

— Sanjoy 

Mahesh Ravishankar

unread,
Oct 8, 2019, 11:58:45 PM10/8/19
to Mehdi AMINI, Sanjoy Das, Lei Zhang, MLIR
I can't think of a concrete case where the SPIR-V dialect will do something that will change the semantics of the original SPIR-V binary, but I don't know if I can rule it out either. Take your example itself. The SPIR-V binary representation of that snippet would have a single OpUndef and the result <id> used in the two places where %x is referred to. According to the LLVM Reference manual 

"the ‘undef’ “variable” can arbitrarily change its value over its “live range”."

Just for the sake of argument, it seems valid for a compiler to assume that %x is 1 at the first use and %x is 0 at the second. So it is safe to replace that code snippet with "unreachable" (or a trap). But the binary round-tripped through the SPIR-V dialect would not have the same semantics. This is a totally cooked-up example, but it seems to me that it is hard to reason about all the implications of the different semantics in the SPIR-V dialect and the SPIR-V binary when round-tripping through the SPIR-V dialect.


--
Mahesh

Sanjoy Das

unread,
Oct 9, 2019, 12:19:10 AM10/9/19
to Mahesh Ravishankar, Mehdi AMINI, Lei Zhang, MLIR
On Tue, Oct 8, 2019 at 8:58 PM Mahesh Ravishankar
<ravish...@google.com> wrote:
> I can't think of a concrete case where the SPIR-V dialect will do something that will change the semantics of the original SPIR-V binary, but I don't know if I can rule it out either.

Maybe I'm reading too much into this, but I don't think this is the
best way to think about the problem. The only two questions we should
ask are:

- How can we correctly convert SPIR-V binaries to MLIR?
- How can we correctly convert MLIR to SPIR-V binaries?

Ideally these would be well-posed questions because both MLIR and
SPIR-V have well defined semantics.

If either of the two steps are incorrect we have a miscompile in a
SPIR-V -> MLIR -> SPIR-V pipeline, period. It does not matter if we
happen to have an optimization today that manifests this miscompile or
not.

The second-order way of reasoning (is this MLIR transform valid for
the "original" SPIR-V binary?) is unnecessarily confusing, I think.

-- Sanjoy

Mahesh Ravishankar

unread,
Oct 9, 2019, 1:37:48 AM10/9/19
to Sanjoy Das, Mehdi AMINI, Lei Zhang, MLIR
On Tue, Oct 8, 2019 at 9:19 PM Sanjoy Das <san...@google.com> wrote:
On Tue, Oct 8, 2019 at 8:58 PM Mahesh Ravishankar
<ravish...@google.com> wrote:
> I can't think of a concrete case where the SPIR-V dialect will do something that will change the semantics of the original SPIR-V binary, but I don't know if I can rule it out either.

Maybe I'm reading too much into this, but I don't think this is the
best way to think about the problem.  The only two questions we should
ask are:

 - How can we correctly convert SPIR-V binaries to MLIR?
 - How can we correctly convert MLIR to SPIR-V binaries?

Ideally these would be well-posed questions because both MLIR and
SPIR-V have well defined semantics.

If either of the two steps are incorrect we have a miscompile in a
SPIR-V -> MLIR -> SPIR-V pipeline, period.  It does not matter if we
happen to have an optimization today that manifests this miscompile or
not.

The second-order way of reasoning (is this MLIR transform valid for
the "original" SPIR-V binary?) is unnecessarily confusing, I think.

I am sorry if I wasn't being clear. I am agreeing with what you are saying. It seems to me that there is no way of getting the semantics right between MLIR and SPIR-V unless we have a mechanism to handle undef values in MLIR consistent with SPIR-V/LLVM (or SPIR-V has a more robust mechanism for "freeze"). The immediate use case for me is to make sure that the SPIR-V -> MLIR -> SPIR-V works fine. But as you pointed out, if MLIR -> SPIR-V miscompiles, there is no roundtrip. Using a dummy Or operation to mimic freeze does not actually seem to solve the problem. A compiler within the Vulkan driver might optimize the Or operation away. AFAICS, there is no solution for the correct compilation of MLIR -> SPIR-V at this point...


--
Mahesh

Mehdi AMINI

unread,
Oct 9, 2019, 1:52:26 AM10/9/19
to Mahesh Ravishankar, Sanjoy Das, Lei Zhang, MLIR
On Tue, Oct 8, 2019 at 10:37 PM Mahesh Ravishankar <ravish...@google.com> wrote:


On Tue, Oct 8, 2019 at 9:19 PM Sanjoy Das <san...@google.com> wrote:
On Tue, Oct 8, 2019 at 8:58 PM Mahesh Ravishankar
<ravish...@google.com> wrote:
> I can't think of a concrete case where the SPIR-V dialect will do something that will change the semantics of the original SPIR-V binary, but I don't know if I can rule it out either.

Maybe I'm reading too much into this, but I don't think this is the
best way to think about the problem.  The only two questions we should
ask are:

 - How can we correctly convert SPIR-V binaries to MLIR?
 - How can we correctly convert MLIR to SPIR-V binaries?

Ideally these would be well-posed questions because both MLIR and
SPIR-V have well defined semantics.

If either of the two steps are incorrect we have a miscompile in a
SPIR-V -> MLIR -> SPIR-V pipeline, period.  It does not matter if we
happen to have an optimization today that manifests this miscompile or
not.

The second order way of reasoning, is this MLIR transform valid for
the "original" SPIR-V binary, unnecessarily confusing I think.

I am sorry if I wasn't being clear. I am agreeing with what you are saying. It seems to me that there is no way of getting the semantics right between MLIR and SPIR-V unless we have a mechanism to handle undef values in MLIR consistent with SPIR-V/LLVM (or SPIR-V has a more robust mechanism for "freeze"). The immediate use case for me is to make sure that the SPIR-V -> MLIR -> SPIR-V works fine. But as you pointed out, if MLIR -> SPIR-V miscompiles, there is no roundtrip. Using a dummy Or operation to mimic freeze does not actually seem to solve the problem. A compiler within the Vulkan driver might optimize the Or operation away. AFAICS, there is no solution for the correct compilation of MLIR -> SPIR-V at this point...

There is no solution to preserve the undef in the output, but is there anything that prevents you from always folding undef to (let's say) the constant 0?
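A sketch of that direction (hypothetical Python, not the actual serializer code): fold every spirv::UndefOp to a null constant at serialization time. This is a legal refinement, since zero is one of the bit patterns undef is permitted to take, so every execution of the output is a valid execution of the input:

```python
def serialize_op(res, op, *args):
    """Toy serializer: emits undef as a null constant (standing in for
    SPIR-V's OpConstantNull), and everything else unchanged."""
    if op == "Undef":
        return (res, "ConstantNull")
    return (res, op, *args)

module = [("x", "Undef"), ("t", "Add", "t0", "x")]
binary = [serialize_op(*instr) for instr in module]
assert binary == [("x", "ConstantNull"), ("t", "Add", "t0", "x")]
```

The cost, as John and David note elsewhere in the thread, is that the driver compiler may have to materialize the constant in a register, giving up the freedom undef was meant to provide.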

-- 
Mehdi

Sanjoy Das

unread,
Oct 9, 2019, 10:30:24 AM10/9/19
to David Neto, Lei Zhang, John Kessenich, Alan Baker, Steven Perron, Mahesh Ravishankar, MLIR
Is OpUndef the only way to get undef? Or can (e.g.) loading from
uninitialized memory also produce undef?

-- Sanjoy

On Wed, Oct 9, 2019 at 7:24 AM David Neto <dn...@google.com> wrote:
>
> When you are translating *into* SPIR-V, then you can make your life simple by using OpConstantNull instead of OpUndef.
> Then there is no confusion. The worst effect of this, as far as I know, is that it may force downstream compilers to save and restore a register value, which *might* slow down the code. OpUndef, and undef values in general, is a way of saying to the compiler "hey you can ignore the current value of the register you're about to use because my code is going to overwrite it soon anyway".
>
> You can think of OpConstantNull as a "freeze" to a specific value, i.e. zero. :-)
>
> david

Sanjoy Das

unread,
Oct 9, 2019, 11:38:56 AM10/9/19
to John Kessenich, David Neto, Lei Zhang, Alan Baker, Steven Perron, Mahesh Ravishankar, MLIR
On Wed, Oct 9, 2019 at 8:28 AM John Kessenich <johnke...@google.com> wrote:
> Yes, we are discussing having other operations that produce undefined results produce the same kind of undef as OpUndef.

And so is it true that we can't rely on making all the OpUndefs
explicit in the graph, so we can't necessarily replace them by a safe
constant value?

> BTW, I don't think turning undef into ConstantNull is satisfactory: The optimization allowed by undef isn't an aside, but is a primary purpose.

OTOH we're talking about serializing an already optimized (?) MLIR
program so maybe there isn't a need to re-optimize it once we have it
as a SPIR-V binary? And we don't need to fold it to `0` necessarily,
we could fold it to any constant "convenient" for optimization.

However, this is a moot point unless we have a way to reliably replace
all undefs, not just the ones that are explicitly represented as
OpUndef.

-- Sanjoy

>
> JohnK

Mehdi Amini

unread,
Oct 9, 2019, 12:24:51 PM10/9/19
to Lei Zhang, Sanjoy Das, John Kessenich, David Neto, Alan Baker, Steven Perron, Mahesh Ravishankar, MLIR
(Actually did it)

On Wed, Oct 9, 2019 at 9:22 AM Mehdi Amini <ami...@google.com> wrote:
Hi,

It seems like some email answers to the list were lost; we misconfigured it to reject emails from non-members instead of sending them through moderation. This should be fixed now.

Sorry for the inconvenience, feel free to resend the messages!

Best,

-- 
Mehdi


Mahesh Ravishankar

unread,
Oct 9, 2019, 2:06:30 PM10/9/19
to Sanjoy Das, Mehdi Amini, John Kessenich, David Neto, Lei Zhang, Alan Baker, Steven Perron, MLIR
On Wed, Oct 9, 2019 at 8:38 AM Sanjoy Das <san...@google.com> wrote:
On Wed, Oct 9, 2019 at 8:28 AM John Kessenich <johnke...@google.com> wrote:
> Yes, we are discussing having other operations that produce undefined results produce the same kind of undef as OpUndef.

And so is it true that we can't rely on making all the OpUndef's
explicit in the graph so we can't necessarily replace them by a safe
constant value?

> BTW, I don't think turning undef into ConstantNull is satisfactory:  The optimization allowed by undef isn't an aside, but is a primary purpose.

OTOH we're talking about serializing an already optimized (?) MLIR
program so maybe there isn't a need to re-optimize it once we have it
as a SPIR-V binary?  And we don't need to fold it to `0` necessarily,
we could fold it to any constant "convenient" for optimization.

However, this is a moot point unless we have a way to reliably replace
all undefs, not just the ones that are explicitly represented as
OpUndef.

-- Sanjoy

>
> JohnK
>
>
> On Wed, Oct 9, 2019 at 8:30 AM Sanjoy Das <san...@google.com> wrote:
>>
>> Is OpUndef the only way to get undef?  Or can (e.g.) loading from
>> uninitialized memory also produce undef?
>>
>> -- Sanjoy
>>
>> On Wed, Oct 9, 2019 at 7:24 AM David Neto <dn...@google.com> wrote:
>> >
>> > When you are translating *into* SPIR-V, then you can make your life simple by using OpConstantNull instead of OpUndef.
>> > Then there is no confusion.  The worst effect of this, as far as I know, is that it may force downstream compilers to save and restore a register value, which *might* slow down the code.  OpUndef, and undef values in general, is a way of saying to the compiler "hey you can ignore the current value of the register you're about to use because my code is going to overwrite it soon anyway".
>> >
>> > You can think of OpConstantNull as a "freeze" to a specific value, i.e. zero.   :-)

Thanks David for the note. I may be overthinking this, but if we are going just from {something} -> SPIR-V dialect -> SPIR-V binary, then I think we are OK with using a constant value all the time (@Mehdi Amini: you pointed this out as well). My concern is when we go from SPIR-V binary -> SPIR-V dialect -> SPIR-V binary. If there is an undef operation used in the original binary, and we go from the more general semantics of undef to the narrower semantics of a constant value everywhere, and then serialize it back to a SPIR-V binary, I am not able to decide if that implies an incorrect change to the program semantics. Reasoning about it in the way @Sanjoy Das suggested, as two separate steps, as long as they are individually correct, the round-tripping seems to be OK. But this relies on some undefined behavior. Given that drivers have a compiler that consumes the SPIR-V binary, which is often an LLVM-based compiler, we might have a situation where the original SPIR-V binary and the final SPIR-V binary do different things. AFAICS, this is still OK, because it is in the realm of undefined behavior. But if this happens often, it will make the path of SPIR-V binary -> SPIR-V dialect -> SPIR-V binary unusable. (I might be really overthinking this at this point.)


--
Mahesh

Mahesh Ravishankar

unread,
Oct 9, 2019, 6:04:40 PM10/9/19
to David Neto, Sanjoy Das, Mehdi Amini, John Kessenich, Lei Zhang, Alan Baker, Steven Perron, MLIR
Great! Thanks everyone for all the feedback! I think the discussion gives me a better idea of how to go about this.
As it stands, the deserializer in the SPIR-V dialect already does the right thing while going from a SPIR-V binary to MLIR. I think the consensus is to use an OpConstantNull value for OpUndef during serialization. I plan to take that approach for now unless something else comes up.

W.r.t. undef vs. undefined, I was referring to how undef values can lead to undefined behavior, which optimizers can exploit. Something along the lines of the below (which is from the LLVM reference manual):
  %A = sdiv undef, %X
  %B = sdiv %X, undef
Safe:
  %A = 0
b: unreachable
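To spell out why the LangRef calls these replacements safe, here is a Python gloss (hypothetical, just for illustration): `sdiv undef, %X` can be folded to 0 because 0 is a value undef may take and 0 / X == 0, while `sdiv %X, undef` may pick a zero divisor, which is immediate undefined behavior, so the block containing it can be marked unreachable.

```python
def sdiv(a, b):
    """C-style truncating signed division; division by zero is
    immediate undefined behavior in LLVM, modeled here as an exception."""
    if b == 0:
        raise ZeroDivisionError("immediate UB")
    return int(a / b)

X = 7

# %A = sdiv undef, %X: the compiler may choose undef == 0, and 0 / X == 0,
# so folding %A to 0 is a legal refinement.
assert sdiv(0, X) == 0

# %B = sdiv %X, undef: the compiler may choose undef == 0, making the
# division UB whenever it executes -- hence "b: unreachable".
got_ub = False
try:
    sdiv(X, 0)
except ZeroDivisionError:
    got_ub = True
assert got_ub
```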

On Wed, Oct 9, 2019 at 2:55 PM David Neto <dn...@google.com> wrote:
- I later saw that Mehdi already pointed out the "you can always replace with null" trick for the one direction conversion.

- FYI.  As part of the WebGPU effort, some folks surveyed sources of undefined behaviour and undefined values in SPIR-V. See the catalog here https://github.com/gpuweb/gpuweb/issues/34   It's two years old but a good start

- "when we go from SPIR-V binary -> SPIR-V dialect -> SPIR-V binary. If there is an undef operation used in the original binary, and we go from a more general semantics of undef to a narrower semantics of constant value everywhere, and then serialize it back to the SPIR-V binary, I am not able to decide if that implies an incorrect change to the program semantics."
This is not incorrect. Yes, it is a narrowing of possible behaviours. However think of it this way:   Every valid execution of the final form of the module is also a valid execution of the original module.

- undefined behaviour vs. undef value.   These are two different things. Undefined behaviour is "the world can blow up" level of weirdness.  But "undef value" is "you get lots of possibilities for this one value, each time you evaluate/look at it" but that's the whole possible scope of weirdness.
   - example of undefined behaviour: writing outside the bounds of a storage buffer is undefined behaviour. A data race is undefined behaviour.
   - example of undef value: an OpUndef result. Or the value you get from division by zero. Or a bit shift by too many bits. Or OpVectorExtractDynamic with an out-of-bounds index.


david




--
Mahesh

David Neto

unread,
Oct 10, 2019, 11:13:15 AM10/10/19
to Mahesh Ravishankar, Sanjoy Das, Mehdi Amini, John Kessenich, Lei Zhang, Alan Baker, Steven Perron, MLIR
- I later saw that Mehdi already pointed out the "you can always replace with null" trick for the one direction conversion.

- FYI.  As part of the WebGPU effort, some folks surveyed sources of undefined behaviour and undefined values in SPIR-V. See the catalog here https://github.com/gpuweb/gpuweb/issues/34   It's two years old but a good start

- "when we go from SPIR-V binary -> SPIR-V dialect -> SPIR-V binary. If there is an undef operation used in the original binary, and we go from a more general semantics of undef to a narrower semantics of constant value everywhere, and then serialize it back to the SPIR-V binary, I am not able to decide if that implies an incorrect change to the program semantics."
This is not incorrect. Yes, it is a narrowing of possible behaviours. However think of it this way:   Every valid execution of the final form of the module is also a valid execution of the original module.

- undefined behaviour vs. undef value.   These are two different things. Undefined behaviour is "the world can blow up" level of weirdness.  But "undef value" is "you get lots of possibilities for this one value, each time you evaluate/look at it" but that's the whole possible scope of weirdness.
   - example of undefined behaviour: writing outside the bounds of a storage buffer is undefined behaviour. A data race is undefined behaviour.
   - example of undef value: an OpUndef result. Or the value you get from division by zero. Or a bit shift by too many bits. Or OpVectorExtractDynamic with an out-of-bounds index.


david



David Neto

unread,
Oct 10, 2019, 1:45:10 PM10/10/19
to Mahesh Ravishankar, Sanjoy Das, Mehdi Amini, John Kessenich, Lei Zhang, Alan Baker, Steven Perron, MLIR
Sounds good to me.  And +1 to the example relating undef to undefined behaviour.

I agree with John that there is a potential for perf loss in converting  Undef to ConstantNull.
Also, that the potential for performance gain is the main reason to have Undef in the first place.

That said:
- My team's Clspv compiler lowers LLVM's undef into SPIR-V OpConstantNull.  We did that to work around a driver bug, but nobody has complained about performance so far, and I made this the default.

- One of the main places I see an undef value in LLVM code is in constructing a vector by beginning with an undef vector value, then repeatedly inserting scalars to fill all the slots. In this case the undef is an artifact of the fact that LLVM doesn't have a "create a vector all at once" instruction.  But SPIR-V does have OpCompositeConstruct to do exactly that.  I wrote a pass in Clspv to transform those LLVM chains-of-inserts into (effectively) a single OpCompositeConstruct.  It actually shortened the code and made it clearer.
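The Clspv pass itself isn't shown here, but the rewrite it describes can be sketched roughly as follows (a toy model with hypothetical class names, representing an insert chain as nested objects rather than real IR):

```python
from typing import Optional

class Undef:
    """Stands in for an LLVM 'undef' vector value."""

class Insert:
    """Models 'insertelement %vec, %scalar, %index'."""
    def __init__(self, vec, scalar, index):
        self.vec, self.scalar, self.index = vec, scalar, index

def as_composite_construct(value, width) -> Optional[list]:
    # Walk the chain from the outermost (latest) insert inward. The
    # first scalar seen for a lane is the latest write, so it wins.
    lanes = [None] * width
    while isinstance(value, Insert):
        if lanes[value.index] is None:
            lanes[value.index] = value.scalar
        value = value.vec
    # Rewritable only if the chain bottoms out at undef and every
    # lane has been written: the whole chain becomes one
    # composite-construct of the collected scalars.
    if isinstance(value, Undef) and all(l is not None for l in lanes):
        return lanes
    return None

# insertelement chain filling lanes 0..3 of an undef 4-vector:
chain = Insert(Insert(Insert(Insert(Undef(), 10, 0), 20, 1), 30, 2), 40, 3)
print(as_composite_construct(chain, 4))   # [10, 20, 30, 40]
```

A real pass must also handle the non-obvious cases David mentions (partially filled vectors, dynamic indices, multiple uses of intermediate values); this sketch only shows the core recognition step.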

- JF Bastien (C++ lead at Apple, formerly NaCl lead at Google) will talk at the upcoming LLVM Dev meeting about automatically initializing all local variables. The idea is to do this for security reasons, but I've seen him report that the perf hit is small.  See https://llvmdevmtg2019.sched.com/event/W2tz/making-ub-hurt-less-security-mitigations-through-automatic-variable-initialization 


I'd love to see updated perf data for the undef -> ConstantNull transform, or more specifically perf data focused on GPU compute.  In case anyone gets bored.  :-)


david

Sean Silva

Oct 10, 2019, 6:48:45 PM
to MLIR
Thanks David for the note. I may be overthinking this, but if we are going just from {something} -> SPIR-V dialect -> SPIR-V binary, then I think we are OK with using a constant value all the time (@Mehdi Amini: you pointed this out as well). My concern is when we go from SPIR-V binary -> SPIR-V dialect -> SPIR-V binary. If there is an undef operation used in the original binary, and we go from the more general semantics of undef to the narrower semantics of a constant value everywhere, and then serialize back to a SPIR-V binary, I am not able to decide whether that implies an incorrect change to the program semantics. Reasoning about it the way @Sanjoy Das suggested, as two separate steps that are each individually correct, suggests the roundtrip is OK. But this relies on some undefined behavior. Given that drivers have a compiler that consumes the SPIR-V binary, which is often an LLVM-based compiler, we might have a situation where the original SPIR-V binary and the final SPIR-V binary do different things.

To be clear, the compiler can still be correct even if they do different things, as long as the resulting semantics are a refinement of the original semantics.

Note that in the presence of an op like spirv::UndefOp or OpUndef, the program does not have a single deterministic execution. It could have one of many different possible executions. The "refinement" correctness criterion for a compiler states that the set of possible executions after a transformation must be a subset of (or equal to) the original set of possible program executions.

That is, consider this program:

x = spirv::UndefOp()
print(x&3)

This program could print 0, 1, 2, or 3. So there are 4 possible executions. It is correct to transform the program into:

x = spirv::UndefOp()
print(x&1)

Now the program could print 0 or 1. This is a subset of the original set of possible executions, therefore this transformation is valid.

In particular, it is valid to convert into:

print(0)

Now the program has only 1 possible execution, but it was in the original set so that is okay.


The intuition behind this "refinement" correctness criterion is that the user who wrote the program admits that they do not care about which of the possible executions is chosen at runtime. Thus, as long as we (the compiler) ensure that at least one of the original set of possible executions happens at runtime, then we did our job correctly.
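This refinement criterion can be checked mechanically in a toy model. A minimal sketch (illustrative only; it enumerates a 2-bit undef value and compares the resulting output sets):

```python
def executions(program, bits=2):
    # All outputs the program can produce as the undef value x
    # ranges over every possible bit-pattern.
    return {program(x) for x in range(1 << bits)}

def refines(after, before):
    # A transformation is correct iff every possible execution of
    # the transformed program was already possible originally.
    return executions(after) <= executions(before)

original = lambda x: x & 3   # can print any of 0, 1, 2, 3
narrowed = lambda x: x & 1   # can print 0 or 1
constant = lambda x: 0       # always prints 0

assert refines(narrowed, original)       # shrinking the set is valid
assert refines(constant, narrowed)       # picking one execution is valid
assert not refines(original, narrowed)   # widening the set is NOT valid
```

The last assertion is the asymmetry that matters for the serialization question: undef -> constant is a refinement, but constant -> undef is not.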


Sorry, this is from memory. I'm sure Sanjoy, Nuno, or someone else has a proper citation for this "refinement" criterion.


-- Sean Silva
 


--
Mahesh

Mahesh Ravishankar

Oct 10, 2019, 7:11:01 PM
to David Neto, Sanjoy Das, Mehdi Amini, John Kessenich, Lei Zhang, Alan Baker, Steven Perron, MLIR
On Thu, Oct 10, 2019 at 10:45 AM David Neto <dn...@google.com> wrote:
Sounds good to me.  And +1 to the example relating undef to undefined behaviour.

I agree with John that there is a potential for perf loss in converting  Undef to ConstantNull.
Also, that the potential for performance gain is the main reason to have Undef in the first place.

That said:
- My team's Clspv compiler lowers LLVM's undef into SPIR-V OpConstantNull.  We did that to work around a driver bug, but nobody has complained about performance so far, and I made this the default.

Thanks David! This is really good information to have. This was the main concern I had: I wasn't sure how using a constant value would impact the driver compilers. So it's good to know this has already been evaluated in the wild.
 

- One of the main places I see an undef value in LLVM code is in constructing a vector by beginning with an undef vector value, then repeatedly inserting scalars to fill all the slots. In this case the undef is an artifact of the fact that LLVM doesn't have a "create a vector all at once" instruction.  But SPIR-V does have OpCompositeConstruct to do exactly that.  I wrote a pass in Clspv to transform those LLVM chains-of-inserts into (effectively) a single OpCompositeConstruct.  It actually shortened the code and made it clearer.

- JF Bastien (C++ lead at Apple, formerly NaCl lead at Google) will talk at the upcoming LLVM Dev meeting about automatically initializing all local variables. The idea is to do this for security reasons, but I've seen him report that the perf hit is small.  See https://llvmdevmtg2019.sched.com/event/W2tz/making-ub-hurt-less-security-mitigations-through-automatic-variable-initialization 


I'd love to see updated perf data for the undef -> ConstantNull transform, or more specifically perf data focused on GPU compute.  In case anyone gets bored.  :-)

+1!!


--
Mahesh

Mahesh Ravishankar

Oct 10, 2019, 7:19:55 PM
to Sean Silva, MLIR
I totally agree with this. My main concern is how the change made in the SPIR-V dialect, choosing one execution, impacts the driver compiler that will consume the binary generated by MLIR. The example I sent above was to indicate how the choice of undef value can lead to undefined behavior, which different compiler implementations can then handle differently. So even though we did something right in the SPIR-V dialect, injecting a roundtrip through MLIR into existing workflows might result in different executions, which, though correct, I would like to avoid. But given that Clspv does the same, that gives me more confidence that it's OK.
 


--
Mahesh

Mehdi Amini

Oct 10, 2019, 7:47:23 PM
to Mahesh Ravishankar, Sean Silva, MLIR
It seems like you would like optimizations to preserve program behavior even when the program can exhibit undefined behavior. I'm not sure you can close that loop, fundamentally.

-- 
Mehdi


 

Mahesh Ravishankar

Oct 11, 2019, 1:07:57 AM
to Mehdi Amini, Sean Silva, MLIR
+1. If the binary were closer to hardware, it wouldn't concern me as much. I was being cautious since the binary is consumed by another compiler. But if Clspv does what is suggested, then it seems fine to me. Sorry, I realize I am being extremely pedantic about this. Just trying to evaluate the impact of the difference.


--
Mahesh

Lei Zhang

Oct 11, 2019, 7:34:03 PM
to David Neto, Mahesh Ravishankar, Sanjoy Das, Mehdi Amini, John Kessenich, Alan Baker, Steven Perron, MLIR
On Thu, Oct 10, 2019 at 1:45 PM David Neto <dn...@google.com> wrote:
Sounds good to me.  And +1 to the example relating undef to undefined behaviour.

I agree with John that there is a potential for perf loss in converting  Undef to ConstantNull.
Also, that the potential for performance gain is the main reason to have Undef in the first place.

That said:
- My team's Clspv compiler lowers LLVM's undef into SPIR-V OpConstantNull.  We did that to work around a driver bug, but nobody has complained about performance so far, and I made this the default.

- One of the main places I see an undef value in LLVM code is in constructing a vector by beginning with an undef vector value, then repeatedly inserting scalars to fill all the slots. In this case the undef is an artifact of the fact that LLVM doesn't have a "create a vector all at once" instruction.  But SPIR-V does have OpCompositeConstruct to do exactly that.  I wrote a pass in Clspv to transform those LLVM chains-of-inserts into (effectively) a single OpCompositeConstruct.  It actually shortened the code and made it clearer.

That would be a useful pass to have in the SPIR-V dialect too. We can also start from OpUndef and build up a composite by doing OpCompositeInsert on it in SPIR-V, so we might see this pattern in real-world SPIR-V blobs. (IIRC, we can generate such patterns in SPIRV-Tools.)
 

- JF Bastien (C++ lead at Apple, formerly NaCl lead at Google) will talk at the upcoming LLVM Dev meeting about automatically initializing all local variables. The idea is to do this for security reasons, but I've seen him report that the perf hit is small.  See https://llvmdevmtg2019.sched.com/event/W2tz/making-ub-hurt-less-security-mitigations-through-automatic-variable-initialization 


I'd love to see updated perf data for the undef -> ConstantNull transform, or more specifically perf data focused on GPU compute.  In case anyone gets bored.  :-)


david


On Wed, Oct 9, 2019 at 6:04 PM Mahesh Ravishankar <ravish...@google.com> wrote:
Great! Thanks everyone for all the feedback! I think the discussion gives me a better idea of how to go about this.
As it stands, the deserializer in the SPIR-V dialect already does the right thing while going from SPIR-V binary to MLIR. I think the consensus is to use a ConstantNull value for OpUndef during serialization. I plan to take that approach for now unless something else comes up.

This sounds good to me too! Thanks Mahesh for driving this, and thanks everyone for the insightful feedback! :)

David Neto

Oct 12, 2019, 3:55:09 PM
to Lei Zhang, Mahesh Ravishankar, Sanjoy Das, Mehdi Amini, John Kessenich, Alan Baker, Steven Perron, MLIR
This is the rewrite inserts pass in LLVM.


It handles a bunch of non-obvious cases.  Unfortunately I don't have separate per-pass tests for it, as it preceded Alan's infra work for per-pass tests.

David

Lei Zhang

Oct 17, 2019, 3:43:49 PM
to David Neto, Mahesh Ravishankar, Sanjoy Das, Mehdi Amini, John Kessenich, Alan Baker, Steven Perron, MLIR
On Sat, Oct 12, 2019 at 3:55 PM David Neto <dn...@google.com> wrote:
This is the rewrite inserts pass in LLVM.


It handles a bunch of non-obvious cases.  Unfortunately I don't have separate per-pass tests for it, as it preceded Alan's infra work for per-pass tests.

Thanks! I've created https://github.com/tensorflow/mlir/issues/196 to track this.