[LLVMdev] poison and select

330 views
Skip to first unread message

John Regehr

unread,
Sep 9, 2014, 9:53:30 AM9/9/14
to llv...@cs.uiuc.edu
In the section about poison values, the LLVM language reference manual says:

"Values other than phi nodes depend on their operands."

This implies that a select instruction's output can be poisoned by its
not-selected argument value. If select were poisoned only by its
selected argument, we would expect this fact to be mentioned
specifically, as it is for phi.

Next I'll show how poisoning by the not-selected value can introduce a
problem. Take this C program:

int printf(const char *, ...);
int foo(int x0) {
int x3 = x0 >> 27;
int x4 = x3 - 27;
int x2 = x4 ? x4 : (1 >> x4);
int x1 = x2 != 0;
return x1;
}
int arg;
int main (void) {
int x1 = foo(arg);
printf("%x\n", x1);
return 0;
}

This program has no UB and it should print "1'.

Clang at -O2 turns foo() into this:

define i32 @foo(i32 %x0) #0 {
entry:
%shr = ashr i32 %x0, 27
%sub = add nsw i32 %shr, -27
%tobool = icmp ne i32 %sub, 0
%shr1 = lshr i32 1, %sub
%cond = select i1 %tobool, i32 %sub, i32 %shr1
%cmp = icmp ne i32 %cond, 0
%conv = zext i1 %cmp to i32
ret i32 %conv
}

Although the translation is very straightforward, there's a problem:
%shr1 is a poison value and although it is not selected, it propagates
through the rest of the function and poisons the return value.

Obviously things are not going well if the source program contains no UB
and the emitted code returns a poison value. Either there has been a
miscompilation or else the LLVM documentation should be changed to make
it clear that the select instruction only has a value dependence on the
value that is selected.

The immediate problem here is that both Souper and Alive are happy for
foo() to return 0 instead of 1. Souper actually does this optimization
-- legally, of course, according to the reference manual.

Anyway this issue seemed tricky enough to bring up here instead of just
filing a PR.

Thanks,

John
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

David Majnemer

unread,
Sep 10, 2014, 10:42:41 AM9/10/14
to John Regehr, llv...@cs.uiuc.edu
While the documentation may not match the facts on the ground, InstructionSimplify does (in my opinion) the right thing; an undef operand doesn't mean the entire SelectInst folds away to undef:

John Regehr

unread,
Sep 10, 2014, 11:22:12 AM9/10/14
to David Majnemer, llv...@cs.uiuc.edu
> While the documentation may not match the facts on the ground, InstructionSimplify does (in my opinion) the
> right thing; an undef operand doesn't mean the entire SelectInst folds away to undef:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/InstructionSimplify.cpp?revision=217342&view=m
> arkup#l2880

It seems pretty clear that this is the right semantics for select.

If we believe the documentation, then select is only useful when we can
prove that it won't introduce a spurious poison value. This will not often
be the case.

I'll go file a bug against the LLVM instruction reference to remind
someone to tweak the doc. My proposed fix would be:

* Values other than phi nodes and select instructions depend on their
operands.

* Phi nodes depend on the operand corresponding to their dynamic
predecessor basic block.

* Select instructions depend on their selected operand.

John

John Regehr

unread,
Sep 10, 2014, 11:38:05 AM9/10/14
to llv...@cs.uiuc.edu

Adve, Vikram Sadanand

unread,
Sep 10, 2014, 4:58:27 PM9/10/14
to <llvmdev@cs.uiuc.edu>
On Sep 10, 2014, at 6:38 PM, <llvmdev...@cs.uiuc.edu>
wrote:

>> While the documentation may not match the facts on the ground,?InstructionSimplify does (in my opinion) the
>> right thing; an undef operand doesn't mean the entire SelectInst folds away to undef:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/InstructionSimplify.cpp?revision=217342&view=m
>> arkup#l2880
>
> It seems pretty clear that this is the right semantics for select.
>
> If we believe the documentation, then select is only useful when we can
> prove that it won't introduce a spurious poison value. This will not often
> be the case.
>
> I'll go file a bug against the LLVM instruction reference to remind
> someone to tweak the doc. My proposed fix would be:
>
> * Values other than phi nodes and select instructions depend on their
> operands.
>
> * Phi nodes depend on the operand corresponding to their dynamic
> predecessor basic block.
>
> * Select instructions depend on their selected operand.


Doesn't a select instruction also depend on the predicate (boolean) value?


>
> John



--Vikram Adve
Professor, Department of Computer Science
University of Illinois at Urbana-Champaign
va...@illinois.edu
http://llvm.org

John Regehr

unread,
Sep 10, 2014, 5:38:54 PM9/10/14
to llv...@cs.uiuc.edu
>> * Select instructions depend on their selected operand.
>
> Doesn't a select instruction also depend on the predicate (boolean) value?

Oops, yes, of course. So maybe:

* Select instructions depend on their condition and their selected
operand.

John

Dan Gohman

unread,
Sep 11, 2014, 5:54:44 PM9/11/14
to John Regehr, llv...@cs.uiuc.edu
Original inventor of the poison concept here.

Poison is a flawed concept. I proved it was flawed back in 2011 [0]

[0] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046368.html

And since then, other holes have been found. I'm not aware of any serious proposals for its replacement that do not have similar problems.

Also note that poison is very different from undef. Undef, as far as I know, remains an entirely coherent concept.

Dan

John Regehr

unread,
Sep 11, 2014, 10:21:03 PM9/11/14
to llv...@cs.uiuc.edu
> Poison is a flawed concept. I proved it was flawed back in 2011 [0]

Nice. My colleagues and I will dig through this material and possibly come
back with some ideas. We're going to need some sort of semantics for UB in
LLVM since otherwise these formal-methods-based tools like Souper and
Alive risk not making sense.

Duncan Sands

unread,
Sep 12, 2014, 3:49:21 AM9/12/14
to llv...@cs.uiuc.edu
Hi John,

On 12/09/14 04:17, John Regehr wrote:
>> Poison is a flawed concept. I proved it was flawed back in 2011 [0]
>
> Nice. My colleagues and I will dig through this material and possibly come back
> with some ideas. We're going to need some sort of semantics for UB in LLVM since
> otherwise these formal-methods-based tools like Souper and Alive risk not making
> sense.

it's great to see this back and forth between the formal tools and the language
reference. The IR docs are primarily aimed at humans, but if ambiguities and
problems spotted by the formal tools can be cleared up there without making the
lang ref unreadable then that would be a win for everyone.

Ciao, Duncan.

Dan Gohman

unread,
Sep 12, 2014, 12:22:12 PM9/12/14
to John Regehr, llv...@cs.uiuc.edu
Hi John,

More background is in this post (at which time "poison" was still named "trap") (and I apologize for the verbosity) [0]

[0] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045730.html

The poison concept aimed to
  (a) enable the sign-extension elimination optimization
  (b) permit arbitrary speculation that doesn't destroy any optimization opportunities

Undef does (b) but not (a). Immediate UB does (a) but not (b). Additionally, here are some interesting debatable requirements:

  - Select(a, b, c) should be semantically equivalent to (sext(a) & b)|(~sext(a) & c) (with appropriate bitcasting).
  - Select should be semantically equivalent to its intuitive translation as a branch and a phi (the current poison concept violates this)
  - Branch should be semantically equivalent to its intuitive translation as a switch, and as an indirect branch
  - Constant folding, mem2reg, and reg2mem should always be safe *and* optimization-opportunity-preserving

I personally suggest a compromise on the optimization, and instead a focus on helping programmers write better code. C++11 range-based for loops are easier for humans to read, and in some cases avoid ol' "int" to step through arrays. Quoting the Zen of Python, "let's do more of those!"

John Regehr

unread,
Sep 18, 2014, 10:48:38 PM9/18/14
to Dan Gohman, llv...@cs.uiuc.edu
Today I ran into another aspect of the poison problem...

Basically, SimplifyCFG wants to take

expr1 && expr2

and flatten it into

x = expr1
y = expr2
x&y

This isn't safe when expr2 might execute UB. The consequence is that no
LLVM shift instruction is safe to speculatively execute, nor is any
nsw/nuw/exact variant, unless the operands can be proven to be in bounds.

Real example here:

http://llvm.org/bugs/show_bug.cgi?id=20997

David Majnemer

unread,
Sep 19, 2014, 3:20:04 AM9/19/14
to John Regehr, llv...@cs.uiuc.edu
I believe Dan struck at the heart of the issue and the consequences have just become apparent (at least to me).

Consider:
define i32 @f(i32 %a) {
  %cmp = icmp sgt i32 %a, 0
  %sub = sub nsw i32 2147483647, %a
  %sel = select i1 %cmp, i32 %sub, i32 0
  ret i32 %sel
}

If %a is -1, %sub will wrap and be poison. %cmp will be false.
Is %sel poison or not?  We have no definition for what select actually does.

If we assume select boils down to nothing but arithmetic, then %sel may be poisoned by *any* of its operands.

If we assume select boils down to control flow (branch + PHI), then we are violating the rules by which it operates in many places in LLVM.
Consider the following input:
%add = add nsw i32 %a, 1
%cmp1 = icmp eq i32 %a, 0
%cmp2 = icmp slt i32 %add, 0
%sel = select i1 %cmp1, i1 %cmp2, i1 false

We currently optimize this to:
%add = add nsw i32 %a, 1
%cmp1 = icmp eq i32 %a, 0
%cmp2 = icmp slt i32 %add, 0
%and = and i1 %cmp1, %cmp2

If %a is 2147483647: %add is poison, %cmp1 is false, %cmp2 is poison, %sel is false but %and is poison!

The above transform is not an oddity, InstCombine does this sort of thing *a lot*.

If select is arithmetic, many transforms are "correct" but we can form select far less often than we thought we could.
If select is control flow, many transforms are "wrong" but we might be able to form more selects.

What is select? How does it work? How should it work?

Sanjoy Das

unread,
Nov 14, 2014, 5:15:19 PM11/14/14
to David Majnemer, John Regehr, llv...@cs.uiuc.edu
Hi all,

I was initially going to send this email to a thread on llvm-commits
where David explained the issue with poison and select to me, but then
some quick googling led me to this thread.

I've been thinking about a possible semantics for poison values in a
certain way:

The key semantic difference between undef and poison, as I understand
it, is that when justifying an execution trace the compiler needs to
commit to a legal value for every undef flowing through the program --
a N bit undef can be any N bit value, but it has to be *some* N bit
value. On the other hand, the compiler does not need to commit to a N
bit poison value being *any* legal N bit value -- it may do
optimizations (eg. (t < (t + 1)) ==> true) that cannot be justified by
any assignment of legal values to the poison flowing through the
program. Since there may not be legal assignments to poison values
that justify a certain execution trace, we have to prohibit any action
that leads to "observing" poison values. So we say "observing" poison
values is UB and "cannot happen". Now the question is "what is
observation?".

One way to formalize the above: make any side-effecting operation
classically data-dependent on a poison value undefined behavior. I
think this is what the spec tries to do; but the current approach has
issues as pointed out in this thread, especially in relation to
selects.

Since UB is a dynamic property, I think we should make this notion of
dependence dynamic too: a side effecting operation has UB if any of
its operands (including values it is control dependent on via
branches) is `dynamic-dependent` on a poison value (non-side effecting
operations like add and lshr never have UB).

A value X is *not* `dynamic-dependent` on value Y (not necessarily an
operand of X) if for all values of Y, X computes the same value. So
(X - X) is not `dynamic-dependent` on X, 42 is not dynamic-dependent
on 43. Otherwise X is `dynamic-dependent` on Y. The idea here is
that if an expression computes the same value for all possible values
of one of its inputs (transitively), the value computed by the
expression cannot be used to "observe" the value of the said operand;
even if there is a static dependence chain.


It does not make a whole lot of sense to talk about dynamic dependence
for instructions like "load" which are not pure computation [1], and
these should be classified under "side effecting operations". We will
have to carefully specify the `dynamic-dependent` relation for this
scheme to work (if it does work!), but I'm just trying to get the
general across right now. For reg2mem to be safe, for instance, we
need to special-case stores to and loads from non-escaping allocas
(see summary below).


Now, the interesting thing about this dynamic dependence thing is that
it is a runtime, exact, property of a program. So "select %c, %a, %b"
is dynamic-dependent on "%a" only in executions where "%c" is true.
In the example in previous email, for example,

%add = add nsw i32 %a, 1
%cmp1 = icmp eq i32 %a, 0
%cmp2 = icmp slt i32 %add, 0
%and = and i1 %cmp1, %cmp2

%and is *not* `dynamic-dependent` on %cmp2 in the runs in which %cmp1
is false.

Another more complex example, involving conversion of br-PHI to a
select:

%a = maybe poison
br i1 %a, label %left, label %right
left:
%l = 42
br label %merge
right:
%r = < unknown load > can be speculated
br label %merge
merge:
%m = phi %l %r
store %m to heap

If %c is poison, then the store has UB iff %r != %l (unless it is
meaningful to speculatively talk about %r's value, we can't form a
select anyway). In that case, it is okay to transform the br-phi to a
select. If %c is poison, and %r == %l, then the store isn't UB (since
the phi does not dynamically depend on %c). In that case, the select
we form, "select %c, %l, %r" doesn't observe %c either and so storing
it is not UB. That said, I admit this is counter-intuitive -- whether
a program is undefined or not is a function of state of the system
that the program may not even observe in the undefined run. If
this is a fundamental flaw in this scheme or not I'm not sure.


* Summary / TL;DR *

The compiler may justify executions using assignments to
poison values that don't corresponding to any legal value. For this
reason, escaping any *information* based off a poison value is UB --
such information is unsound, in some sense. This notion of
"information dependence" is more precise than classic data dependence
(e.g. (xor A A) does not give us any information about A). The notion
of escaping is also more precise than "side-effecting operation" --
storing to a non-escaping alloca does not escape the stored value, for
example.

Does this make any sense at all? Have there been past attempts at
formalizing poison along these lines?

[1]: in this scheme load cannot result in poison because the store
that stored poison is UB and "could not have happened".

Thank you for your time!
-- Sanjoy

Sanjoy Das

unread,
Nov 23, 2014, 3:07:09 AM11/23/14
to David Majnemer, John Regehr, llv...@cs.uiuc.edu
Hi,

> Does this make any sense at all? Have there been past attempts at
> formalizing poison along these lines?

Answering my own question, what I was considering is a slight variant
of the scheme mentioned in an earlier email by Dan:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045730.html

"
- Instead of trying to define dependence in LangRef, just say that if
changing the value returned from an overflowing add nsw would
affect the observable behavior of the program, then the behavior of
the program is undefined. This would reduce the amount of text in
LangRef, but it would be a weaker definition, and it would require
sign-extension optimizers and others to do significantly more work
to establish that their transformations are safe.
"

That said, this scheme does not work as expected -- the bitness of the
result of an overflowing add prevents some optimizations:

%a = add nsw i32 %v, 1
%c = icmp slt i32 %v, %a
store %c, < global variable >

In the case of signed overflow, %v is INT32_MAX. For all values of
%a, the observable behavior of the program is the same (we store false
to < global variable > for all values of %a); and so we cannot exploit
undefined behavior and simplify '%c' to true.

Similar issues arise if you, for example, zero extend a poison value,
and then right shift out all the "poison bits" -- the resulting value
is all zeroes and do not capture overflowing semantics.

Thanks,

David Majnemer

unread,
Nov 23, 2014, 5:19:48 PM11/23/14
to Sanjoy Das, John Regehr, llv...@cs.uiuc.edu
I think that there are fundamentally two different solutions to the select-poison issue with their own pros and cons:

Either:

We allow instructions to not care about poison operands because their output will not contain any information tainted by the poison.  This would mean that select is allowed to ignore a poisoned operand if it doesn't select it.  It would also mean that a poison condition value guarantees that you get *one* of the two operands *but* that the select instruction itself has been poisoned.

AFAICT, this doesn't lose out on any optimizations.


Or:

We introduce a 'vile' attribute to the select instruction.  A 'vile' select is poison even if the operand which wasn't selected if poison while 'normal' selects behave more like br/phi instructions.

This is a more conservative and, IMO, uglier approach which is sound but would require auditing LLVM to make sure that it only transforms 'vile' selects into arithmetic.  I think it would be very hard to optimize a normal select into a 'vile' select which would mean that we might lose out on some optimizations.


If there is a reason we shouldn't go with approach #1?  I can't immediately come up with IR that we would stop optimizing.

Reid Kleckner

unread,
Nov 24, 2014, 2:15:23 PM11/24/14
to David Majnemer, John Regehr, llv...@cs.uiuc.edu
I think the simpler way of phrasing 1 is: "poison has the same propagation rules as undef". This means that poison is a property of bits, not of values, and basic arithmetic like 'and 0', 'or 1', and 'mul 0' can mask it away. This makes a CFG diamond equivalent to a select and select equivalent to arithmetic, which seems like general goodness. I wonder if we lose optimization power around these kinds of masking arithmetic instructions, though.

Reply all
Reply to author
Forward
0 new messages