[llvm-dev] Is it valid to dereference a pointer that has undef bits in its offset?


Juneyoung Lee via llvm-dev

Sep 20, 2020, 6:47:28 PM
to llvm-dev
Hello all,

Is it valid to dereference a pointer that has undef bits in its offset?

For example,

%p = alloca [8 x i8]
%p2 = gep %p, (undef & 8)
store 0, %p2

undef & 8 is always less than 8, so technically it will store zero to one of the array's elements.

The reason I ask is that I want to improve the no-undef analysis: if a load/store raised UB whenever it is given a pointer with undef bits, the analysis could conclude that any pointer passed to load/store is well-defined.

A suggested patch is here: https://reviews.llvm.org/D87994

I wonder whether there are cases that rely on this behavior that I'm not aware of.

Thanks,
Juneyoung

Juneyoung Lee via llvm-dev

Sep 20, 2020, 6:54:30 PM
to llvm-dev
> %p2 = gep %p, (undef & 8)
A silly typo: undef & 8 -> undef & 7
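
For reference, the corrected example could be spelled out as well-formed IR roughly as below. This is only a sketch: the function name `@store_undef_offset` is hypothetical, the `i64` index type is an assumption, and it uses the opaque-pointer (`ptr`) syntax of newer LLVM releases rather than the typed pointers in use at the time of this thread.

```llvm
; Sketch only; @store_undef_offset is a hypothetical name.
define void @store_undef_offset() {
entry:
  %p = alloca [8 x i8]
  ; The mask clears every bit above the low three, so any value
  ; %idx can take lies in 0..7 even though those three bits are undef.
  %idx = and i64 undef, 7
  %p2 = getelementptr [8 x i8], ptr %p, i64 0, i64 %idx
  ; The question in this thread: is this store UB, or a store of zero
  ; to some unspecified element of the array?
  store i8 0, ptr %p2
  ret void
}
```

With the original mask of 8, the index could be 0 or 8, and 8 would be one past the last element, which is why the correction to 7 matters.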
--

Juneyoung Lee
Software Foundation Lab, Seoul National University

Eli Friedman via llvm-dev

Sep 21, 2020, 1:32:43 PM
to Juneyoung Lee, llvm...@lists.llvm.org

I think it’s reasonable to expect that IR generated by frontends doesn’t do this.

Not sure about transforms; I can imagine that we might speculate a load without proving all the bits are well-defined.

-Eli

Johannes Doerfert via llvm-dev

Sep 21, 2020, 1:44:14 PM
to Eli Friedman, Juneyoung Lee, llvm...@lists.llvm.org
My feeling is that we should allow this.
I have no proper justification handy, but your example doesn't strike me as UB.

~ Johannes


_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Philip Reames via llvm-dev

Sep 21, 2020, 2:41:13 PM
to Johannes Doerfert, Eli Friedman, Juneyoung Lee, llvm...@lists.llvm.org
I think we need to allow this.  Otherwise, we have to prove that
addresses are non-undef before we can hoist or sink a memory
instruction.  Today, aliasing can use things like known bits, and if we
imposed a no-undef in address requirement, we'd either need to replace
such reasoning in AA, or have passes which wish to hoist/sink check the
property afterwards.

Or to say it differently, I think it's reasonable for %p2 and %p3 to be
provably no alias and dereferenceable, and for %v and %v2 to be safe to
speculate.

%p = alloca [16 x i8]
%p2 = gep %p, (undef & 7)
%v = load %p2
%p3 = gep %p, 8
%v2 = load %p3

Keep in mind that the undef doesn't have to be literal and can be
arbitrarily obscured (e.g. behind a function call).  The alternative
interpretation is extremely limiting.
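
As a rough, fully typed rendering of the snippet above (a sketch, not taken from any patch): the function name `@two_loads`, the `i64` index type, and the final `add` that keeps both loads live are assumptions added for illustration, and the opaque-pointer (`ptr`) syntax is from newer LLVM releases.

```llvm
; Sketch only; @two_loads is a hypothetical name.
define i8 @two_loads() {
entry:
  %p = alloca [16 x i8]
  ; Known bits: everything above bit 2 is zero, so %idx is in 0..7.
  %idx = and i64 undef, 7
  %p2 = getelementptr [16 x i8], ptr %p, i64 0, i64 %idx
  ; %p2 addresses one of bytes 0..7 of the 16-byte allocation.
  %v = load i8, ptr %p2
  ; %p3 addresses byte 8, which no possible value of %idx can reach.
  %p3 = getelementptr [16 x i8], ptr %p, i64 0, i64 8
  %v2 = load i8, ptr %p3
  ; The memory is uninitialized, so the loaded values themselves are
  ; not meaningful; only the address reasoning is of interest here.
  %sum = add i8 %v, %v2
  ret i8 %sum
}
```

Under the interpretation argued for here, both addresses stay inside the allocation and cannot overlap, whatever the undef bits resolve to.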

Philip

Johannes Doerfert via llvm-dev

Sep 21, 2020, 4:51:20 PM
to Philip Reames, Eli Friedman, Juneyoung Lee, llvm...@lists.llvm.org
To be fair, if the address has to be `noundef` the example would just be
UB. That said, I still believe it "is not".

Juneyoung Lee via llvm-dev

Sep 21, 2020, 10:41:42 PM
to Johannes Doerfert, Eli Friedman, Philip Reames, llvm...@lists.llvm.org
Thank you for the information; it seems that making it raise UB is problematic.

Would it be good to clarify this in LangRef? I can update the patch to contain that clarification instead.

Another concern, then, is how we can efficiently encode the assumption that a pointer variable in IR does not have undef bits.
Certainly, in the front-end language, most pointers won't have undef bits, and it would be great if that information were still available in IR.
A pointer argument can be annotated with noundef, but for a pointer that is loaded from memory, for example, such information disappears.
I think this information would help reduce the cost of fixing existing undef/poison-related optimizations, because we could conclude in more cases that we don't need to insert freeze.

Juneyoung

Johannes Doerfert via llvm-dev

Sep 21, 2020, 10:47:09 PM
to Juneyoung Lee, Eli Friedman, Philip Reames, llvm...@lists.llvm.org

On 9/21/20 9:41 PM, Juneyoung Lee wrote:
> Thank you for the infos; it seems making it raise UB is problematic.
>
> Would clarifying it in LangRef be good? I can update the patch to contain
> the information instead.

Yes, please.


> Another concern is then, how can we efficiently encode an assumption that a
> pointer variable in IR does not have undef bits?
> Certainly, in the front-end language, (most of) pointers won't have undef
> bits, and it would be great if the information is still available in IR.
> A pointer argument can be encoded using noundef, but, e.g., for a pointer
> that is loaded from memory, such information disappears.
> I think this information is helpful reducing the cost of fixing existing
> undef/poison-related optimizations, because we can conclude that we don't
> need to insert freeze in more cases.

I thought we solved that already:

    `call void llvm.assume(i1 true) ["noundef"(type* %ptr), "noundef"(type2* %ptr2)]`

See http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Is that enough for your needs?
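
Applied to the loaded-pointer case raised above, the idea might look roughly like this. It is a sketch under assumptions: the function name `@load_through_loaded_pointer` and the parameter `%slot` are hypothetical, the syntax is the opaque-pointer (`ptr`) form of newer LLVM releases, and whether a particular analysis actually consumes the bundle is not something this thread establishes.

```llvm
declare void @llvm.assume(i1)

; Sketch only; the function and parameter names are hypothetical.
define i8 @load_through_loaded_pointer(ptr %slot) {
entry:
  ; A pointer loaded from memory carries no noundef information by itself.
  %ptr = load ptr, ptr %slot
  ; The operand bundle records the assumption that %ptr has no undef bits.
  call void @llvm.assume(i1 true) ["noundef"(ptr %ptr)]
  %v = load i8, ptr %ptr
  ret i8 %v
}
```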


~ Johannes

Juneyoung Lee via llvm-dev

Sep 22, 2020, 12:56:56 AM
to Johannes Doerfert, Eli Friedman, Philip Reames, llvm...@lists.llvm.org

>      `call void llvm.assume(i1 true) ["noundef"(type* %ptr), "noundef"(type2* %ptr2)]`

Maybe I can try this first, thanks.