[LLVMdev] Proposal: Loads/stores with deterministic trap/unwind behavior

Peter Collingbourne

Mar 31, 2014, 9:58:03 PM
to llv...@cs.uiuc.edu
Hi,

I'd like to propose an IR extension that would allow us to support zero-cost
exception handling for non-call operations that may trap. I want to start
with loads and stores through a null pointer; later we might extend this to
div/rem/mod by zero. This feature is obviously useful for implementing
languages such as Java and Go, which deterministically translate such
operations into exceptions that may be caught by the user.

There are a couple of somewhat orthogonal features that this would entail:

1) Deterministic handling for loads and stores through a null pointer.
2) Ability to unwind a load/store to a specific basic block, like invoke.

At the moment, we do not exactly have 1), as the optimizer considers
non-volatile loads/stores through a null pointer to have undefined
behavior. Volatile loads/stores are closer, but they come with their own
set of baggage that can inhibit optimization. (For example, if we can prove
that a load would always succeed, 'volatile' prevents us from reordering
the load or deleting it if it is dead.) So I propose to add an attribute to
'load' and 'store', which we can call, say, 'nullcheck', with the following
additional semantics:

- If the memory address is between zero and a target-defined value (i.e. the
size of the zero page) the instruction is guaranteed to trap in a
target-defined manner.

- The optimizer may only delete or reorder nullcheck instructions if the
program cannot observe such a transformation by installing a signal handler
for the trap. Therefore, the optimizer would be able to drop the attribute
if it can prove that the address will always be non-null.
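For illustration, a checked load under this proposal might look as follows. The 'nullcheck' keyword and its exact placement are part of the proposal, not existing IR:

```llvm
define i8 @deref(i8* %p) {
entry:
  ; guaranteed to trap in a target-defined manner if %p lies
  ; in the zero page
  %v = load i8* %p, nullcheck
  ret i8 %v
}

define i8 @deref_local() {
entry:
  ; %a is provably non-null (it points into the stack), so the
  ; optimizer may drop the attribute and treat this as an
  ; ordinary load
  %a = alloca i8
  store i8 7, i8* %a
  %v = load i8* %a, nullcheck
  ret i8 %v
}
```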

To support 2), I propose a couple of new instructions. I haven't come up with
great names for these instructions, but:

- 'iload' is to 'load' as 'invoke' is to 'call'. That is, the instruction is
a terminator and has normal and unwind destinations. e.g.

%v = iload i8* %ptr to label %try.cont unwind label %lpad

- Similarly, 'istore' is to 'store' as 'invoke' is to 'call'.

istore i8 %v, i8* %ptr to label %try.cont unwind label %lpad

These instructions always have 'nullcheck' semantics, plus:

- If the instruction traps and the program has installed a signal handler
for the trap which unwinds, the unwind is guaranteed to land at the
landing pad.
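Putting the pieces together, a complete function using the proposed 'iload' might look roughly like this. The syntax is the proposal's, and the personality function and catch clause are placeholders for whatever the language runtime would actually use:

```llvm
define i8 @checked_load(i8* %p) {
entry:
  %v = iload i8* %p to label %cont unwind label %lpad

cont:               ; normal destination: %v holds the loaded byte
  ret i8 %v

lpad:               ; unwind destination, reached if the load traps and
                    ; the runtime's signal handler unwinds to here
  %ex = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
          catch i8* null
  resume { i8*, i32 } %ex
}
```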

I've been working on an implementation of 'iload' and 'istore', which is
in the attached patches, if you are interested. (They aren't ready to go
in yet.) I have asm parsing/printing for both, and code generation for
'iload'. I'd be interested in getting feedback on code generation, as this
is my first serious foray into the backend -- I haven't tried running the
generated code yet, and the DAG builder is a mashup of the DAG builders for
'invoke' and 'load', but I've eyeballed the asm it generates (e.g. llc produces
iload-exception.s for the attached iload-exception.ll) and it looks reasonable.

Thanks,
--
Peter
0001-Create-UnwindPoint-as-a-base-class-of-InvokeInst.patch
0002-Add-ILoadInst-and-IStoreInst-classes.patch
0003-Add-SelectionDAG-support-for-iload.patch
iload-exception.ll
iload-exception.s

Rafael Espíndola

Apr 1, 2014, 12:16:37 PM
to Peter Collingbourne, LLVM Developers Mailing List
> - If the memory address is between zero and a target-defined value (i.e. the
> size of the zero page) the instruction is guaranteed to trap in a
> target-defined manner.
>
> - The optimizer may only delete or reorder nullcheck instructions if the
> program cannot observe such a transformation by installing a signal handler
> for the trap. Therefore, the optimizer would be able to drop the attribute
> if it can prove that the address will always be non-null.

We should probably spell this out at the IR level. For example:

* A pointer to the stack is assumed non-trapping.
* A pointer returned by malloc (calloc, strdup, etc.) is assumed non-trapping.
* All geps of such pointers are assumed non-trapping.
* null is required to trap.
* It is target-dependent whether a ptrtoint of a non-zero constant traps or not.

> To support 2), I propose a couple of new instructions. I haven't come up with
> great names for these instructions, but:
>
> - 'iload' is to 'load' as 'invoke' is to 'call'. That is, the instruction is
> a terminator and has normal and unwind destinations. e.g.
>
> %v = iload i8* %ptr to label %try.cont unwind label %lpad
>
> - Similarly, 'istore' is to 'store' as 'invoke' is to 'call'.
>
> istore i8 %v, i8* %ptr to label %try.cont unwind label %lpad
>
> These instructions always have 'nullcheck' semantics, plus:
>
> - If the instruction traps and the program has installed a signal handler
> for the trap which unwinds, the unwind is guaranteed to land at the
> landing pad.
>
> I've been working on an implementation of 'iload' and 'istore' which are
> in the attached patches, if you are interested. (They aren't ready to go
> in yet.) I have asm parsing/printing for both, and code generation for
> 'iload'. Would be interested in getting feedback on code generation as this
> is my first serious foray into the backend -- I haven't tried running the
> generated code yet and the DAG builder is a mashup of the DAG builders for
> 'invoke' and 'load', but I've eyeballed the asm it generates (e.g. llc produces
> iload-exception.s for the attached iload-exception.ll) and it looks reasonable.

So it basically assumes that the runtime environment of a language using
iload and istore has a signal handler that converts the segmentation
fault into a null pointer exception, which is then propagated using the
C++ ABI?

Sounds reasonable overall.

Cheers,
Rafael
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Krzysztof Parzyszek

Apr 1, 2014, 12:35:16 PM
to llv...@cs.uiuc.edu
How do you plan to actually create the exception? You cannot simply
throw an exception (via a "throw" statement) from a signal handler, because
the unwinding won't happen properly. To make it work you'll need
runtime support to indicate the origin of the exception (i.e. the
address of the offending load/store). Alternatively, you could insert
runtime checks at the loads/stores, but then it won't be "zero-cost".

-Krzysztof


--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Peter Collingbourne

Apr 1, 2014, 2:38:40 PM
to Krzysztof Parzyszek, llv...@cs.uiuc.edu
On Tue, Apr 01, 2014 at 11:35:16AM -0500, Krzysztof Parzyszek wrote:
> How do you plan to actually create the exception?

The runtime library can invoke the unwinder directly (i.e. using
_Unwind_RaiseException).

> You cannot simply
> throw an exception (via "throw" statement) from a signal handler because
> the unwinding won't happen properly. To make it work you'll need a
> runtime support to indicate the origin of the exception (i.e. the
> address of the offending load/store). Alternatively, you could insert
> runtime checks at the loads/stores, but then it won't be "zero-cost".

I'm pretty sure that it is possible to unwind through signal handlers. This
is, for example, how gccgo implements some run-time panics.

Thanks,
--
Peter

Peter Collingbourne

Apr 1, 2014, 2:48:27 PM
to Rafael Espíndola, LLVM Developers Mailing List
On Tue, Apr 01, 2014 at 12:16:37PM -0400, Rafael Espíndola wrote:
> > - If the memory address is between zero and a target-defined value (i.e. the
> > size of the zero page) the instruction is guaranteed to trap in a
> > target-defined manner.
> >
> > - The optimizer may only delete or reorder nullcheck instructions if the
> > program cannot observe such a transformation by installing a signal handler
> > for the trap. Therefore, the optimizer would be able to drop the attribute
> > if it can prove that the address will always be non-null.
>
> We should probably spell this out at the IR level. For example:
>
> * A pointer to the stack is assumed non-trapping.
> * A pointer returned by malloc (calloc, strdup, etc.) is assumed non-trapping.
> * All geps of such pointers are assumed non-trapping.
> * null is required to trap.
> * It is target-dependent whether a ptrtoint of a non-zero constant traps or not.

That sounds about right. I would also add to the last point that it is
target-dependent whether geps of a null pointer constant trap.

> > To support 2), I propose a couple of new instructions. I haven't come up with
> > great names for these instructions, but:
> >
> > - 'iload' is to 'load' as 'invoke' is to 'call'. That is, the instruction is
> > a terminator and has normal and unwind destinations. e.g.
> >
> > %v = iload i8* %ptr to label %try.cont unwind label %lpad
> >
> > - Similarly, 'istore' is to 'store' as 'invoke' is to 'call'.
> >
> > istore i8 %v, i8* %ptr to label %try.cont unwind label %lpad
> >
> > These instructions always have 'nullcheck' semantics, plus:
> >
> > - If the instruction traps and the program has installed a signal handler
> > for the trap which unwinds, the unwind is guaranteed to land at the
> > landing pad.
> >
> > I've been working on an implementation of 'iload' and 'istore' which are
> > in the attached patches, if you are interested. (They aren't ready to go
> > in yet.) I have asm parsing/printing for both, and code generation for
> > 'iload'. Would be interested in getting feedback on code generation as this
> > is my first serious foray into the backend -- I haven't tried running the
> > generated code yet and the DAG builder is a mashup of the DAG builders for
> > 'invoke' and 'load', but I've eyeballed the asm it generates (e.g. llc produces
> > iload-exception.s for the attached iload-exception.ll) and it looks reasonable.
>
> So it basically assumes that the runtime environment of a language using
> iload and istore has a signal handler that converts the segmentation
> fault into a null pointer exception, which is then propagated using the
> C++ ABI?

Exactly.

Thanks,
--
Peter

David Chisnall

Apr 1, 2014, 4:03:10 PM
to Peter Collingbourne, llv...@cs.uiuc.edu
On 1 Apr 2014, at 19:38, Peter Collingbourne <pe...@pcc.me.uk> wrote:

> I'm pretty sure that it is possible to unwind through signal handlers. This
> is, for example, how gccgo implements some run-time panics.

It's highly OS-dependent and not something to rely on for any vaguely portable code. We do support it in FreeBSD, but the extra logic caused a slowdown that users objected to when we introduced it, so I can well imagine that other systems would prefer the faster approach; it has some quite nasty interactions with the run-time linker. From what I recall, at least one of the other UNIX-like systems that we looked at did claim to support it, but got the locking wrong, so it worked fine 99% of the time and deadlocked the remaining 1%...

David

Peter Collingbourne

Apr 1, 2014, 7:54:54 PM
to David Chisnall, llv...@cs.uiuc.edu
On Tue, Apr 01, 2014 at 09:03:10PM +0100, David Chisnall wrote:
> On 1 Apr 2014, at 19:38, Peter Collingbourne <pe...@pcc.me.uk> wrote:
>
> > I'm pretty sure that it is possible to unwind through signal handlers. This
> > is, for example, how gccgo implements some run-time panics.
>
> It's highly OS-dependent and not something to rely on for any vaguely portable code. We do support it in FreeBSD, but the extra logic caused a slowdown that users objected to when we introduced it, so I can well imagine that other systems would prefer the faster approach; it has some quite nasty interactions with the run-time linker. From what I recall, at least one of the other UNIX-like systems that we looked at did claim to support it, but got the locking wrong, so it worked fine 99% of the time and deadlocked the remaining 1%...

Right, this feature would obviously require a certain level of OS support
and I'd expect a portable compiler to be able to pick a compatible way to
implement null checks.

Thanks,
--
Peter

Joerg Sonnenberger

Apr 2, 2014, 8:18:03 AM
to llv...@cs.uiuc.edu
On Tue, Apr 01, 2014 at 09:03:10PM +0100, David Chisnall wrote:
> On 1 Apr 2014, at 19:38, Peter Collingbourne <pe...@pcc.me.uk> wrote:
>
> > I'm pretty sure that it is possible to unwind through signal handlers. This
> > is, for example, how gccgo implements some run-time panics.
>
> It's highly OS-dependent and not something to rely on for any
> vaguely portable code.

libc++abi's unwinder doesn't really support unwinding through signal
handlers; at least, it wouldn't do anything special for them. I don't
have the faintest idea what the correct semantics would even be. Does it
restore the saved signal mask? Can you even unwind past the frame? In
general you won't be able to, as unwind instructions are not precise.

Joerg

Joe Ranieri

Apr 2, 2014, 10:44:33 AM
to llvmdev@cs.uiuc.edu Mailing List
On Wed, Apr 2, 2014 at 8:18 AM, Joerg Sonnenberger
<jo...@britannica.bec.de> wrote:
> libc++abi's unwind doesn't really support unwinding through signal
> handlers, at least it wouldn't do anything special for them. I don't
> have the faintest idea what the correct semantic would even be. Does it
> restore the saved signal mask? Can you even unwind past the frame? In
> general you won't be able to, as unwind instructions are not precise.
>
> Joerg

I've done a similar scheme in the past using Mach exception handlers.
Basically there's a thread running that gets told when bad memory
accesses happen and it has access to all of that thread's register
state. Given the state, it determines whether or not an exception
should be thrown (e.g. whether the current instruction pointer is in user
code). Then it changes the thread's register state in order to push a
new stack frame and call the function that raises the exception. Once
all of the setup is done, the thread is resumed and things go on their
merry way without crashing.

I'd love to see this become feasible with LLVM as we found this a
significant win in terms of reducing binary size.

-- Joe Ranieri

Andrew Trick

Apr 5, 2014, 3:21:17 AM
to Peter Collingbourne, llv...@cs.uiuc.edu
Hi Peter. With all due respect, I don't think it's right to introduce new load/store instructions with semantics equivalent to and+icmp+br+load/store.

‘invoke’ is different. It is needed because there is no way for the caller to explicitly check for an exception.

We do introduce intrinsics that encapsulate things like overflow checks. This is done to eliminate control flow edges that tend to inhibit LLVM’s optimizer and instruction selection. But you’re not removing the control flow, so this technique does not apply. Null checks should actually be exposed in IR so general optimizations can remove redundant checks.

Ideally this would just be a machine code pass that can hoist a load/store above a branch and nuke the compare. However, I can see how it’s easier to relate the compare operand to the address arithmetic at IR level.

To do this at IR level, you could introduce a pre-CodeGen pass that converts cmp+br+load/store into a checked load intrinsic. Since this intrinsic only exists for instruction selection, the optimizer doesn’t need to know about it.

The intrinsic would need to be lowered to an MI pseudo-instruction that feels like a load/store to the backend, but is a terminator. During code emission you could grab the address of that pseudo load/store and its resulting branch target to inform the runtime.

I can envision having a front end generate an intrinsic for explicit null checks so that an optimization pass can do something special with these, like reorder null checks and let the runtime recover. But I would not include the load/store semantics in the check. And this is entirely independent from the codegen issue of whether to use trapping loads and stores.

-Andy


Peter Collingbourne

Apr 6, 2014, 11:52:47 PM
to Andrew Trick, llv...@cs.uiuc.edu

I don't think the new instructions have equivalent semantics. If the null check
fails with the iload/istore instructions, we need to throw the appropriate
language-specific exception and evaluate it against the landing pad. There
may also need to be an active exception that can be resumed.

As far as I can tell, the frontend would need to emit IR that calls the
language runtime to manually throw the exception. This IR would need to be
recognized by the IR optimization that converts the icmp+br+load/store to
a checked load/store. It seems to me that it would be simpler to just start
with the checked load/store.

> ‘invoke’ is different. It is needed because there is no way for the caller to explicitly check for an exception.
>
> We do introduce intrinsics that encapsulate things like overflow checks. This is done to eliminate control flow edges that tend to inhibit LLVM’s optimizer and instruction selection. But you’re not removing the control flow, so this technique does not apply. Null checks should actually be exposed in IR so general optimizations can remove redundant checks.

My idea for removing redundant checks is to teach the IR optimizer to
treat iloads/istores as if they were null checks. Is there any reason
why this wouldn't work?

> Ideally this would just be a machine code pass that can hoist a load/store above a branch and nuke the compare. However, I can see how it’s easier to relate the compare operand to the address arithmetic at IR level.
>
> To do this at IR level, you could introduce a pre-CodeGen pass that converts cmp+br+load/store into a checked load intrinsic. Since this intrinsic only exists for instruction selection, the optimizer doesn’t need to know about it.

I did initially consider implementing the checked load/store as an
intrinsic. But there are relatively boring reasons why this won't work at
present. For example, there is no current way to represent a struct load
using an intrinsic, as there is no mangling for struct types. Also, named
structs would need a mangling that is resistant to renames. Rather than
solve these problems, I decided to avoid intrinsics entirely.

> The intrinsic would need to be lowered to an MI pseudo-instruction that feels like a load/store to the backend, but is a terminator. During code emission you could grab the address of that pseudo load/store and its resulting branch target to inform the runtime.

As far as I know, a load can lower to multiple machine instructions. This
will definitely be the case for the Go frontend that I'm working with, as
its IR tends to use struct loads/stores quite frequently. So I'm not sure
if this will work. I think it needs to look a lot like how the lowering for
invokes currently looks, with a pair of EH_LABELs around a set of ordinary
load/store MIs -- which is how I implemented it.

Thanks,
--
Peter

Filip Pizlo

Apr 7, 2014, 12:06:49 AM
to Peter Collingbourne, Andrew Trick, llv...@cs.uiuc.edu
Hey Peter,

I've now had multiple opportunities to evaluate the benefit of virtual-memory-based null checks versus explicit branches.  Here are some of the papers where we did this.  In all cases, we found that explicit null check branches are just as fast as using a virtual memory trap.

http://www.filpizlo.com/papers/baker-ccpe09-accurate.pdf    (Tomas stumbled on this observation while comparing OpenVM's C backend to its C++ backend.)
http://www.filpizlo.com/papers/pizlo-eurosys2010-fijivm.pdf   (I designed Fiji VM around never using virtual-memory-based optimizations and then I evaluated the cost of null check branches.  They were free.)

I've also done this experiment in WebKit.  In JavaScript the equivalent thing is a cell check, and we turn them into explicit branches.  We of course emit a minimal set of such branches through GVN, LICM, etc.  After that, these branches are essentially free.

In all cases, I've found that on modern hardware, the explicit null check branches have essentially zero cost.  Redundant null checks are trivial to eliminate using any serious compiler infrastructure, LLVM included.  The remaining null checks are trivially predicted by the hardware.

To me it feels like you're proposing to add new instructions to the IR to support something that isn't a worthwhile optimization.

(Also, this is specific to languages where the only non-pointer value that we might attempt to load from is null.  It wouldn't work for languages like JavaScript/Ruby/Python/etc where we may attempt to load from an integer or double that uses a high-bit type tag for instance.  So this is an optimization that really only applies to Java and Go, and in Java it's not clear that your IR gives you what you want - Java is basically always JITed and you'd want to deoptimize on null.)

-Phil

Peter Collingbourne

Apr 7, 2014, 12:53:29 AM
to Filip Pizlo, llv...@cs.uiuc.edu
Hi Filip,

Thanks for sharing these insights.

Since I've already started on an implementation, what I think I'll do is
experiment locally and see if these new instructions give a performance
improvement in the context of the compiler I'm working on -- I'll probably
need to implement explicit null checks anyway for the sake of supporting
targets that don't support unwinding through signal handlers. Once I have
some numbers I'll re-evaluate.

Thanks,
Peter

Filip Pizlo

Apr 7, 2014, 12:55:07 AM
to Peter Collingbourne, llv...@cs.uiuc.edu
When you do it, it would be awesome if you shared your performance results with everyone.  It's possible that your environment leads to a different kind of trade-off.

-Phil

Andrew Trick

Apr 7, 2014, 4:07:56 AM
to Peter Collingbourne, llv...@cs.uiuc.edu
I agree with Filip that implementing null checks as trapping loads is essentially a minor code-size optimization that is typically not worthwhile. However, my point is that this decision should not be made at IR level.

It could make sense to have a null check intrinsic that encapsulates and+cmp+br+invoke. I just don't like the idea of it subsuming the load/store. I think that will complicate both null check optimization and load/store optimization.

Whether the null check can be profitably implemented as a trapping load/store is specific to the target platform. This decision should be independent of optimization in the presence of null checks and independent of optimization of the null checks themselves. It is purely a codegen issue.

Imagine someone else porting your frontend to a new platform. They should not be required to implement trapping loads/stores to achieve a working system. And either way, they should benefit from the same platform independent optimization.

>> ‘invoke’ is different. It is needed because there is no way for the caller to explicitly check for an exception.
>>
>> We do introduce intrinsics that encapsulate things like overflow checks. This is done to eliminate control flow edges that tend to inhibit LLVM’s optimizer and instruction selection. But you’re not removing the control flow, so this technique does not apply. Null checks should actually be exposed in IR so general optimizations can remove redundant checks.
>
> My idea for removing redundant checks is to teach the IR optimizer to
> treat iloads/istores as if they were null checks. Is there any reason
> why this wouldn't work?

I think it would be poor IR design. The optimizer should be able to assume loads are loads and stores are stores. People writing optimization passes will not remember that there is now another special load/store operation they should consider.

An intrinsic makes it fairly clear that optimization passes will typically ignore this operation. Ideally, language-specific intrinsics are lowered early into canonical IR and platform-specific intrinsics are generated late from canonical IR to avoid complicating optimization.

>> Ideally this would just be a machine code pass that can hoist a load/store above a branch and nuke the compare. However, I can see how it’s easier to relate the compare operand to the address arithmetic at IR level.
>>
>> To do this at IR level, you could introduce a pre-CodeGen pass that converts cmp+br+load/store into a checked load intrinsic. Since this intrinsic only exists for instruction selection, the optimizer doesn’t need to know about it.
>
> I did initially consider implementing the checked load/store as an
> intrinsic. But there are relatively boring reasons why this won't work at
> present. For example, there is no current way to represent a struct load
> using an intrinsic, as there is no mangling for struct types. Also, named
> structs would need a mangling that is resistant to renames. Rather than
> solve these problems, I decided to avoid intrinsics entirely.

Ok. This seems worth solving. Maybe it should be discussed in another thread.

>
>> The intrinsic would need to be lowered to an MI pseudo-instruction that feels like a load/store to the backend, but is a terminator. During code emission you could grab the address of that pseudo load/store and its resulting branch target to inform the runtime.
>
> As far as I know, a load can lower to multiple machine instructions. This
> will definitely be the case for the Go frontend that I'm working with, as
> its IR tends to use struct loads/stores quite frequently. So I'm not sure
> if this will work. I think it needs to look a lot like how the lowering for
> invokes currently looks, with a pair of EH_LABELs around a set of ordinary
> load/store MIs -- which is how I implemented it.

That sounds nice. But I'm not sure that's sufficient to prevent the MI-level loads from being optimized away or moved/combined when you still need the null check in the original position.

-Andy

Jeremy Lakeman

Apr 9, 2014, 5:06:16 AM
to Andrew Trick, LLVM Developers Mailing List
On Mon, Apr 7, 2014 at 5:37 PM, Andrew Trick <atr...@apple.com> wrote:

I agree with Filip that implementing null checks as trapping loads is essentially a minor code-size optimization that is typically not worthwhile. However, my point is that this decision should not be made at IR level.

It could make sense to have a null check intrinsic that encapsulates and+cmp+br+invoke. I just don't like the idea of it subsuming the load/store. I think that will complicate both null check optimization and load/store optimization.

Whether the null check can be profitably implemented as a trapping load/store is specific to the target platform. This decision should be independent of optimization in the presence of null checks and independent of optimization of the null checks themselves. It is purely a codegen issue.

Imagine someone else porting your frontend to a new platform. They should not be required to implement trapping loads/stores to achieve a working system. And either way, they should benefit from the same platform independent optimization.

Integer type legalisation, vector math... there are plenty of existing language features you don't have to implement; just run passes to simplify the IR first.
I'm also thinking particularly of how Emscripten and PNaCl are implemented.

If the IR changes (and I'm not taking a position on that issue), we just need a pass to transform the IR for back ends that don't wish to implement it.

Andrew Trick

Apr 9, 2014, 12:38:14 PM
to Jeremy Lakeman, LLVM Developers Mailing List
Good point. Let me put it this way: I don't want trapping loads/stores to be an IR feature, but I think someone should be able to add them as a codegen feature. Every runtime needs to be able to handle explicit null checks. Only in a rare platform-specific case would someone want to also support implicit null checks via trapping loads/stores.

Actually, the most important thing to keep in mind here is that we don’t want to add new instructions to the LLVM language unless there is a very widespread, compelling need, and we want optimization passes to be explicitly aware of them by default. That’s where intrinsics come in.

Now, the question is, what should a null check intrinsic look like? We could introduce two:
(1) llvm.nullcheck, which just encapsulates the check+trap.
(2) llvm.load/store.nullchecked, which does the check+load/store+trap.

At the IR optimization level, I think it’s potentially useful to have llvm.nullcheck. Although I think it should be lowered to canonical IR instructions and control flow early in the optimization pipeline.

I can see how an llvm.load/store.nullchecked could facilitate machine code generation. A frontend would be free to generate this intrinsic directly, but I think it would inhibit optimization. A better option would be for a codegen pass to fold loads/stores into null checks after the null checks have been optimized. Although even that has its drawbacks, as AliasAnalysis would need to be taught about the intrinsic. In terms of the best phase ordering, this optimization belongs in the instruction scheduler, but that's not really practical.

-Andy

Philip Reames

Apr 10, 2014, 12:40:07 PM
to llv...@cs.uiuc.edu, pe...@pcc.me.uk
I'll respond to the rest of this thread separately, but want to respond
to one bit in particular.

On 04/06/2014 08:52 PM, Peter Collingbourne wrote:
> I did initially consider implementing the checked load/store as an
> intrinsic. But there are relatively boring reasons why this won't work
> at present. For example, there is no current way to represent a struct
> load using an intrinsic, as there is no mangling for struct types.
> Also, named structs would need a mangling that is resistant to
> renames. Rather than solve these problems, I decided to avoid
> intrinsics entirely.
I have out-of-tree code which adds support for intrinsics with struct
types. I'm happy to share this.

Moreover, if you identify a weakness in the intrinsic mechanism, we
should fix it. :)

(Says the guy with a couple of such fixes out of tree... really should
get around to upstreaming that...)

Philip

Philip Reames

Apr 10, 2014, 12:54:02 PM
to Andrew Trick, Peter Collingbourne, llv...@cs.uiuc.edu
Speaking as someone working on another language where the performance of
null checks is a major concern, I find myself agreeing with Andy and
Filip here.

I'm somewhat okay with the addition of the "nullcheck" attribute. I'm
not convinced it's really necessary, but the burden seems low. I find
myself in strong opposition to the new load/store variants. As others
have pointed out, there's not a clear win here and there's a *serious*
cost to modifying the IR - both current and ongoing.

Worth noting, both of your load and store instructions could be
implemented as intrinsics with minimal (if any) loss in expressiveness
or performance. There are function attributes which express nearly all
the aliasing properties you'd need. The only real complication is that
we don't really support invoking intrinsics today. I've got some hacky
changes out of tree that address that issue, but it needs to be
discussed on the mailing list before sharing.

Just to note: I'm not taking a position w.r.t. the profitability of
implicit null checks as a whole. We plan on running this experiment at
some point, but it's fairly low priority for us. There's definitely
more profitable things to get to first.

IMHO, a better area to invest in would be in null check elimination.
We've found a couple of cases where "obviously" redundant null checks
get missed by the optimizer. We haven't started digging into this in
great detail, but that seems like it would be a much more profitable
optimization opportunity.

Philip

Philip Reames

Apr 10, 2014, 1:01:19 PM
to Joe Ranieri, llvmdev@cs.uiuc.edu Mailing List

On 04/02/2014 07:44 AM, Joe Ranieri wrote:
> On Wed, Apr 2, 2014 at 8:18 AM, Joerg Sonnenberger
> <jo...@britannica.bec.de> wrote:
>> libc++abi's unwind doesn't really support unwinding through signal
>> handlers, at least it wouldn't do anything special for them. I don't
>> have the faintest idea what the correct semantic would even be. Does it
>> restore the saved signal mask? Can you even unwind past the frame? In
>> general you won't be able to, as unwind instructions are not precise.
>>
>> Joerg
> I've done a similar scheme in the past using Mach exception handlers.
> Basically there's a thread running that gets told when bad memory
> accesses happen and it has access to all of that thread's register
> state. Given the state, it determines whether or not an exception
> should be thrown (e.g. whether the current instruction pointer is in user
> code). Then it changes the thread's register state in order to push a
> new stack frame and call the function that raises the exception. Once
> all of the setup is done, the thread is resumed and things go on their
> merry way without crashing.
>
> I'd love to see this become feasible with LLVM as we found this a
> significant win in terms of reducing binary size.
Unless I'm misunderstanding, this is possible today and wouldn't be too
hard to implement.

The trick is to preserve the control flow possibilities at the IR level
so that there's a known exceptional resume point after the potentially
faulting instruction. Any potentially faulting operation could be
represented as an invoke to a properly modelled intrinsic.

In SelectionDAGBuilder, you could emit the actual instructions rather
than lowering the call part of the invoke normally. (I think. Haven't
implemented that and this area is filled with corner cases...)

Philip