[LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR


Reid Kleckner

Nov 10, 2014, 5:15:56 PM
to LLVM Developers Mailing List, John McCall
Moving this month-old RFC to llvmdev. I'm not sure why I sent it to cfe-dev in the first place...

---

In code review, John suggested that filter expressions be emitted into the body of the function containing the __try, rather than being outlined by the frontend.

Instead of having the frontend create filter functions, we would use labels in place of typeinfo. The IR would look like "landingpad ... catch label %filter0 ..." instead of "landingpad ... catch ... @filter_func0 ...". There would be a backend pass similar to SjLjEHPrepare that would outline the filter function and cleanup actions. Once we do the outlining, there is no turning back, because the outlined function has to know something about the stack layout of the parent function. If the parent function is inlined, we would have to duplicate the filter function along with it.

Given that we want this kind of outlining to handle cleanups, it shouldn't be difficult to use the same machinery for filter expressions.

The IR sample for safe_div at the end of my RFC would look like this instead:

define i32 @safe_div(i32 %n, i32 %d) {
entry:
  %d.addr = alloca i32, align 4
  %n.addr = alloca i32, align 4
  %r = alloca i32, align 4
  store i32 %d, i32* %d.addr, align 4
  store i32 %n, i32* %n.addr, align 4
  invoke void @try_body(i32* %r, i32* %n.addr, i32* %d.addr)
          to label %__try.cont unwind label %lpad

filter:
  %eh_code = call i32 @llvm.eh.seh.exception_code() ; or similar
  %cmp = icmp eq i32 %eh_code, -1073741676 ; 0xC0000094, EXCEPTION_INT_DIVIDE_BY_ZERO
  %filter.result = zext i1 %cmp to i32
  call void @llvm.eh.seh.filter(i32 %filter.result)

lpad:
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__C_specific_handler to i8*)
          catch label %filter
  store i32 0, i32* %r, align 4
  br label %__try.cont

__try.cont:
  %1 = load i32* %r, align 4
  ret i32 %1
}

define internal void @try_body(i32* %r, i32* %n, i32* %d) {
entry:
  %0 = load i32* %n, align 4
  %1 = load i32* %d, align 4
  %div = sdiv i32 %0, %1
  store i32 %div, i32* %r, align 4
  ret void
}

On Wed, Oct 1, 2014 at 10:43 AM, Reid Kleckner <r...@google.com> wrote:
I want to add SEH support to Clang, which means we need a way to represent it in LLVM IR.

Briefly, this is how I think we should represent it:
1. Use a different landingpad personality function for SEH (__C_specific_handler / _except_handlerN)
2. Use filter function pointers in place of type_info globals
3. Outline cleanups such as destructors and __finally on Windows, and provide a function pointer to the landingpad cleanup clause

See the example IR at the end of this email. Read on if you want to understand why I think this is the right representation.

---

Currently LLVM's exception representation is designed around the Itanium exception handling scheme documented here:

LLVM's EH representation is described here, and it maps relatively cleanly onto the Itanium design:

First, a little background about what __try is for. It's documented here:

The __try construct exists to allow the user to recover from all manner of faults, including access violations and integer division by zero. This is directly at odds with LLVM IR semantics, which treat such operations as undefined behavior. Regardless, I believe it's still useful to implement __try, even if it won't behave precisely as it does in other compilers in the presence of undefined behavior.

---

The first challenge is that loads in C/C++ can now have exceptional control flow edges. This is impossible to represent in LLVM IR today because only invoke instructions can transfer control to a landing pad. The simplest way to work around this is to outline the body of the __try block and mark it noinline, which is what I propose to do initially.

Long term, we could lower all potentially trapping operations to intrinsics that we 'invoke' at the IR level. See also Peter Collingbourne's proposal for iload and istore instructions here (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-April/071732.html).

With the outlining approach, in addition to noinline, we need to invent another function attribute to prevent functionattrs from inferring nounwind and readonly, or the optimizers will delete the invoke unwind edge or entire call site.

---

The next challenge is actually catching the exception. The __except construct allows the user to evaluate a (mostly) arbitrary expression to decide if the exception should be caught. Code generated by MSVC catches these exceptions with an exception handler provided by the CRT. This handler is analogous to personality functions (__gxx_personality_v0) in LLVM and GCC, so that's what I call it.

Notably, SEH and C++ exceptions use *different* personality functions in MSVC, and each function can only have one personality function. This is the underlying reason why one cannot mix C++ EH (even C++ RAII!) and __try in the same function with MSVC. Hypothetically, there is no reason why one could not have a more powerful personality function that handles both types of exception, but I intend to use the personality function from the CRT directly for the time being.

On x86_64, the SEH personality function is called __C_specific_handler. On x86, it is __except_handler4 (or 3), but it's similar to __C_specific_handler and doesn't change how we should represent this in IR.

The personality function interprets some side tables similar to the Itanium LSDA to find the filter function pointers that must be evaluated to decide which except handler to run at a given PC. The filter expressions are emitted as separate function bodies that the personality function calls. If a filter function returns '1', that indicates which except block will perform the recovery, and phase 1 of unwinding ends, similar to Itanium EH.
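
To make the lookup concrete, here is a minimal Python model of the phase 1 search described above. The table layout, the `ScopeEntry` name, and its fields are assumptions made for illustration; this is not the real `__C_specific_handler` scope table format, and only the "filter returns 1" case from the text is modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

EXCEPTION_EXECUTE_HANDLER = 1   # filter result: this __except block handles it
EXCEPTION_CONTINUE_SEARCH = 0   # filter result: keep looking

@dataclass
class ScopeEntry:
    """Hypothetical table row: a PC range guarded by one filter."""
    begin: int
    end: int
    filter_fn: Callable[[int], int]   # takes the exception code, returns a filter result
    handler: str                      # label of the __except block

def find_handler(table: List[ScopeEntry], pc: int, code: int) -> Optional[str]:
    """Phase 1: run the filters covering `pc` until one accepts the exception."""
    for entry in table:
        if entry.begin <= pc < entry.end:
            if entry.filter_fn(code) == EXCEPTION_EXECUTE_HANDLER:
                return entry.handler
    return None  # no filter accepted; the unwinder moves on to the next frame

EXCEPTION_INT_DIVIDE_BY_ZERO = 0xC0000094

def div_filter(code: int) -> int:
    # Mirrors `GetExceptionCode() == EXCEPTION_INT_DIVIDE_BY_ZERO`.
    return int(code == EXCEPTION_INT_DIVIDE_BY_ZERO)

# One guarded region with made-up PC bounds.
table = [ScopeEntry(begin=0x10, end=0x20, filter_fn=div_filter, handler="__except0")]
```

Calling `find_handler(table, 0x14, 0xC0000094)` selects `"__except0"`, while any other exception code, or a PC outside the guarded range, yields `None` and the search continues in the caller's frame.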

I propose we represent this in IR by:
1. changing the personality function to __C_specific_handler, __except_handler4, or in the future something provided by LLVM
2. replacing the type_info globals we currently put in landing pads with pointers to filter functions

Then, in llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp where we currently emit the LSDA for Itanium exceptions, we can emit something else keyed off which kind of personality function we're using.

---

SEH also allows implementing cleanups with __finally, but cleanups in general are implemented with a fundamentally different approach.

During phase 2 of Itanium EH unwinding, control is propagated back up the stack. To run a cleanup, the stack is cleared, control enters the landing pad, and propagation resumes when _Unwind_Resume is called.

On Windows, things work differently. Instead, during phase 2, each personality function is invoked a second time, wherein it must execute all cleanups *without clearing the stack* before returning control to the runtime where the runtime continues its loop over the stack frames. You can observe this easily by breaking inside a C++ destructor when an exception is thrown and taking a stack trace.
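
The difference between the two schemes can be sketched with a toy model. This is only an illustration under simplifying assumptions: frames are plain tuples, the unwinder's own frames are not modeled, and some frame is assumed to catch the exception. It records how many frames are still on the stack when each cleanup runs.

```python
from typing import List, Tuple

Frame = Tuple[str, bool, bool]  # (name, has_cleanup, catches_exception)

def run_cleanups(stack: List[Frame], windows_style: bool) -> List[Tuple[str, int]]:
    """Return (frame name, stack depth seen by its cleanup) for each cleanup run.

    `stack` is ordered from the outermost caller to the frame that raised.
    """
    # Phase 1: search from the raising frame outward for a frame that catches.
    target = next(i for i in range(len(stack) - 1, -1, -1) if stack[i][2])

    seen = []
    live = list(stack)
    # Phase 2: visit every frame above the catching frame, innermost first.
    for i in range(len(stack) - 1, target, -1):
        name, has_cleanup, _ = stack[i]
        if not windows_style:
            # Itanium: frames above this one are discarded before its
            # landing pad runs, so the cleanup sees its own frame on top.
            live = live[: i + 1]
        if has_cleanup:
            seen.append((name, len(live)))
    return seen

stack = [
    ("main", False, True),      # catches the exception
    ("mid", True, False),       # has a cleanup
    ("leaf", True, False),      # has a cleanup
    ("thrower", False, False),  # raises
]
```

With this input, the Windows-style walk runs both cleanups while all four frames are still present, whereas the Itanium-style walk runs `leaf`'s cleanup at depth 3 and `mid`'s at depth 2. That is the behavior a stack trace taken inside a destructor would reveal.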

MinGW's Win64 "SEH" personality function finesses this issue by taking complete control during phase 2 and following the Itanium scheme of successive stack unwinding. It has the drawback that it's not really ABI compatible with cleanups emitted by other compilers, which I think should be a goal for our implementation.

It might be possible to do something similar to what MinGW does, and implement our own __gxx_personality* style personality function that interprets the same style of LSDA tables, but we *need* to be able to establish a new stack frame to run cleanups. We cannot unwind out to the original frame that had the landing pad.

In the long term, I think we need to change our representation to implement this. Perhaps the cleanup clause of a landing pad could take a function pointer as an operand. However, in the short term, I think we can model this by always catching the exception and then re-raising it. Obviously, this isn't 100% faithful, but it can work.

---

Here’s some example IR for how we might lower this C code:

#define GetExceptionCode() _exception_code()
enum { EXCEPTION_INT_DIVIDE_BY_ZERO = 0xC0000094 };
int safe_div(int n, int d) {
  int r;
  __try {
    r = n / d;
  } __except(GetExceptionCode() == EXCEPTION_INT_DIVIDE_BY_ZERO) {
    r = 0;
  }
  return r;
}

define internal void @try_body(i32* %r, i32* %n, i32* %d) {
entry:
  %0 = load i32* %n, align 4
  %1 = load i32* %d, align 4
  %div = sdiv i32 %0, %1
  store i32 %div, i32* %r, align 4
  ret void
}

define i32 @safe_div(i32 %n, i32 %d) {
entry:
  %d.addr = alloca i32, align 4
  %n.addr = alloca i32, align 4
  %r = alloca i32, align 4
  store i32 %d, i32* %d.addr, align 4
  store i32 %n, i32* %n.addr, align 4
  invoke void @try_body(i32* %r, i32* %n.addr, i32* %d.addr)
          to label %__try.cont unwind label %lpad

lpad:                                             ; preds = %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__C_specific_handler to i8*)
          catch i8* bitcast (i32 (i8*, i8*)* @"\01?filt$0@0@safe_div@@" to i8*)
  store i32 0, i32* %r, align 4
  br label %__try.cont

__try.cont:                                       ; preds = %lpad, %entry
  %1 = load i32* %r, align 4
  ret i32 %1
}

define internal i32 @"\01?filt$0@0@safe_div@@"(i8* %exception_pointers, i8* %frame_pointer) {
entry:
  %0 = bitcast i8* %exception_pointers to i32**
  %1 = load i32** %0, align 8
  %2 = load i32* %1, align 4
  %cmp = icmp eq i32 %2, -1073741676
  %conv = zext i1 %cmp to i32
  ret i32 %conv
}

declare i32 @__C_specific_handler(...)
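
The filter compares against -1073741676 rather than 0xC0000094 because the IR prints i32 constants as signed decimals. A short Python check of that conversion (the helper name is made up for this note):

```python
def as_signed_i32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a two's complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

EXCEPTION_INT_DIVIDE_BY_ZERO = 0xC0000094
print(as_signed_i32(EXCEPTION_INT_DIVIDE_BY_ZERO))  # prints -1073741676
```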


Reid Kleckner

Nov 10, 2014, 6:45:01 PM
to LLVM Developers Mailing List, John McCall
Hm, this idea won't work. If we point to labels from landingpadinst then passes like SimplifyCFG will consider the blocks to be unreachable. I realized this by looking at llvm-dis output after hacking in asmparser support for this syntax. :)

I'll have to think longer.
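
A rough sketch of the problem, in Python rather than as an actual LLVM pass: a cleanup that follows only terminator edges from the entry block never reaches `filter`, because the `catch label %filter` operand is not a control-flow edge. The dictionary below is a hand-written model of the safe_div sketch, not output from any tool.

```python
from typing import Dict, List, Set

def reachable_blocks(successors: Dict[str, List[str]], entry: str) -> Set[str]:
    """Blocks reachable from `entry` by following terminator edges only."""
    seen, worklist = set(), [entry]
    while worklist:
        block = worklist.pop()
        if block not in seen:
            seen.add(block)
            worklist.extend(successors[block])
    return seen

# Terminator edges of the safe_div sketch. `lpad` names `filter` only as a
# landingpad operand, which is not a control-flow edge, so it is not listed.
safe_div_cfg = {
    "entry": ["__try.cont", "lpad"],
    "filter": [],
    "lpad": ["__try.cont"],
    "__try.cont": [],
}

dead = set(safe_div_cfg) - reachable_blocks(safe_div_cfg, "entry")
print(dead)  # prints {'filter'}
```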

Kaylor, Andrew

Nov 12, 2014, 8:14:24 PM
to Reid Kleckner, LLVM Developers Mailing List

Hi Reid,

 

I’ve been following your proposal, and I’d be interested in helping out if I can.  My main interest right now is in enabling C++ exception handling in clang for native (i.e. not mingw/cygwin) Windows targets (both 32-bit and 64-bit), but if I understand things correctly that will be closely related to your SEH work under the hood.

 

I’m still trying to get up to speed on what is and is not implemented, but I think I’m starting to get a clear picture.  My understanding is that LLVM has the necessary support to emit exception handling records that Windows will be able to work with (for Win64 EH) but some work may be required to get the IR properly wired up, and that there’s basically nothing in place to support Win32 EH and nothing in clang to generate the IR for either case.  Is that more or less accurate?

 

I’ve been looking at the work Kai Nacke did in ldc to implement exception handling there, but it isn’t clear to me yet how relevant that is to clang.

 

Can you tell me more about what your plans are?  Specifically, do you intend to support both 32 and 64 bit targets?  And were you also planning to work toward C++ exception handling support in clang once you had the general SEH support in place?

 

Finally, and most importantly, what can I do to help?

 

Thanks,

Andy

Reid Kleckner

Nov 13, 2014, 3:05:31 PM
to Kaylor, Andrew, LLVM Developers Mailing List
Cool! Apologies for the following stream of consciousness brain dump...

On Wed, Nov 12, 2014 at 5:07 PM, Kaylor, Andrew <andrew...@intel.com> wrote:

Hi Reid,

 

I’ve been following your proposal, and I’d be interested in helping out if I can.  My main interest right now is in enabling C++ exception handling in clang for native (i.e. not mingw/cygwin) Windows targets (both 32-bit and 64-bit), but if I understand things correctly that will be closely related to your SEH work under the hood.


Great! I agree, any changes to LLVM IR made to support SEH will also be needed to support C++ exceptions on Windows, in particular the outlining.

In the current LLVM model, all the exception handling code lives in the landing pad. The Windows unwinder doesn't actually return control to the landingpad until very late. Instead, it creates new stack frames to invoke the cleanup, catch handler (C++ EH only), or filter function (SEH only). This is why we need to have outlining somewhere. The question is, where should we do it? Personally, I want to do this on LLVM IR during CodeGenPrepare.

The major challenge that outlining anywhere presents is that now the outlined code has to "know" something about the frame layout of the function it was outlined from in order to access local variables. I think we can add `i8* @llvm.eh.get_capture_block(i8* %function, i8* %parent_rbp)` and `void @llvm.eh.set_capture_block(i8* %captures)` intrinsics to make this work. Any SSA values or allocas captured by the outlined landing pad code will be demoted to memory and stored in the capture block, and the layout will be encoded in a struct used by the outlined handlers and the parent function. However, once you do this, you cannot inline the IR without some heroics. It probably isn't that important to be able to inline functions with try/catch, but a good acid test for any new LLVM IR construct is "will it inline?", and this construct fails. I think we can live with this construct as long as we only introduce it after CodeGenPrepare.
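
As an illustration of what "the layout will be encoded in a struct" could mean, here is a small Python sketch that assigns offsets to captured values the way a non-packed struct is usually laid out. The field list, including the pointer-sized `obj` capture, is hypothetical; the real layout would come from the target's data layout.

```python
from typing import Dict, List, Tuple

def layout_capture_block(fields: List[Tuple[str, int, int]]) -> Tuple[Dict[str, int], int]:
    """Assign offsets to (name, size, alignment) fields in declaration order.

    Returns the offset of each field and the total size, padded to the
    largest field alignment.
    """
    offsets, offset, max_align = {}, 0, 1
    for name, size, align in fields:
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total

# The three allocas from safe_div, plus a hypothetical pointer-sized capture.
offsets, size = layout_capture_block(
    [("r", 4, 4), ("n.addr", 4, 4), ("d.addr", 4, 4), ("obj", 8, 8)]
)
```

Both the parent function and every outlined handler would index the block with the same offsets (here `r` at 0, `n.addr` at 4, `d.addr` at 8, `obj` at 16, 24 bytes in total), which is why the struct type has to be fixed before outlining.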

The remaining wrinkle in the capture block scheme is stack realignment prologues. In this case, we have three pointers to the stack: the SP, the base pointer (esi/rbx), and the frame pointer (ebp/rbp). Is the capture block stored at a known constant offset from ebp/rbp or esi/rbx? Or do we load and store a dynamic offset saved somewhere near ebp/rbp? This needs study.
 

I’m still trying to get up to speed on what is and is not implemented, but I think I’m starting to get a clear picture.  My understanding is that LLVM has the necessary support to emit exception handling records that Windows will be able to work with (for Win64 EH) but some work may be required to get the IR properly wired up, and that there’s basically nothing in place to support Win32 EH and nothing in clang to generate the IR for either case.  Is that more or less accurate?


We can emit valid pdata and xdata sections on Win64, and this supports basic stack unwinding. On top of that, we currently follow mingw64 and use Itanium-style LSDA tables and the __gxx_personality_seh0 personality function to run EH handlers. This means the standard exception handling IR emitted by clang and other frontends "just works" on Windows, and I want to keep it that way. I think most of the changes should be on the LLVM side to lower the standard EH IR down to something that is more compatible with MSVC EH.
 

I’ve been looking at the work Kai Nacke did in ldc to implement exception handling there, but it isn’t clear to me yet how relevant that is to clang.

 

Can you tell me more about what your plans are?  Specifically, do you intend to support both 32 and 64 bit targets?  And were you also planning to work toward C++ exception handling support in clang once you had the general SEH support in place?


I want to do Win64 first because it is easier and better documented, and then look at 32-bit next. 32-bit SEH does things like "take the address of a BB label from the middle of the parent function and 'call' it with a special ebp value passed in", but that is basically equivalent to the Win64 way of doing things with a very special calling convention.

I know some people are also interested in ARM (WoA), which should be similar to Win64, as it also uses pdata/xdata style unwind info.
 

Finally, and most importantly, what can I do to help?


I think there are some separable tasks here. 

The EH capture block intrinsics can probably be built in isolation from the outlining. We can probably make `get_capture_block` work with the result of `@llvm.frameaddress(i32 0)`. The inliner also has to be taught to avoid inlining functions that set up a capture block.

Doing outlining will be similar to what `llvm::CloneAndPruneFunctionInto` does, except it will start at the landing pad instead of the entry block. Instead of mapping from parameters to arguments, the outliner would map the selector to a constant and propagate that value forwards, pruning conditional branches as it goes. The `resume` instruction would end outlining and become a `ret`. Any cloned `ret` instructions are the result of cloning something that is statically reachable but dynamically unreachable. We can transform them to `unreachable` and run standard cleanup passes to propagate that backwards.
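
The cloning rules above can be sketched on a toy CFG. This is a Python model, not the `CloneAndPruneFunctionInto` API: blocks are reduced to their terminators, and the tuple encoding (`"sel_eq"` for a branch on the selector, and so on) is invented for the example.

```python
from typing import Dict, Tuple

# ("br", dst) | ("sel_eq", k, then_dst, else_dst) | ("resume",) | ("ret",)
Terminator = Tuple

def outline_handler(blocks: Dict[str, Terminator], lpad: str, selector: int) -> Dict[str, Terminator]:
    """Clone the blocks reachable from `lpad`, assuming the selector equals `selector`."""
    cloned: Dict[str, Terminator] = {}
    worklist = [lpad]
    while worklist:
        name = worklist.pop()
        if name in cloned:
            continue
        term = blocks[name]
        if term[0] == "sel_eq":
            # The selector is a known constant, so fold the conditional branch.
            _, k, then_dst, else_dst = term
            term = ("br", then_dst if selector == k else else_dst)
        elif term[0] == "resume":
            term = ("ret",)            # end of the outlined handler
        elif term[0] == "ret":
            term = ("unreachable",)    # statically reachable, dynamically dead
        cloned[name] = term
        if term[0] == "br":
            worklist.append(term[1])
    return cloned

parent = {
    "entry": ("br", "cont"),
    "lpad": ("sel_eq", 1, "catch1", "dispatch2"),
    "dispatch2": ("sel_eq", 2, "catch2", "eh.resume"),
    "catch1": ("br", "cont"),
    "catch2": ("br", "cont"),
    "eh.resume": ("resume",),
    "cont": ("ret",),
}
```

Outlining with selector 2 keeps only `lpad`, `dispatch2`, `catch2`, and `cont`, with the cloned `ret` in `cont` turned into `unreachable`. Outlining with selector 0 (the cleanup path in this toy encoding) ends at `eh.resume`, whose `resume` becomes `ret`.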

32-bit x86 EH will require installing an alloca onto the fs:00 chain of EH handlers. I suppose this could be emitted during CodeGenPrepare as regular LLVM IR instructions, since we have a way of writing `load/store fs:00` with address space 257. This alloca should probably be the same as the capture block, since it has to be at some known offset from ebp.
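
For readers unfamiliar with the fs:00 chain, it behaves like a per-thread singly linked stack of registration records. The Python model below is only a sketch of that linking discipline; the class and method names are invented, and real records live in the function's stack frame rather than on a heap.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Registration:
    """Model of one on-stack registration record: handler plus previous link."""
    handler: str
    prev: Optional["Registration"] = None

class Thread:
    def __init__(self) -> None:
        self.fs0: Optional[Registration] = None  # stands in for the word at fs:00

    def install(self, handler: str) -> Registration:
        # Prologue: link the new record in front of the current chain head.
        record = Registration(handler, self.fs0)
        self.fs0 = record
        return record

    def uninstall(self, record: Registration) -> None:
        # Epilogue: restore the previous head.
        assert self.fs0 is record, "records must be removed in LIFO order"
        self.fs0 = record.prev

    def chain(self) -> List[str]:
        """Handlers in the order the unwinder would visit them."""
        out, node = [], self.fs0
        while node is not None:
            out.append(node.handler)
            node = node.prev
        return out
```

Installing an "outer" and then an "inner" record gives the chain `["inner", "outer"]`; uninstalling them in reverse order empties it again.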

Kaylor, Andrew

Nov 13, 2014, 6:29:59 PM
to Reid Kleckner, LLVM Developers Mailing List

Thanks for the additional information.

 

Right now I’m experimenting with a mix of code compiled with MSVC and code compiled with clang, trying to get a C++ exception thrown and caught by the MSVC-compiled code across a function in the clang-compiled code.  My goal here is to isolate a small part of what needs to be done in a way that lends itself to tinkering.  I think this might lead me to the outlining of EH blocks that you describe below.

 

If the clang code doesn’t have an exception handler (and it can’t since clang won’t compile that right now) and doesn’t need to do any clean-up, this works fine.  If the clang code does need to do cleanup, clang currently emits the same landingpad stuff that it would emit for mingw and since I’m trying to link with the MSVC environment I end up with unresolved externals.  So I’m playing around with the clang-generated IR to see if I can turn it into something that will handle the cleanup and let the exception pass.  I’ve got it calling my custom SEH-style personality function and it’s trivial to get that to let the exception pass without doing the cleanup.  Now I just need to figure out how to get it to execute the cleanup code.

 

I haven’t spent a lot of time on this yet, so if this overlaps with what you’ve been doing I can step back and approach it from a different direction.  Otherwise, I’ll proceed and see if I can make use of your suggestions below with regard to outlining, probably starting with manual changes to the IR that simulate the process.

 

-Andy

 

 


Reid Kleckner

Nov 13, 2014, 7:25:02 PM
to Kaylor, Andrew, LLVM Developers Mailing List
Focusing on cleanups is probably a good way to start. The trouble is that your personality function can't just reset rsp and jump to the landing pad, or it will trash the state of the unwinder that's still on the stack. Everything in the landing pad basically has to be outlined. If the outlining happens at the IR level, we need some way to represent that, and I don't really have it nailed down.

Here's an idea, just to brainstorm:

define void @parent() {
  invoke ... unwind label %lpad
  ...
lpad:
  %eh_vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__C_specific_handler to i8*)
      cleanup
      catch i8* @typeid1
      catch i8* @typeid2
  %label = call i8* (...)* @llvm.eh.outlined_handlers(
      void (i8*, i8*)* @my_cleanup,
      i8* @typeid1, i8* (i8*, i8*)* @my_catch1,
      i8* @typeid2, i8* (i8*, i8*)* @my_catch2)
  indirectbr i8* %label

endcatch:
  ...
}

define void @my_cleanup(i8*, i8*) {
  ...
  ret void ; unwinder will keep going for cleanups
}

define i8* @my_catch1(i8*, i8*) {
  ret i8* blockaddress(@parent, %endcatch) ; merge back into normal flow at endcatch
}

define i8* @my_catch2(i8*, i8*) {
  ret i8* blockaddress(@parent, %endcatch) ; merge back into normal flow at endcatch
}

I guess @llvm.eh.outlined_handlers wouldn't be valid outside a landing pad, and would only be introduced during CodeGenPrepare to allow the best optimization of the handlers in the context of the parent function.

Kaylor, Andrew

Nov 13, 2014, 8:23:06 PM
to Reid Kleckner, LLVM Developers Mailing List

I don’t really have a good enough feeling for the landingpad syntax yet to comment on the most natural way to extend it, but creating a synthetic cleanup function to call from the personality function is what I was thinking.

 

With the current (trunk +/- a couple of weeks) clang, compiling for an “x86_64-pc-windows-msvc” target, I’m seeing a landingpad that looks like this:

 

lpad:                                             ; preds = %if.end, %if.then
  %2 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
          cleanup
  %3 = extractvalue { i8*, i32 } %2, 0
  store i8* %3, i8** %exn.slot
  %4 = extractvalue { i8*, i32 } %2, 1
  store i32 %4, i32* %ehselector.slot
  call void @"\01??1Bob@@QEAA@XZ"(%class.Bob* %bob) #3  ; Calling the destructor for a class named “Bob”
  br label %eh.resume

 

Replacing __gxx_personality_v0 with the name of my custom personality function (which has the SEH signature) and scrubbing out the terminate and resume calls for the time being, I see my personality function being called twice -- first for the C++ exception (Exception code == 0xe06d7363) and once for the unwind.  So now I just need to figure out how to get a pointer to a cleanup function into the DispatcherContext->HandlerData, which must be where the extra stuff in the landingpad comes in, right?
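
A small Python sketch of how a personality function might tell the two calls apart, assuming it inspects the `ExceptionFlags` field of the exception record. The flag values are the ones defined in winnt.h; the function names are invented for this note.

```python
EXCEPTION_UNWINDING = 0x2      # flag values as defined in winnt.h
EXCEPTION_EXIT_UNWIND = 0x4
MSVC_CXX_EXCEPTION = 0xE06D7363

def personality_phase(exception_flags: int) -> str:
    """Classify a personality call from its ExceptionFlags field."""
    if exception_flags & (EXCEPTION_UNWINDING | EXCEPTION_EXIT_UNWIND):
        return "unwind"   # second call: run cleanups
    return "search"       # first call: decide whether to handle

def decode_cxx_code(code: int) -> str:
    """The low three bytes of the MSVC C++ exception code are ASCII."""
    return (code & 0xFFFFFF).to_bytes(3, "big").decode("ascii")

print(decode_cxx_code(MSVC_CXX_EXCEPTION))  # prints msc
```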

 

Anyway, I think I’m making progress. :-)

Reid Kleckner

Nov 13, 2014, 8:30:56 PM
to Kaylor, Andrew, LLVM Developers Mailing List
On Thu, Nov 13, 2014 at 5:19 PM, Kaylor, Andrew <andrew...@intel.com> wrote:

I don’t really have a good enough feeling for the landingpad syntax yet to comment on the most natural way to extend it, but creating a synthetic cleanup function to call from the personality function is what I was thinking.


Pretty much.
 

With the current (trunk +/- a couple of weeks) clang, compiling for an “x86_64-pc-windows-msvc” target, I’m seeing a landingpad that looks like this:

 

lpad:                                             ; preds = %if.end, %if.then
  %2 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
          cleanup
  %3 = extractvalue { i8*, i32 } %2, 0
  store i8* %3, i8** %exn.slot
  %4 = extractvalue { i8*, i32 } %2, 1
  store i32 %4, i32* %ehselector.slot
  call void @"\01??1Bob@@QEAA@XZ"(%class.Bob* %bob) #3  ; Calling the destructor for a class named “Bob”
  br label %eh.resume

 

Replacing __gxx_personality_v0 with the name of my custom personality function (which has the SEH signature) and scrubbing out the terminate and resume calls for the time being, I see my personality function being called twice -- first for the C++ exception (Exception code == 0xe06d7363) and once for the unwind.  So now I just need to figure out how to get a pointer to a cleanup function into the DispatcherContext->HandlerData, which must be where the extra stuff in the landingpad comes in, right?


It's got some docs here:

The two values are the exception pointer and the selector value. The selector value is an artifact of the way we model the Itanium EH scheme, and you can basically set it to zero if you only want to deal with cleanups for the time being. The exception pointer is presumably pulled from the arguments to the personality routine. Again, cleanups don't need it, so you can probably zero it too.

Anyway, I think I’m making progress. :-)


Nice! 

Bob Wilson

Nov 17, 2014, 8:25:50 PM
to Reid Kleckner, LLVM Developers Mailing List
I don’t know much about SEH and haven’t had time to really dig into this, but the idea of outlining functions that need to know about the frame layout sounds a bit scary. Is it really necessary?

I’m wondering if you can treat the cleanups and filter functions as portions of the same function, instead of outlining them to separate functions. Can you arrange to set up the base pointer on entry to one of those segments of code to have the same value as when running the normal part of the function? If so, from the code-gen point of view, doesn’t it just behave as if there is a large dynamic alloca on the stack at that point (because the stack pointer is not where it was when the function was previously running)? Are there other constraints that prevent that from working?

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Reid Kleckner

Nov 17, 2014, 8:52:58 PM
to Bob Wilson, LLVM Developers Mailing List
On Mon, Nov 17, 2014 at 5:22 PM, Bob Wilson <bob.w...@apple.com> wrote:
I don’t know much about SEH and haven’t had time to really dig into this, but the idea of outlining functions that need to know about the frame layout sounds a bit scary. Is it really necessary?

I’m wondering if you can treat the cleanups and filter functions as portions of the same function, instead of outlining them to separate functions. Can you arrange to set up the base pointer on entry to one of those segments of code to have the same value as when running the normal part of the function? If so, from the code-gen point of view, doesn’t it just behave as if there is a large dynamic alloca on the stack at that point (because the stack pointer is not where it was when the function was previously running)? Are there other constraints that prevent that from working?

The "big dynamic alloca" approach does work, at least conceptually. It's more or less what MSVC does. They emit the normal code, then the epilogue, then a special prologue that resets ebp/rbp, and then continue with normal emission. Any local variables declared in the __except block are allocated in the parent frame and are accessed via ebp. Any calls create new stack adjustments to allocate new argument memory.

This approach sounds far scarier to me, personally, and will significantly complicate a part of LLVM that is already poorly understood and hard to hack on. I think adding a pair of intrinsics that can't be inlined will be far less disruptive for the rest of LLVM. This is actually already the status quo for SjLj exceptions, which introduce a number of uninlinable intrinsic calls (although maybe SjLj is a bad precedent :).

The way I see it, it's just a question of how much frame layout information you want to teach CodeGen to save. If we add the set_capture_block / get_capture_block intrinsics, then we only need to save the frame offset of *one* alloca. This is easy, we can throw it into a side table on MachineModuleInfo. If we don't go this way, we need to save just the right amount of CodeGen state to get stack offsets in some other function.
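
A toy Python model of the "one alloca, one side table" idea. It stands in for the proposed intrinsics only loosely: the dictionary plays the role of the side table, the function names echo the intrinsic names, and the offset of -32 is made up.

```python
from typing import Dict

# Hypothetical side table: parent function name -> offset of its capture
# block from the frame pointer (negative because locals sit below rbp).
capture_block_offset: Dict[str, int] = {}

def set_capture_block(parent_fn: str, frame_offset: int) -> None:
    """Record where the parent's single combined alloca lives in its frame."""
    capture_block_offset[parent_fn] = frame_offset

def get_capture_block(parent_fn: str, parent_rbp: int) -> int:
    """Recover the capture block address inside an outlined handler."""
    return parent_rbp + capture_block_offset[parent_fn]

set_capture_block("safe_div", -32)
```

An outlined handler that is handed the parent's frame pointer can then find every captured variable from that single recorded offset plus the struct layout, without knowing anything else about the parent's frame.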

Having a single combined MachineFunction also means that MI passes will have to learn more about SEH. For example, we need to preserve the ordering of basic blocks so that we don't end up with discontiguous regions of code.

Bob Wilson

Nov 18, 2014, 1:56:52 PM
to Reid Kleckner, LLVM Developers Mailing List
This is the only part that concerns me. Who keeps track of the layout of the data inside that capture block? How do you know what local variables need to be in the capture block? If the front-end needs to decide that, is that something that fits easily into how clang works?

For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.


Having a single combined MachineFunction also means that MI passes will have to learn more about SEH. For example, we need to preserve the ordering of basic blocks so that we don't end up with discontiguous regions of code.

Yes, you would probably need to do that. It doesn’t seem like that would be fundamentally difficult, but I haven’t thought through the details and I can imagine that it would take a fair bit of work.

Reid Kleckner

Nov 18, 2014, 2:12:16 PM
to Bob Wilson, LLVM Developers Mailing List
On Tue, Nov 18, 2014 at 10:50 AM, Bob Wilson <bob.w...@apple.com> wrote:

The way I see it, it's just a question of how much frame layout information you want to teach CodeGen to save. If we add the set_capture_block / get_capture_block intrinsics, then we only need to save the frame offset of *one* alloca. This is easy, we can throw it into a side table on MachineModuleInfo. If we don't go this way, we need to save just the right amount of CodeGen state to get stack offsets in some other function.

This is the only part that concerns me. Who keeps track of the layout of the data inside that capture block? How do you know what local variables need to be in the capture block? If the front-end needs to decide that, is that something that fits easily into how clang works?

The capture block would be a boring old LLVM struct with a type created during CodeGenPrepare.

I'm imagining a pass similar to SjLjEHPrepare that:
- Identifies all basic blocks reachable from landing pads
- Identifies all SSA values live in those blocks
- Demotes all non-alloca SSA values to allocas (DemoteRegToMem, like SjLj)
- Combines all allocas used in landing pad blocks into a single LLVM alloca with a new combined struct type
- Outlines code from landing pads into cleanup handlers, filters, catch handlers, etc.
- In the parent function entry block, calls @llvm.eh.seh.set_capture_block on the combined alloca
- In the outlined entry blocks, calls @llvm.eh.seh.get_capture_block(@parent_fn, i8* %rbp) to recover a pointer to the capture block, and casts it to a pointer to the right type
- Finally, replaces all alloca references (RAUW) with GEPs into the capture block
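The "combine allocas into one struct, then rewrite uses as GEPs" steps can be sketched outside LLVM as a plain layout computation. This is only an illustration under assumptions: `Slot`, `Field` and `layoutCaptureBlock` are hypothetical names, and the layout rule is the usual one for a non-packed struct rather than anything taken from an actual pass.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One stack slot (alloca) that landing-pad code refers to.
struct Slot {
  std::string name;
  std::size_t size;   // bytes
  std::size_t align;  // must be a power of two
};

// Where a slot ends up inside the combined capture block.
struct Field {
  std::string name;
  unsigned index;      // struct field index, as a GEP would use it
  std::size_t offset;  // byte offset from the start of the block
};

struct CaptureBlockLayout {
  std::vector<Field> fields;
  std::size_t size = 0;
  std::size_t align = 1;
};

// Lays the slots out in order, padding each one up to its alignment.
CaptureBlockLayout layoutCaptureBlock(const std::vector<Slot> &slots) {
  CaptureBlockLayout layout;
  std::size_t offset = 0;
  unsigned index = 0;
  for (const Slot &s : slots) {
    assert(s.align != 0 && (s.align & (s.align - 1)) == 0);
    offset = (offset + s.align - 1) & ~(s.align - 1);  // round up
    layout.fields.push_back({s.name, index++, offset});
    offset += s.size;
    if (s.align > layout.align) layout.align = s.align;
  }
  // The total size is rounded up to the block's own alignment.
  layout.size = (offset + layout.align - 1) & ~(layout.align - 1);
  return layout;
}
```

Each former alloca becomes a (field index, offset) pair, so every use in the parent or in an outlined handler can be rewritten as an access relative to one base pointer.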

The downside is that this approach probably hurts register allocation and stack coloring, but I think it's a reasonable tradeoff.

Thanks for prompting me on this, it helps to write things down like this. :)
 
For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.

Yep. I guess the question is, is CodeGenPrep the backend or not? 

Bob Wilson

Nov 18, 2014, 2:23:14 PM11/18/14
to Reid Kleckner, LLVM Developers Mailing List
On Nov 18, 2014, at 11:07 AM, Reid Kleckner <r...@google.com> wrote:

On Tue, Nov 18, 2014 at 10:50 AM, Bob Wilson <bob.w...@apple.com> wrote:

On Nov 17, 2014, at 5:50 PM, Reid Kleckner <r...@google.com> wrote:

On Mon, Nov 17, 2014 at 5:22 PM, Bob Wilson <bob.w...@apple.com> wrote:
I don’t know much about SEH and haven’t had time to really dig into this, but the idea of outlining functions that need to know about the frame layout sounds a bit scary. Is it really necessary?

I’m wondering if you can treat the cleanups and filter functions as portions of the same function, instead of outlining them to separate functions. Can you arrange to set up the base pointer on entry to one of those segments of code to have the same value as when running the normal part of the function? If so, from the code-gen point of view, doesn’t it just behave as if there is a large dynamic alloca on the stack at that point (because the stack pointer is not where it was when the function was previously running)? Are there other constraints that prevent that from working?

The "big dynamic alloca" approach does work, at least conceptually. It's more or less what MSVC does. They emit the normal code, then the epilogue, then a special prologue that resets ebp/rbp, and then continue with normal emission. Any local variables declared in the __except block are allocated in the parent frame and are accessed via ebp. Any calls create new stack adjustments to allocate new argument memory.

This approach sounds far scarier to me, personally, and will significantly complicate a part of LLVM that is already poorly understood and hard to hack on. I think adding a pair of intrinsics that can't be inlined will be far less disruptive for the rest of LLVM. This is actually already the status quo for SjLj exceptions, which introduce a number of uninlinable intrinsic calls (although maybe SjLj is a bad precedent :).

The way I see it, it's just a question of how much frame layout information you want to teach CodeGen to save. If we add the set_capture_block / get_capture_block intrinsics, then we only need to save the frame offset of *one* alloca. This is easy, we can throw it into a side table on MachineModuleInfo. If we don't go this way, we need to save just the right amount of CodeGen state to get stack offsets in some other function.

This is the only part that concerns me. Who keeps track of the layout of the data inside that capture block? How do you know what local variables need to be in the capture block? If the front-end needs to decide that, is that something that fits easily into how clang works?

The capture block would be a boring old LLVM struct with a type created during CodeGenPrepare.

I'm imagining a pass similar to SjLjEHPrepare that:
- Identifies all basic blocks reachable from landing pads
- Identifies all SSA values live in those blocks
- Demotes all non-alloca SSA values to allocas (DemoteRegToMem, like SjLj)
- Combines all allocas used in landing pad blocks into a single LLVM alloca with a new combined struct type
- Outlines code from landing pads into cleanup handlers, filters, catch handlers, etc.
- In the parent function entry block, calls @llvm.eh.seh.set_capture_block on the combined alloca
- In the outlined entry blocks, calls @llvm.eh.seh.get_capture_block(@parent_fn, i8* %rbp) to recover a pointer to the capture block, and casts it to a pointer to the right type
- Finally, replaces all alloca references (RAUW) with GEPs into the capture block

The downside is that this approach probably hurts register allocation and stack coloring, but I think it's a reasonable tradeoff.

Thanks for prompting me on this, it helps to write things down like this. :)

No problem. Now that I see the details of what you have in mind, I can’t think of any reason why that wouldn’t work, and I like the way it isolates most of the impact of SEH into one new pass. Also, if the performance impact turns out to be worse than expected, I don’t see anything here that would prevent moving to the “big dynamic alloca” approach later.

 
For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.

Yep. I guess the question is, is CodeGenPrep the backend or not? 

Yes, CGP is definitely backend. I thought you were going to say that the front-end needed to decide what goes in the capture block.

Kaylor, Andrew

Nov 18, 2014, 8:55:30 PM11/18/14
to Bob Wilson, Reid Kleckner, LLVM Developers Mailing List

> For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.

 

Looking beyond SEH to C++ exception handling for a moment, it seems to me that clang may be handling more than it should there.  For instance, calls like “__cxa_allocate_exception” and “__cxa_throw” are baked into the clang IR output, which seems to assume that the backend is going to be using libc++abi for its implementation.  Yet it has enough awareness that this won’t always be true that it coughs up an ErrorUnsupported failure for “isWindowsMSVCEnvironment” targets when asked to emit code for “try” or “throw”.

 

Should this be generalized with intrinsics?

 

Also, I’m starting to dig into the outlining implementation and there are some things there that worry me.  I haven’t compared any existing code that might be doing similar things, so maybe these issues will become clear as I get further into it, but it seemed worth bringing it up now to smooth the progress.  I’m trying to put together a general algorithm that starts at the landing pad instruction and groups the subsequent instructions as cleanup code or parts of catch handlers.  This is easy enough to do as a human reading the code, but the way that I’m doing so seems to rely fairly heavily on the names of symbols and labels. 

 

For instance, following the landingpad instruction I expect to find an extract and store of “exn.slot” and “ehselector.slot” then everything between that and wherever the catch dispatch begins must be (I think) cleanup code.  The catch handlers I’m identifying as a sequence that starts with a load of “exn.slot” and a call to __cxa_begin_catch and continues until it reaches a call to __cxa_end_catch.

 

The calls to begin/end catch are pretty convenient bookends, but identifying the catch dispatch code and pairing catch handlers with the clauses they represent seems to depend on recognizing the pattern of loading the ehselector, getting a typeid then comparing and branching.  I suppose that will work, but it feels a bit brittle.  Then there’s the cleanup code, which I’m not yet convinced has a consistent location relative to the catch dispatching and I fear may be moved around by various optimizations before the outlining and will potentially be partially shared with cleanup for other landing pads.

 

Then there’s the matter of what all of this will look like with SEH, but I haven’t given that much thought yet.

 

For now I’ll just happily push ahead in the hopes that this will all either resolve itself or turn out not to be much of a problem, but it seemed worth talking about now at least.

 

-Andy

 

 

From: Bob Wilson [mailto:bob.w...@apple.com]
Sent: Tuesday, November 18, 2014 11:19 AM
To: Reid Kleckner
Cc: Kaylor, Andrew; LLVM Developers Mailing List
Subject: Re: [LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR

 

 

On Nov 18, 2014, at 11:07 AM, Reid Kleckner <r...@google.com> wrote:

Reid Kleckner

Nov 18, 2014, 9:46:25 PM11/18/14
to Kaylor, Andrew, LLVM Developers Mailing List
On Tue, Nov 18, 2014 at 5:52 PM, Kaylor, Andrew <andrew...@intel.com> wrote:

> For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.

 

Looking beyond SEH to C++ exception handling for a moment, it seems to me that clang may be handling more than it should there.  For instance, calls like “__cxa_allocate_exception” and “__cxa_throw” are baked into the clang IR output, which seems to assume that the backend is going to be using libc++abi for its implementation.  Yet it has enough awareness that this won’t always be true that it coughs up an ErrorUnsupported failure for “isWindowsMSVCEnvironment” targets when asked to emit code for “try” or “throw”.

 

Should this be generalized with intrinsics?


We should just teach Clang to emit calls to the appropriate runtime functions. This isn't needed for SEH because you don't "throw", you just crash.
  

Also, I’m starting to dig into the outlining implementation and there are some things there that worry me.  I haven’t compared any existing code that might be doing similar things, so maybe these issues will become clear as I get further into it, but it seemed worth bringing it up now to smooth the progress.  I’m trying to put together a general algorithm that starts at the landing pad instruction and groups the subsequent instructions as cleanup code or parts of catch handlers.  This is easy enough to do as a human reading the code, but the way that I’m doing so seems to rely fairly heavily on the names of symbols and labels. 

 
Look at lib/Transforms/Utils/CloneFunction.cpp. Most of that code should be factored appropriately and reused. It uses a value map (ValueToValueMapTy) that we should be able to apply to the landing pad instruction to map the ehselector.slot to a constant and then propagate that through.
 

For instance, following the landingpad instruction I expect to find an extract and store of “exn.slot” and “ehselector.slot” then everything between that and wherever the catch dispatch begins must be (I think) cleanup code.  The catch handlers I’m identifying as a sequence that starts with a load of “exn.slot” and a call to __cxa_begin_catch and continues until it reaches a call to __cxa_end_catch.


I think we'll have to intrinsic-ify __cxa_end_catch when targeting *-windows-msvc to get this right. If we don't, exception rethrows will probably not work. We don't really need an equivalent of __cxa_begin_catch because there's no thread-local EH state to update, it's already managed by the caller of the catch handler.
 

The calls to begin/end catch are pretty convenient bookends, but identifying the catch dispatch code and pairing catch handlers with the clauses they represent seems to depend on recognizing the pattern of loading the ehselector, getting a typeid then comparing and branching.  I suppose that will work, but it feels a bit brittle.  Then there’s the cleanup code, which I’m not yet convinced has a consistent location relative to the catch dispatching and I fear may be moved around by various optimizations before the outlining and will potentially be partially shared with cleanup for other landing pads.


We either have to pattern match the selector == typeid pattern in the EH preparation pass, or come up with a new representation. I'm hesitant to add a new EH representation that only MSVC compatible EH uses, because it will probably trip up existing optimizations. I was hoping that something like the pruning logic in "llvm::CloneAndPruneFunctionInto" would allow us to prune the selector comparison branches reliably.
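For illustration, the "selector == typeid" shape that a preparation pass would have to recognize can be written as a matcher over a toy instruction list. Everything here is an assumption made for the sketch: `Inst`, `Op`, `CatchDispatch` and `matchCatchDispatch` are invented names, not LLVM classes, and the matcher deliberately follows operand identity rather than slot names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction model; these types are illustrative, not LLVM classes.
enum class Op { Load, TypeIdFor, ICmpEq, CondBr, Other };

struct Inst {
  Op op;
  std::string result;             // SSA name defined here, "" if none
  std::vector<std::string> args;  // operand names, globals or labels
};

struct CatchDispatch {
  bool matched = false;
  std::string typeInfo;    // e.g. "@_ZTIi"
  std::string catchLabel;  // successor when the selector matches
  std::string nextLabel;   // successor otherwise
};

// Recognizes a block ending in:
//   %sel = load <selector slot>
//   %id  = typeid.for(<typeinfo>)
//   %m   = icmp eq %sel, %id      (operands in either order)
//   br %m, <catch>, <next>
CatchDispatch matchCatchDispatch(const std::vector<Inst> &bb) {
  CatchDispatch out;
  if (bb.size() < 4) return out;
  const Inst &ld = bb[bb.size() - 4];
  const Inst &tid = bb[bb.size() - 3];
  const Inst &cmp = bb[bb.size() - 2];
  const Inst &br = bb[bb.size() - 1];
  if (ld.op != Op::Load || tid.op != Op::TypeIdFor ||
      cmp.op != Op::ICmpEq || br.op != Op::CondBr)
    return out;
  if (tid.args.size() != 1 || cmp.args.size() != 2 || br.args.size() != 3)
    return out;
  bool fwd = cmp.args[0] == ld.result && cmp.args[1] == tid.result;
  bool rev = cmp.args[0] == tid.result && cmp.args[1] == ld.result;
  if (!(fwd || rev) || br.args[0] != cmp.result) return out;
  out.matched = true;
  out.typeInfo = tid.args[0];
  out.catchLabel = br.args[1];
  out.nextLabel = br.args[2];
  return out;
}

// Usage: the catch.dispatch block from the IR discussed in this thread.
void exampleCatchDispatch() {
  std::vector<Inst> bb = {
      {Op::Load, "%sel", {"%ehselector.slot"}},
      {Op::TypeIdFor, "%11", {"@_ZTIi"}},
      {Op::ICmpEq, "%matches", {"%sel", "%11"}},
      {Op::CondBr, "", {"%matches", "%catch", "%ehcleanup10"}},
  };
  CatchDispatch d = matchCatchDispatch(bb);
  assert(d.matched && d.typeInfo == "@_ZTIi");
  assert(d.catchLabel == "%catch" && d.nextLabel == "%ehcleanup10");
}
```

A matcher like this is exactly as brittle as the pattern it expects: any optimization that reorders or merges these four instructions defeats it, which is the concern raised above.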

Reid Kleckner

Nov 24, 2014, 4:40:31 PM11/24/14
to Kaylor, Andrew, LLVM Developers Mailing List
On Mon, Nov 24, 2014 at 12:12 PM, Kaylor, Andrew <andrew...@intel.com> wrote:

Hi Reid,

 

I've been working on the outlining code and have a prototype that produces what I want for a simple case.

 

Now I'm thinking about the heuristics for recognizing the various logical pieces for C++ exception handling code and removing them once they’ve been cloned.  I've been working from various comments you've made earlier in this thread, and I'd like to run something by you to make sure we're on the same page.

 

Starting from a C++ function that looks like this:

... 

I'll have IR that looks more or less like this:

... 

If I've understood your intentions correctly, we'll have an outlining pass that transforms the above IR to this: 

... 

Does that look about like what you’d expect?


Yep! That's basically what I had in mind, but I still have concerns with this model listed below.

We should also think about how to call std::terminate when cleanup dtors throw. The current representation for Itanium is inefficient. As a strawman, I propose making @__clang_call_terminate an intrinsic:

  ...
  invoke void @dtor(i8* %this) to label %cont unwind label %terminate.lpad
cont:
  ret void
terminate.lpad:
  landingpad ... catch i8* null
  call void @llvm.eh.terminate()
  unreachable

This would be good for Itanium EH, as we can actually completely elide table entries for landing pads that just catch-all and terminate.
 

I just have a few questions.

 

I'm pretty much just guessing at how you intended the llvm.eh.set_capture_block intrinsic to work.  It wasn't clear to me if I just needed to set it where the structure was created or if it would need to be set anywhere an exception might be thrown.  The answer is probably related to my next question.


I was imagining it would be called once in the entry block.

Chandler expressed strong concerns about this design, however, as @llvm.eh.get_capture_block adds an ordering constraint on CodeGen. Once you add this intrinsic, we *have* to do frame layout of @_Z13do_some_thingRi *before* we can emit code for all the callers of @llvm.eh.get_capture_block. Today, this is easy, because module order defines emission order, but in the great glorious future, codegen will hopefully be parallelized, and then we've inflicted this horrible constraint on the innocent.

His suggestion to break the ordering dependence was to lock down the frame offset of the capture block to always be some fixed offset known by the target (i.e. ebp - 4 on x86, if we like that).

In the above example I created a single capture block for the entire function.  That works reasonably well for a simple case like this and corresponds to the co-location of the allocas in the original IR, but for functions with more complex structures and multiple try blocks it could get ugly.  Do you have ideas for how to handle that?


Not really, it would just get ugly. All allocas used from landing pad code would get mushed into one allocation. =/
 

For C++ exception handling, we need cleanup code that executes before the catch handlers and cleanup code that executes in the case of uncaught exceptions.  I think both of these need to be outlined for the MSVC environment. Do you think we need a stub handler to be inserted in cases where no actual cleanup is performed?


I think it's actually harder than that, once you consider nested trys:
void f() {
  try {
    Outer outer;
    try {
      Inner inner;
      g();
    } catch (int) {
      // ~Inner gets run first
    }
  } catch (float) {
    // ~Inner gets run first
    // ~Outer gets run next
  }
  // uncaught exception? Run ~Inner then ~Outer.
}

It's easy to hit this case after inlining as well.
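A runnable version of the nested-try example makes the ordering observable. This is a sketch with assumptions: `g` takes a hypothetical flag to pick the thrown type, and the destructors log into a global vector purely so the order can be checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> events;  // order in which things ran

struct Outer { ~Outer() { events.push_back("~Outer"); } };
struct Inner { ~Inner() { events.push_back("~Inner"); } };

// Stand-in for g(): throws a float or an int depending on the flag.
static void g(bool throwFloat) {
  if (throwFloat) throw 1.0f;
  throw 1;
}

std::vector<std::string> f(bool throwFloat) {
  events.clear();
  try {
    Outer outer;
    try {
      Inner inner;
      g(throwFloat);
    } catch (int) {
      events.push_back("catch int");  // ~Inner has already run
    }
  } catch (float) {
    events.push_back("catch float");  // ~Inner, then ~Outer, have run
  }
  return events;
}

// Usage: both cleanups run before the outer handler body.
void exampleNested() {
  assert((f(true) ==
          std::vector<std::string>{"~Inner", "~Outer", "catch float"}));
  assert((f(false) ==
          std::vector<std::string>{"~Inner", "catch int", "~Outer"}));
}
```

The float case is the interesting one for outlining: two separate cleanups have to run, in order, before the outer catch handler, so a single "pre-catch cleanup" slot per landing pad has to cover both.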

We'd have to generalize @llvm.eh.outlined_handlers more to handle this case. However, if we generalize further it starts to perfectly replicate the landing pad structure, with cleanup, catch, and then we'd want to think about how to represent filter. Termination on exception spec violation seems to be unimplemented in MSVC, so we'd need our own personality function to implement filters, but it'd be good to support them in the IR.

We also have to decide how much code duplication of cleanups we're willing to tolerate, and whether we want to try to annotate the beginning and end of cleanups like ~Inner and ~Outer.
 

I didn't do that in the mock-up above, but it seems like it would simplify things.  Basically, I'm imagining a final pattern that looks like this:

 

lpad:

  %eh_vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

      cleanup

      catch i8* @typeid1

      catch i8* @typeid2

      ...

  %label = call i8* (...)* @llvm.eh.outlined_handlers(

      void (i8*, i8*)* @<pre-catch cleanup function>,

      i8* @typeid1, i8* (i8*, i8*)* @<typeid1 catch function>,

      i8* @typeid2, i8* (i8*, i8*)* @<typeid2 catch function>,

      ...

      void (i8*, i8*)* @<uncaught exception cleanup function>)

  indirectbr i8* %label
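One way to read the proposed operand list is as data that a runtime walks in order. The C++ model below is only an interpretation of the mock-up, with invented names (`OutlinedHandlers`, `dispatch`): it runs the pre-catch cleanup, then the first matching catch function, and falls back to the uncaught-exception cleanup. The catch function returns the label to continue at, mirroring the `indirectbr`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Cleanup = std::function<void()>;
using CatchFn = std::function<std::string()>;  // returns the label to resume at

// Hypothetical mirror of the llvm.eh.outlined_handlers operand list.
struct OutlinedHandlers {
  Cleanup preCatchCleanup;                               // may be empty
  std::vector<std::pair<std::string, CatchFn>> catches;  // (typeid, handler)
  Cleanup uncaughtCleanup;                               // may be empty
};

// Returns the continuation label, or "" if the exception keeps unwinding.
std::string dispatch(const OutlinedHandlers &h, const std::string &thrownType) {
  if (h.preCatchCleanup) h.preCatchCleanup();
  for (const auto &c : h.catches)
    if (c.first == thrownType) return c.second();
  if (h.uncaughtCleanup) h.uncaughtCleanup();
  return "";
}

// Usage: a list shaped like the one for a pad with one cleanup and one catch.
void exampleDispatch() {
  std::vector<std::string> ran;
  OutlinedHandlers h;
  h.preCatchCleanup = [&] { ran.push_back("cleanup1"); };
  h.catches.emplace_back("@_ZTIi", [] { return std::string("try.cont"); });
  h.uncaughtCleanup = [&] { ran.push_back("cleanup0"); };

  assert(dispatch(h, "@_ZTIi") == "try.cont");
  assert((ran == std::vector<std::string>{"cleanup1"}));
}
```

With empty `Cleanup` slots allowed, a pad that needs no pre-catch cleanup fits the same shape without a stub handler, though whether a stub is required is still an open question in this discussion.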

 

 

Finally, how do you see this meshing with SEH?  As I understand it, both the exception handlers and the cleanup code in that case execute in the original function context and only the filter handlers need to be outlined.  I suppose the outlining pass can look at the personality function and change its behavior accordingly.  Is that what you were thinking?


Pretty much. The outlining pass would behave differently based on the personality function. SEH cleanups (__finally blocks) actually do need to get outlined as well as filters, but catches (__except blocks) do not need to be outlined. That's the main difference. I think it reflects the fact that you can rethrow a C++ exception, but you can't faithfully "rethrow" a trap caught by SEH.
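As a small sketch of "behave differently based on the personality function", the decision could be a lookup like the one below. `OutliningPlan` and `planFor` are hypothetical names. The two personality routines are the ones named in this thread, and the fallback case (leave everything inline) is an assumption rather than something stated here.

```cpp
#include <cassert>
#include <string>

// Which pieces of landing-pad code get moved into separate functions.
struct OutliningPlan {
  bool cleanups;  // destructor calls / __finally blocks
  bool catches;   // catch / __except bodies
  bool filters;   // SEH filter expressions
};

// Picks a plan from the personality routine's name.
OutliningPlan planFor(const std::string &personality) {
  if (personality == "__CxxFrameHandler3")
    return {true, true, false};  // C++ EH: cleanups and catch handlers
  if (personality == "__C_specific_handler")
    return {true, false, true};  // SEH: __finally and filters, not __except
  return {false, false, false};  // assumed default: keep landing pads inline
}

// Usage: SEH outlines filters but leaves __except bodies in the parent.
void examplePlans() {
  assert(planFor("__C_specific_handler").filters);
  assert(!planFor("__C_specific_handler").catches);
  assert(planFor("__CxxFrameHandler3").catches);
}
```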

Kaylor, Andrew

Nov 24, 2014, 7:56:33 PM11/24/14
to Reid Kleckner, LLVM Developers Mailing List

Hi Reid,

 

I've been working on the outlining code and have a prototype that produces what I want for a simple case.

 

Now I'm thinking about the heuristics for recognizing the various logical pieces for C++ exception handling code and removing them once they’ve been cloned.  I've been working from various comments you've made earlier in this thread, and I'd like to run something by you to make sure we're on the same page.

 

Starting from a C++ function that looks like this:

 

void do_some_thing(int &i)

{

  Outer outer;

  try {

    Middle middle;

    if (i == 1) {

        do_thing_one();

    }

    else {

        Inner inner;

        do_thing_two();

    }

  }

  catch (int en) {

    i = -1;

  }

}
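To see which cleanups run on each path through this function, here is a self-contained harness around it. The class bodies, the `Fault` knobs, `maybeThrow`, `run`, and the "catch" log entry are all additions made for this sketch. The body of `do_some_thing` is otherwise the one above.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> trace;  // records what ran, in order

struct Outer  { ~Outer()  { trace.push_back("~Outer"); } };
struct Middle { ~Middle() { trace.push_back("~Middle"); } };
struct Inner  { ~Inner()  { trace.push_back("~Inner"); } };

// Hypothetical knobs that make the callees throw.
enum class Fault { None, Int, Other };
static Fault faultOne = Fault::None, faultTwo = Fault::None;

static void maybeThrow(Fault f) {
  if (f == Fault::Int) throw 1;
  if (f == Fault::Other) throw 1.5;  // a type that catch (int) does not match
}

void do_thing_one() { maybeThrow(faultOne); }
void do_thing_two() { maybeThrow(faultTwo); }

void do_some_thing(int &i) {
  Outer outer;
  try {
    Middle middle;
    if (i == 1) {
      do_thing_one();
    } else {
      Inner inner;
      do_thing_two();
    }
  } catch (int en) {
    (void)en;
    trace.push_back("catch");  // added only to make the ordering visible
    i = -1;
  }
}

// Runs one scenario and returns the recorded order.
std::vector<std::string> run(int &i, Fault one, Fault two) {
  trace.clear();
  faultOne = one;
  faultTwo = two;
  try {
    do_some_thing(i);
  } catch (...) {
    trace.push_back("escaped");
  }
  return trace;
}

// Usage: do_thing_two() throws an int, so both inner cleanups precede the catch.
void exampleOrdering() {
  int i = 2;
  std::vector<std::string> got = run(i, Fault::None, Fault::Int);
  assert((got ==
          std::vector<std::string>{"~Inner", "~Middle", "catch", "~Outer"}));
  assert(i == -1);
}
```

When the thrown type does not match, `~Outer` runs as part of unwinding instead of after the handler, which corresponds to the post-catch cleanup path in the IR.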

 

 

I'll have IR that looks more or less like this:

 

; Function Attrs: uwtable

define void @_Z13do_some_thingRi(i32* dereferenceable(4) %i) #0 {

entry:

  %i.addr = alloca i32*, align 8

  %outer = alloca %class.Outer, align 1

  %middle = alloca %class.Middle, align 1

  %exn.slot = alloca i8*

  %ehselector.slot = alloca i32

  %inner = alloca %class.Inner, align 1

  %en = alloca i32, align 4

  store i32* %i, i32** %i.addr, align 8

  call void @_ZN5OuterC1Ev(%class.Outer* %outer)

  invoke void @_ZN6MiddleC1Ev(%class.Middle* %middle)

          to label %invoke.cont unwind label %lpad

 

invoke.cont:                                      ; preds = %entry

  %0 = load i32** %i.addr, align 8

  %1 = load i32* %0, align 4

  %cmp = icmp eq i32 %1, 1

  br i1 %cmp, label %if.then, label %if.else

 

if.then:                                          ; preds = %invoke.cont

  invoke void @_Z12do_thing_onev()

          to label %invoke.cont2 unwind label %lpad1

 

invoke.cont2:                                     ; preds = %if.then

  br label %if.end

 

; From 'entry' invoke of Middle constructor

;   outer needs post-catch cleanup

lpad:                                             ; preds = %if.end, %entry

  %2 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          cleanup

          catch i8* bitcast (i8** @_ZTIi to i8*)

  %3 = extractvalue { i8*, i32 } %2, 0

  store i8* %3, i8** %exn.slot

  %4 = extractvalue { i8*, i32 } %2, 1

  store i32 %4, i32* %ehselector.slot

  ; No pre-catch cleanup for this landingpad

  br label %catch.dispatch

 

; From 'if.then' invoke of do_thing_one()

; Or from 'if.else' invoke of Inner constructor

; Or from 'invoke.cont5' invoke of Inner destructor

;   middle needs pre-catch cleanup

;   outer needs post-catch cleanup

lpad1:                                            ; preds = %invoke.cont5, %if.else, %if.then

  %5 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          cleanup

          catch i8* bitcast (i8** @_ZTIi to i8*)

  %6 = extractvalue { i8*, i32 } %5, 0

  store i8* %6, i8** %exn.slot

  %7 = extractvalue { i8*, i32 } %5, 1

  store i32 %7, i32* %ehselector.slot

  ; Branch to shared label to do pre-catch cleanup

  br label %ehcleanup

 

if.else:                                          ; preds = %invoke.cont

  invoke void @_ZN5InnerC1Ev(%class.Inner* %inner)

          to label %invoke.cont3 unwind label %lpad1

 

invoke.cont3:                                     ; preds = %if.else

  invoke void @_Z12do_thing_twov()

          to label %invoke.cont5 unwind label %lpad4

 

invoke.cont5:                                     ; preds = %invoke.cont3

  invoke void @_ZN5InnerD1Ev(%class.Inner* %inner)

          to label %invoke.cont6 unwind label %lpad1

 

invoke.cont6:                                     ; preds = %invoke.cont5

  br label %if.end

 

; From 'invoke.cont3' invoke of do_thing_two()

;   middle and inner need pre-catch cleanup

;   outer needs post-catch cleanup

lpad4:                                            ; preds = %invoke.cont3

  %8 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          cleanup

          catch i8* bitcast (i8** @_ZTIi to i8*)

  %9 = extractvalue { i8*, i32 } %8, 0

  store i8* %9, i8** %exn.slot

  %10 = extractvalue { i8*, i32 } %8, 1

  store i32 %10, i32* %ehselector.slot

  ; Pre-catch cleanup begins here, but will continue at ehcleanup

  invoke void @_ZN5InnerD1Ev(%class.Inner* %inner)

          to label %invoke.cont7 unwind label %terminate.lpad

 

invoke.cont7:                                     ; preds = %lpad4

  br label %ehcleanup

 

if.end:                                           ; preds = %invoke.cont6, %invoke.cont2

  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)

          to label %invoke.cont8 unwind label %lpad

 

invoke.cont8:                                     ; preds = %if.end

  br label %try.cont

 

; Pre-catch cleanup for lpad1

; Continuation of pre-catch cleanup for lpad4

ehcleanup:                                        ; preds = %invoke.cont7, %lpad1

  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)

          to label %invoke.cont9 unwind label %terminate.lpad

 

invoke.cont9:                                     ; preds = %ehcleanup

  br label %catch.dispatch

 

; Catch dispatch for lpad, lpad1 and lpad4

catch.dispatch:                                   ; preds = %invoke.cont9, %lpad

  %sel = load i32* %ehselector.slot

  %11 = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIi to i8*)) #4

  %matches = icmp eq i32 %sel, %11

  br i1 %matches, label %catch, label %ehcleanup10

 

catch:                                            ; preds = %catch.dispatch

  %exn = load i8** %exn.slot

  %12 = call i8* @__cxa_begin_catch(i8* %exn) #4

  %13 = bitcast i8* %12 to i32*

  %14 = load i32* %13, align 4

  store i32 %14, i32* %en, align 4

  %15 = load i32** %i.addr, align 8

  store i32 -1, i32* %15, align 4

  call void @__cxa_end_catch() #4

  br label %try.cont

 

try.cont:                                         ; preds = %catch, %invoke.cont8

  call void @_ZN5OuterD1Ev(%class.Outer* %outer)

  ret void

 

; Post-catch cleanup for lpad, lpad1 and lpad4

ehcleanup10:                                      ; preds = %catch.dispatch

  invoke void @_ZN5OuterD1Ev(%class.Outer* %outer)

          to label %invoke.cont11 unwind label %terminate.lpad

 

invoke.cont11:                                    ; preds = %ehcleanup10

  br label %eh.resume

 

eh.resume:                                        ; preds = %invoke.cont11

  %exn12 = load i8** %exn.slot

  %sel13 = load i32* %ehselector.slot

  %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn12, 0

  %lpad.val14 = insertvalue { i8*, i32 } %lpad.val, i32 %sel13, 1

  resume { i8*, i32 } %lpad.val14

 

terminate.lpad:                                   ; preds = %ehcleanup10, %ehcleanup, %lpad4

  %16 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          catch i8* null

  %17 = extractvalue { i8*, i32 } %16, 0

  call void @__clang_call_terminate(i8* %17) #5

  unreachable

}

 

 

 

If I've understood your intentions correctly, we'll have an outlining pass that transforms the above IR to this:

 

%struct.do_some_thing.captureblock = type { %class.Outer, %class.Middle, %class.Inner, i32* }

 

; Uncaught exception cleanup for lpad, lpad1 and lpad4

define void @do_some_thing_cleanup0(i8* %eh_ptrs, i8* %rbp) #0 {

entry:

  %capture.block = call @llvm.eh.get_capture_block(@_Z13do_some_thingRi, %rbp)

  %outer = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 0

  invoke void @_ZN5OuterD1Ev(%class.Outer* %outer)

          to label %invoke.cont unwind label %terminate.lpad

 

invoke.cont:

  ret void

 

terminate.lpad:                                   ; preds = %entry

  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          catch i8* null

  %1 = extractvalue { i8*, i32 } %0, 0

  call void @__clang_call_terminate(i8* %1) #5

  unreachable

}

 

; Catch handler for _ZTIi

define i8* @do_some_thing_catch0(i8* %eh_ptrs, i8* %rbp) #0 {

entry:

  %capture.block = call @llvm.eh.get_capture_block(@_Z13do_some_thingRi, %rbp)

  %i.addr = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 3

  %1 = load i32** %i.addr, align 8

  store i32 -1, i32* %1, align 4

  ret i8* blockaddress(@_Z13do_some_thingRi, %try.cont)

}

 

; Outlined pre-catch cleanup handler for lpad1

define void @do_some_thing_cleanup1(i8* %eh_ptrs, i8* %rbp) #0 {

entry:

  %capture.block = call @llvm.eh.get_capture_block(@_Z13do_some_thingRi, %rbp)

  ; Outlined from 'ehcleanup'

  %middle = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 1

  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)

          to label %invoke.cont unwind label %terminate.lpad

 

invoke.cont:

  ret void

 

terminate.lpad:                                   ; preds = %entry

  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          catch i8* null

  %1 = extractvalue { i8*, i32 } %0, 0

  call void @__clang_call_terminate(i8* %1) #5

  unreachable

}

 

; Outlined pre-catch cleanup handler for 'lpad4'

define void @do_some_thing_cleanup2(i8* %eh_ptrs, i8* %rbp) #0 {

entry:

  %capture.block = call @llvm.eh.get_capture_block(@_Z13do_some_thingRi, %rbp)

  ; Outlined from 'lpad4'

  %inner = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 2

  invoke void @_ZN5InnerD1Ev(%class.Inner* %inner)

          to label %invoke.cont unwind label %terminate.lpad

 

invoke.cont:                                     ; preds = %entry

  ; Outlined from 'ehcleanup'

  %middle = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 1

  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)

          to label %invoke.cont1 unwind label %terminate.lpad

 

invoke.cont1:

  ret void

 

terminate.lpad:                                   ; preds = %invoke.cont, %entry

  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          catch i8* null

  %1 = extractvalue { i8*, i32 } %0, 0

  call void @__clang_call_terminate(i8* %1) #5

  unreachable

}

 

 

; Function Attrs: uwtable

define void @_Z13do_some_thingRi(i32* dereferenceable(4) %i) #0 {

entry:

  %capture.block = alloca %struct.do_some_thing.captureblock, align 1

  %i.addr = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 3

  store i32* %i, i32** %i.addr, align 8

  llvm.eh.set_capture_block

  %eh.cont.label = alloca i8*

  %en = alloca i32, align 4

  %outer = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 0

  call void @_ZN5OuterC1Ev(%class.Outer* %outer)

  %middle = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 1

  invoke void @_ZN6MiddleC1Ev(%class.Middle* %middle)

          to label %invoke.cont unwind label %lpad

 

invoke.cont:                                      ; preds = %entry

  %0 = load i32** %i.addr, align 8

  %1 = load i32* %0, align 4

  %cmp = icmp eq i32 %1, 1

  br i1 %cmp, label %if.then, label %if.else

 

if.then:                                          ; preds = %invoke.cont

  invoke void @_Z12do_thing_onev()

          to label %invoke.cont2 unwind label %lpad1

 

invoke.cont2:                                     ; preds = %if.then

  br label %if.end

 

; From 'entry' invoke of Middle constructor

;   outer needs post-catch cleanup

lpad:                                             ; preds = %if.end, %entry

  %2 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          cleanup

          catch i8* bitcast (i8** @_ZTIi to i8*)

  %eh.cont.label = call i8* (...)* @llvm.eh.outlined_handlers(

      i8* @_ZTIi, i8* (i8*, i8*)* @do_some_thing_catch0,

      void (i8*, i8*)* @do_some_thing_cleanup0)

  indirectbr i8* %eh.cont.label

 

; From 'if.then' invoke of do_thing_one()

; Or from 'if.else' invoke of Inner constructor

; Or from 'invoke.cont5' invoke of Inner destructor

;   middle needs pre-catch cleanup

;   outer needs post-catch cleanup

lpad1:                                            ; preds = %invoke.cont5, %if.else, %if.then

  %5 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          cleanup

          catch i8* bitcast (i8** @_ZTIi to i8*)

  %eh.cont.label = call i8* (...)* @llvm.eh.outlined_handlers(

      void (i8*, i8*)* @do_some_thing_cleanup1,

      i8* @_ZTIi, i8* (i8*, i8*)* @do_some_thing_catch0,

      void (i8*, i8*)* @do_some_thing_cleanup0)

  indirectbr i8* %eh.cont.label

 

if.else:                                          ; preds = %invoke.cont

  %inner = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 2

  invoke void @_ZN5InnerC1Ev(%class.Inner* %inner)

          to label %invoke.cont3 unwind label %lpad1

 

invoke.cont3:                                     ; preds = %if.else

  invoke void @_Z12do_thing_twov()

          to label %invoke.cont5 unwind label %lpad4

 

invoke.cont5:                                     ; preds = %invoke.cont3

  invoke void @_ZN5InnerD1Ev(%class.Inner* %inner)

          to label %invoke.cont6 unwind label %lpad1

 

invoke.cont6:                                     ; preds = %invoke.cont5

  br label %if.end

 

; From 'invoke.cont3' invoke of do_thing_two()

;   middle and inner need pre-catch cleanup

;   outer needs post-catch cleanup

lpad4:                                            ; preds = %invoke.cont3

  %8 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

          cleanup

          catch i8* bitcast (i8** @_ZTIi to i8*)

  %eh.cont.label = call i8* (...)* @llvm.eh.outlined_handlers(

      void (i8*, i8*)* @do_some_thing_cleanup2,

      i8* @_ZTIi, i8* (i8*, i8*)* @do_some_thing_catch0,

      void (i8*, i8*)* @do_some_thing_cleanup0)

  indirectbr i8* %eh.cont.label

 

if.end:                                           ; preds = %invoke.cont6, %invoke.cont2

  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)

          to label %invoke.cont8 unwind label %lpad

 

invoke.cont8:                                     ; preds = %if.end

  br label %try.cont

 

try.cont:                                         ; preds = %catch, %invoke.cont8

  call void @_ZN5OuterD1Ev(%class.Outer* %outer)

  ret void

}

 

 

Does that look about like what you’d expect?

 

I just have a few questions.

 

I'm pretty much just guessing at how you intended the llvm.eh.set_capture_block intrinsic to work.  It wasn't clear to me if I just needed to set it where the structure was created or if it would need to be set anywhere an exception might be thrown.  The answer is probably related to my next question.

 

In the above example I created a single capture block for the entire function.  That works reasonably well for a simple case like this and corresponds to the co-location of the allocas in the original IR, but for functions with more complex structures and multiple try blocks it could get ugly.  Do you have ideas for how to handle that?
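To make the single-capture-block idea concrete, here is a minimal source-level C++ sketch. It is only an analogy for the IR mock-up: CaptureBlock, catch0, cleanup0 and parent are hypothetical names, and a real implementation would allocate the block in the parent's frame and register it through the intrinsic rather than pass a pointer explicitly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical capture block: every local that an outlined handler touches
// lives here instead of in its own stack slot in the parent function.
struct CaptureBlock {
  int *arg = nullptr;            // the parent's by-reference parameter
  int caught_value = 0;          // written by the catch handler
  std::vector<std::string> log;  // stands in for destructor side effects
};

// Outlined handlers see the parent's state only through the block pointer.
void catch0(CaptureBlock *block, int thrown) {
  block->caught_value = thrown;
  *block->arg = -1;
  block->log.push_back("catch0");
}

void cleanup0(CaptureBlock *block) { block->log.push_back("cleanup0"); }

// Parent function: one block for the whole body, set up once at entry.
int parent(int &arg, bool do_throw) {
  CaptureBlock block;
  block.arg = &arg;
  try {
    if (do_throw) throw 42;
  } catch (int v) {
    catch0(&block, v);
  }
  cleanup0(&block);
  assert(block.log.back() == "cleanup0");  // cleanup always runs last here
  return block.caught_value;
}
```

With several try blocks the struct would either grow to hold the union of all escaped locals or be split per try region, which is the trade-off the question above is asking about.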

 

For C++ exception handling, we need cleanup code that executes before the catch handlers and cleanup code that executes in the case of uncaught exceptions.  I think both of these need to be outlined for the MSVC environment. Do you think we need a stub handler to be inserted in cases where no actual cleanup is performed?

 

I didn't do that in the mock-up above, but it seems like it would simplify things.  Basically, I'm imagining a final pattern that looks like this:

 

lpad:

  %eh_vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)

      cleanup

      catch i8* @typeid1

      catch i8* @typeid2

      ...

  %label = call i8* (...)* @llvm.eh.outlined_handlers(

      void (i8*, i8*)* @<pre-catch cleanup function>,

      i8* @typeid1, i8* (i8*, i8*)* @<typeid1 catch function>,

      i8* @typeid2, i8* (i8*, i8*)* @<typeid2 catch function>,

      ...

      void (i8*, i8*)* @<uncaught exception cleanup function>)

  indirectbr i8* %label
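As a rough model of how such a handler list would be consumed, the following C++ sketch walks the list in the order given in the pattern. HandlerList, dispatch and demo are hypothetical names, and the real personality routine works from tables rather than callbacks. The sketch assumes both cleanup slots are always filled, possibly with an empty stub, which is what makes the walk uniform.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical model of the handler list attached to one landing pad.
struct HandlerList {
  std::function<void()> pre_catch_cleanup;  // may be a no-op stub
  std::vector<std::pair<std::type_index, std::function<void()>>> catches;
  std::function<void()> uncaught_cleanup;   // may be a no-op stub
};

// Returns true if one of the catch handlers claimed the exception.
bool dispatch(const HandlerList &h, std::type_index thrown) {
  // Both cleanup slots must be filled, even if only with a stub.
  assert(h.pre_catch_cleanup && h.uncaught_cleanup);
  h.pre_catch_cleanup();
  for (const auto &entry : h.catches) {
    if (entry.first == thrown) {
      entry.second();
      return true;
    }
  }
  h.uncaught_cleanup();
  return false;
}

// Builds a two-catch list and records which handlers ran for `thrown`.
std::vector<std::string> demo(std::type_index thrown) {
  std::vector<std::string> log;
  HandlerList h;
  h.pre_catch_cleanup = [&log] { log.push_back("pre-catch cleanup"); };
  h.catches.emplace_back(std::type_index(typeid(int)),
                         [&log] { log.push_back("catch int"); });
  h.catches.emplace_back(std::type_index(typeid(float)),
                         [&log] { log.push_back("catch float"); });
  h.uncaught_cleanup = [&log] { log.push_back("uncaught cleanup"); };
  dispatch(h, thrown);
  return log;
}
```

Calling demo with the type index of int runs the pre-catch cleanup and then the int handler; a type with no matching entry runs the pre-catch cleanup followed by the uncaught cleanup.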

 

 

Finally, how do you see this meshing with SEH?  As I understand it, both the exception handlers and the cleanup code in that case execute in the original function context and only the filter handlers need to be outlined.  I suppose the outlining pass can look at the personality function and change its behavior accordingly.  Is that what you were thinking?

 

-Andy

 

Kaylor, Andrew

Nov 25, 2014, 6:11:41 PM11/25/14
to Reid Kleckner, LLVM Developers Mailing List

> We should also think about how to call std::terminate when cleanup dtors throw. The current representation for Itanium is inefficient. As a strawman, I propose making @__clang_call_terminate an intrinsic:

 

That sounds like a good starting point.

 

 

> Chandler expressed strong concerns about this design, however, as @llvm.eh.get_capture_block adds an ordering constraint on CodeGen. Once you add this intrinsic, we *have* to do frame layout of @_Z13do_some_thingRi *before* we can emit code for all the callers of @llvm.eh.get_capture_block. Today, this is easy, because module order defines emission order, but in the great glorious future, codegen will hopefully be parallelized, and then we've inflicted this horrible constraint on the innocent.

 

> His suggestion to break the ordering dependence was to lock down the frame offset of the capture block to always be some fixed offset known by the target (ie ebp - 4 on x86, if we like that).

 

Chandler probably has a better feel for this sort of thing than I do.  I can’t think of a reason offhand why that wouldn’t work, but it makes me a little nervous.

 

What would that look like in the IR?  Would we use the same intrinsics and just lower them to use the known location?

 

I’ll think about this, but for now I’m happy to just proceed with the belief that it’s a solvable problem either way.

 

>> For C++ exception handling, we need cleanup code that executes before the catch handlers and cleanup code that executes in the case of uncaught exceptions.  I think both of these need to be outlined for the MSVC environment. Do you think we need a stub handler to be inserted in cases where no actual cleanup is performed?

> I think it's actually harder than that, once you consider nested trys:

> void f() {

>  try {

>    Outer outer;

>    try {

>      Inner inner;

>      g();

>    } catch (int) {

>      // ~Inner gets run first
>    }

>  } catch (float) {

>    // ~Inner gets run first

>    // ~Outer gets run next
>  }

>  // uncaught exception? Run ~Inner then ~Outer.
> }

 

I took a look at the IR that’s generated for this example.  I see what you mean.  So there is potentially cleanup code before and after every catch handler, right?
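The ordering can be checked at the source level with a small tracing version of the example. This is only a sketch: the trace vector, the tracing destructors and the throw_float parameter of g() are additions for illustration and are not part of the quoted code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which destructors and handlers run.
static std::vector<std::string> trace;

struct Outer { ~Outer() { trace.push_back("~Outer"); } };
struct Inner { ~Inner() { trace.push_back("~Inner"); } };

// Hypothetical stand-in for g(): throws whichever type the caller asks for.
static void g(bool throw_float) {
  if (throw_float) throw 1.0f;
  throw 1;
}

// Same shape as f() above; returns the recorded trace.
std::vector<std::string> run_f(bool throw_float) {
  trace.clear();
  try {
    Outer outer;
    try {
      Inner inner;
      g(throw_float);
    } catch (int) {
      trace.push_back("catch(int)");    // ~Inner has already run
    }
  } catch (float) {
    trace.push_back("catch(float)");    // ~Inner, then ~Outer, have already run
  }
  assert(trace.size() == 3);            // two destructors plus one handler
  return trace;
}
```

For an int the trace is ~Inner, catch(int), ~Outer; for a float it is ~Inner, ~Outer, catch(float). So the Outer destructor is a post-catch cleanup for the inner handler and a pre-catch cleanup for the outer one.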

 

Do you happen to know offhand what that looks like in the .xdata for the _CxxFrameHandler3 function?

 

-Andy

 

Reid Kleckner

Nov 25, 2014, 8:31:23 PM11/25/14
to Kaylor, Andrew, LLVM Developers Mailing List
On Tue, Nov 25, 2014 at 3:09 PM, Kaylor, Andrew <andrew...@intel.com> wrote:

>> We should also think about how to call std::terminate when cleanup dtors throw. The current representation for Itanium is inefficient. As a strawman, I propose making @__clang_call_terminate an intrinsic:

> That sounds like a good starting point.

>> Chandler expressed strong concerns about this design, however, as @llvm.eh.get_capture_block adds an ordering constraint on CodeGen. Once you add this intrinsic, we *have* to do frame layout of @_Z13do_some_thingRi *before* we can emit code for all the callers of @llvm.eh.get_capture_block. Today, this is easy, because module order defines emission order, but in the great glorious future, codegen will hopefully be parallelized, and then we've inflicted this horrible constraint on the innocent.

>> His suggestion to break the ordering dependence was to lock down the frame offset of the capture block to always be some fixed offset known by the target (ie ebp - 4 on x86, if we like that).

> Chandler probably has a better feel for this sort of thing than I do.  I can’t think of a reason offhand why that wouldn’t work, but it makes me a little nervous.

> What would that look like in the IR?  Would we use the same intrinsics and just lower them to use the known location?


Chandler seems to be OK with get/set capture block, as long as the codegen ordering dependence can be removed. I think we can remove it by delaying the resolution of the frame offset to assembly time using an MCSymbolRef. It would look a lot like this kind of assembly:

my_handler:
  push %rbp
  mov %rsp, %rbp
  lea Lframe_offset0(%rdx), %rax ; This is now the parent capture block
  ...
  retq

parent_fn:
  push %rbp
  mov %rsp, %rbp
  push %rbx
  push %rdi
  subq $NN, %rsp
Lframe_offset0 = X + 2 * 8 ; Two CSRs plus some offset into the main stack allocation

I guess I'll try to make that work.
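In C++ terms the idea is roughly the following. This is a sketch under loud assumptions: ParentFrame is an invented layout standing in for whatever frame layout produces, kFrameOffset0 plays the role of Lframe_offset0, and the handler is handed the parent's frame address as an ordinary pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout of parent_fn's frame. In the real design this is only
// known after frame layout, which is why the offset is left symbolic.
struct CaptureBlock { int n; int d; int r; };
struct ParentFrame {
  std::uint64_t saved_csrs[2];  // the two pushed callee-saved registers
  char other_locals[24];        // unrelated stack slots
  CaptureBlock block;           // the data the handlers need to reach
};

// Plays the role of Lframe_offset0: a constant fixed once layout is known.
constexpr std::size_t kFrameOffset0 = offsetof(ParentFrame, block);

// Plays the role of my_handler's lea: it receives only the parent's frame
// address and adds the constant to find the capture block.
CaptureBlock *recover_capture_block(void *parent_frame) {
  assert(parent_frame != nullptr);
  return reinterpret_cast<CaptureBlock *>(
      static_cast<char *>(parent_frame) + kFrameOffset0);
}

int handler_reads_r(void *parent_frame) {
  return recover_capture_block(parent_frame)->r;
}
```

The handler's code does not depend on the value of the offset, only on its name, so it can be emitted before the parent's frame has been laid out.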

> I’ll think about this, but for now I’m happy to just proceed with the belief that it’s a solvable problem either way.

>>> For C++ exception handling, we need cleanup code that executes before the catch handlers and cleanup code that executes in the case of uncaught exceptions.  I think both of these need to be outlined for the MSVC environment. Do you think we need a stub handler to be inserted in cases where no actual cleanup is performed?

>> I think it's actually harder than that, once you consider nested trys:

>> void f() {
>>  try {
>>    Outer outer;
>>    try {
>>      Inner inner;
>>      g();
>>    } catch (int) {
>>      // ~Inner gets run first
>>    }
>>  } catch (float) {
>>    // ~Inner gets run first
>>    // ~Outer gets run next
>>  }
>>  // uncaught exception? Run ~Inner then ~Outer.
>> }

> I took a look at the IR that’s generated for this example.  I see what you mean.  So there is potentially cleanup code before and after every catch handler, right?

> Do you happen to know offhand what that looks like in the .xdata for the _CxxFrameHandler3 function?


I can't tell how the state tables arrange for the destructors to run in the right order, but they can accomplish this without duplicating the cleanup code into the outlined catch handler functions, which is nice.

I think we may be able to address this by emitting calls to start/stop intrinsics around EH cleanups, but that may inhibit optimizations.

Vadim Chugunov

Dec 2, 2014, 7:18:02 PM12/2/14
to Reid Kleckner, LLVM Developers Mailing List
Hi Reid,
Is this design supposed to be able to cope with asynchronous exceptions?   I am having trouble imagining how this would work without adding the ability to associate landing pads with scopes in LLVM IR.

Vadim


Reid Kleckner

Dec 2, 2014, 8:27:04 PM12/2/14
to Vadim Chugunov, LLVM Developers Mailing List
On Tue, Dec 2, 2014 at 4:15 PM, Vadim Chugunov <vad...@gmail.com> wrote:
> Hi Reid,
> Is this design supposed to be able to cope with asynchronous exceptions?   I am having trouble imagining how this would work without adding the ability to associate landing pads with scopes in LLVM IR.

Yes, but not from within the same function. My proposal is to simply outline __try bodies into another function and invoke it. __try blocks are very uncommon, but C++ destructor cleanups are common. Outlining every such cleanup would be prohibitively expensive, and I am not proposing to do this. In other words, faithfully implementing MSVC's -EHa flag is a non-goal.

If you want a higher fidelity implementation, I would not propose adding unwind edges to basic blocks. This has been suggested in the past, but a lot of LLVM would need to be taught about this implicit control flow.

Instead, I would propose identifying all trapping instructions and adding intrinsics for them. Intrinsics can be called with invoke, which makes all of the implicit CFG edges explicit. Peter Collingbourne proposed adding iload and istore instructions, and this would basically be a less invasive version of that. Optimizations would obviously suffer on intrinsics instead of instructions, but that's a price you have to pay for async EH anyway.
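A source-level analogy in C++, for illustration only: checked_sdiv is a hypothetical stand-in for a trapping-divide intrinsic, and the fault is modelled as an ordinary C++ exception rather than an SEH exception code. The point is that the faulting operation becomes a call, so the edge to the handler is explicit.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Hypothetical analogue of a trapping-divide intrinsic: both hardware fault
// cases are detected up front and reported by throwing from a call.
int checked_sdiv(int n, int d) {
  if (d == 0) throw std::domain_error("integer divide by zero");
  if (n == INT_MIN && d == -1) throw std::overflow_error("integer overflow");
  assert(d != 0 && !(n == INT_MIN && d == -1));  // the divide cannot trap now
  return n / d;
}

// In the spirit of the RFC's safe_div: only divide-by-zero is handled and
// yields 0. Any other fault keeps propagating, like a filter that rejects it.
int safe_div(int n, int d) {
  try {
    return checked_sdiv(n, d);
  } catch (const std::domain_error &) {
    return 0;
  }
}
```

Every other potentially trapping operation (loads, stores, FP ops) would need the same treatment, which is where the optimization cost comes from.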

Reid Kleckner

Dec 3, 2014, 3:57:47 PM12/3/14
to Vadim Chugunov, LLVM Developers Mailing List
On Tue, Dec 2, 2014 at 6:05 PM, Vadim Chugunov <vad...@gmail.com> wrote:
> Sure, but memory access violations are not the only source of asynchronous exceptions.  There are also stack overflows, integer overflows (on platforms that support that in hardware), signaling NaNs, and so on...
>
> I am curious, what do you think of the following idea: what if instead of creating landing pads for running destructors, there were intrinsics for marking start and end of the object's lifetime (along with a pointer to destructor)?  LLVM could then emit a table of live objects for each PC range into LSDA, and the personality routine would interpret that table and invoke destructors.  (catch() would still need a landing pad, of course).
> /end bike-shedding

I don't think calls to start / end are good enough, because graphs don't have scope. We are seeing problems with lifetime start / end already. Consider a transformation which turns every branch into a branch to a single basic block which switches over all possible branch targets. This is a valid LLVM transformation, even if it is not an optimization, and it would be impossible to recover the natural scope-like information from start / end call pairs.
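A small C++ sketch of that transformation, for illustration only (the function names and the string markers are invented). Both functions produce the same dynamic sequence of start/end markers, but in the second one every block is entered from the same dispatch switch, so nothing in the static structure says that "start B" and "end B" nest inside A.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Structured version: start/end markers nest like lexical scopes.
std::vector<std::string> structured(bool take_inner) {
  std::vector<std::string> log;
  log.push_back("start A");
  if (take_inner) {
    log.push_back("start B");
    log.push_back("end B");
  }
  log.push_back("end A");
  return log;
}

// Flattened version: every branch returns to one block that switches over
// all possible targets.
std::vector<std::string> flattened(bool take_inner) {
  std::vector<std::string> log;
  enum Block { Entry, Inner, Exit, Done };
  Block next = Entry;
  while (next != Done) {
    switch (next) {
    case Entry:
      log.push_back("start A");
      next = take_inner ? Inner : Exit;
      break;
    case Inner:
      log.push_back("start B");
      log.push_back("end B");
      next = Exit;
      break;
    case Exit:
      log.push_back("end A");
      next = Done;
      break;
    case Done:
      break;
    }
  }
  assert(log.back() == "end A");  // dynamically, the outer marker closes last
  return log;
}
```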

I also think that recovering from async exceptions will always be best effort. The kinds of exceptions you describe are essentially results of undefined behavior in LLVM IR, and can't be handled reliably. Unless we introduce specific constructs with defined behavior (trapping integer divide, trapping FP ops, trapping alloca), it will never work.

Chris Lattner had a proposal from a long time ago to add 'unwind' labels to every basic block, but it introduces a lot of implicit control flow, which we don't like:

You would do this:
  %p = call i8* @malloc(i32 4)
  %xp = bitcast i8* %p to i32*
  ...

mybb: unwind to label %lpad1
  %x = load i32* %xp  ; edge to lpad1 here
  store i32 0, i32* %xp ; edge to lpad1 here
  call void @f() ; edge to lpad1 here
  br label %mybb2 ; cannot remove branch due to differing lpads

mybb2: unwind to label %lpad2
  ...

lpad1:
  %xx = load i32* %xp ; we cannot make %xx a phi between %x and 0 due to implicit control flow. Maybe we could split mybb and then make the phi, but, ew.

This is a mountain. I think you can climb it, but I'm *not* signing up for it. :) Adding and invoking intrinsics for all possibly trapping operations seems much more tractable. Simply outlining try bodies is even easier.

Vadim Chugunov

Dec 3, 2014, 4:31:26 PM12/3/14
to Reid Kleckner, LLVM Developers Mailing List
If we added unwind target to every potentially throwing instruction (loads, stores, all binary operations), wouldn't all such instructions have to become BB terminators?   I'd expect that CFG would then end up consisting mostly of single-instruction BBs. This can't be good for compilation performance and optimizations...

Another vague idea: what if lifetime.start() returned some kind of a token, which lifetime.end() has to consume?   That would prevent transformations that don't preserve lifetime scopes (such as the one you've described), wouldn't it?

Vadim

Reid Kleckner

Dec 3, 2014, 4:36:42 PM12/3/14
to Kaylor, Andrew, LLVM Developers Mailing List
I went ahead and implemented @llvm.frameallocate in a patch here: http://reviews.llvm.org/D6493

Andrew, do you have a wip patch for outlining, or any lessons learned from attempting it? I think outlining is now the next step, so let me know if there's something you're actively working on so I can avoid duplicated effort. :)

Reid Kleckner

Dec 3, 2014, 4:44:34 PM12/3/14
to Vadim Chugunov, LLVM Developers Mailing List
On Wed, Dec 3, 2014 at 1:27 PM, Vadim Chugunov <vad...@gmail.com> wrote:
> If we added unwind target to every potentially throwing instruction (loads, stores, all binary operations), wouldn't all such instructions have to become BB terminators?   I'd expect that CFG would then end up consisting mostly of single-instruction BBs. This can't be good for compilation performance and optimizations...

Yes. This merely exposes the high cost of what the user is requesting. We could invent a more compact representation for a run of single-instruction bbs that share unwind edges, but *reliable* async exception support is fundamentally bad for optimization. Analysis passes need to see all the implicit edges.

> Another vague idea: what if lifetime.start() returned some kind of a token, which lifetime.end() has to consume?   That would prevent transformations that don't preserve lifetime scopes (such as the one you've described), wouldn't it?

No, the transform is still valid. The block with the switch would contain a massive list of phis between undef and the incoming SSA values that were previously used by successor basic blocks. The incoming undef edge is not actually dynamically reachable, so it's safe to add undef there.

Vadim Chugunov

Dec 3, 2014, 5:30:29 PM12/3/14
to Reid Kleckner, LLVM Developers Mailing List
So what's to become of llvm.lifetime.start/end?   Are they going to be removed or fixed?

Reid Kleckner

Dec 3, 2014, 6:18:36 PM12/3/14
to Kaylor, Andrew, LLVM Developers Mailing List
On Wed, Dec 3, 2014 at 2:13 PM, Kaylor, Andrew <andrew...@intel.com> wrote:

> Hi Reid,

> I saw your patch but haven’t looked closely at it yet.

> I do have a work in progress for the outlining.  I expect to have something ready to share pretty soon, hopefully by the end of the week.  It won’t be ready for primetime, as it’s making a whole lot of assumptions about the structure of the IR, but I think it will work with a sample IR file based on what you posted in your earlier SEH review.  I expect that it will be a useful point of reference for discussion and that I should be able to quickly refactor it into something product ready once we’ve ironed out the expectations as to what the incoming IR will look like and how flexible the heuristics for identifying regions to outline need to be.


Nice. Sounds like I should move forward on the SEH patch, which assumes that filters are pre-outlined, and then move on to using @llvm.outlined_handlers for outlined __finally blocks. Basically, teach CodeGen how to emit SEH tables for IR that is already in the outlined form.
  

> One thing that I’ve discussed with a co-worker but haven’t explored in my implementation yet is the possibility of using CodeExtractor to do the outlining rather than basing it on the CloneFunction stuff.  My current implementation is based on CloneAndPruneFunctionInto as you suggested, but I wondered if CodeExtractor might not be a better starting point.  What do you think?


Hm, I wasn't aware of CodeExtractor. It looks like it basically reparents basic blocks and fixes up the SSA graph. This is efficient, but I don't think it will work if a BB is reachable by two landing pads. A BB can't be extracted twice, but it can be cloned twice. Also, CodeExtractor can't extract invokes. I would stick to cloning for now, and add extraction later as an optimization.
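To illustrate the shared-block problem with a toy model (plain C++, not the LLVM API; the Cfg type and the block names in example_cfg are made up for the example): any block reachable from two landing pads is needed by both outlined handlers. Cloning copies it once per handler, while extraction would have to move the one original into two places.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy CFG: block name -> successor block names.
using Cfg = std::map<std::string, std::vector<std::string>>;

// Collects every block reachable from a landing pad. Cloning would copy
// exactly this set into that pad's outlined handler.
std::set<std::string> reachable_from(const Cfg &cfg, const std::string &pad) {
  assert(cfg.count(pad) == 1);  // the pad must be a known block
  std::set<std::string> seen;
  std::vector<std::string> work{pad};
  while (!work.empty()) {
    std::string bb = work.back();
    work.pop_back();
    if (!seen.insert(bb).second) continue;  // already visited
    auto it = cfg.find(bb);
    if (it == cfg.end()) continue;
    for (const auto &succ : it->second) work.push_back(succ);
  }
  return seen;
}

// Blocks needed by both handlers: harmless when cloning, a conflict when
// each block can only be moved once.
std::set<std::string> shared_blocks(const Cfg &cfg, const std::string &pad_a,
                                    const std::string &pad_b) {
  std::set<std::string> a = reachable_from(cfg, pad_a);
  std::set<std::string> b = reachable_from(cfg, pad_b);
  std::set<std::string> out;
  for (const auto &bb : a)
    if (b.count(bb)) out.insert(bb);
  return out;
}

// Hypothetical CFG in which two landing pads funnel into one cleanup block.
Cfg example_cfg() {
  return {{"lpad1", {"dtor.inner"}},
          {"lpad4", {"dtor.inner"}},
          {"dtor.inner", {"resume"}},
          {"resume", {}}};
}
```

Here shared_blocks(example_cfg(), "lpad1", "lpad4") contains dtor.inner and resume, the two blocks that would have to be duplicated.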
 

> Also, at the moment I’m more or less ignoring the frame variable issue.  My ValueMaterializer is just creating new allocas with the name I want.  I think that will be easy enough to patch up once your llvm.frameallocate stuff is in place.  The implication of this is that right now I’m not looking for live variables before I start outlining, I’m just picking them up as I go.  It seems like that may need to change.

Reid Kleckner

Dec 3, 2014, 6:26:04 PM12/3/14
to Vadim Chugunov, Arnaud A. de Grandmaison, LLVM Developers Mailing List
On Wed, Dec 3, 2014 at 2:26 PM, Vadim Chugunov <vad...@gmail.com> wrote:
> So what's to become of llvm.lifetime.start/end?   Are they going to be removed or fixed?

Arnaud is looking at them. They are only an optimization hint, though, so it's OK if analysis fails and a variable is considered live across the entire function. It's not OK if we think that a landing pad is active over the entire function.

Kaylor, Andrew

Dec 3, 2014, 6:53:51 PM12/3/14
to Reid Kleckner, LLVM Developers Mailing List

Hi Reid,

 

I saw your patch but haven’t looked closely at it yet.

 

I do have a work in progress for the outlining.  I expect to have something ready to share pretty soon, hopefully by the end of the week.  It won’t be ready for primetime, as it’s making a whole lot of assumptions about the structure of the IR, but I think it will work with a sample IR file based on what you posted in your earlier SEH review.  I expect that it will be a useful point of reference for discussion and that I should be able to quickly refactor it into something product ready once we’ve ironed out the expectations as to what the incoming IR will look like and how flexible the heuristics for identifying regions to outline need to be.

 

One thing that I’ve discussed with a co-worker but haven’t explored in my implementation yet is the possibility of using CodeExtractor to do the outlining rather than basing it on the CloneFunction stuff.  My current implementation is based on CloneAndPruneFunctionInto as you suggested, but I wondered if CodeExtractor might not be a better starting point.  What do you think?

 

Also, at the moment I’m more or less ignoring the frame variable issue.  My ValueMaterializer is just creating new allocas with the name I want.  I think that will be easy enough to patch up once your llvm.frameallocate stuff is in place.  The implication of this is that right now I’m not looking for live variables before I start outlining, I’m just picking them up as I go.  It seems like that may need to change.

 

-Andy

 

 

From: Reid Kleckner [mailto:r...@google.com]

Sent: Wednesday, December 03, 2014 1:32 PM
To: Kaylor, Andrew

Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR

 

I went ahead and implemented @llvm.frameallocate in a patch here: http://reviews.llvm.org/D6493

Arnaud A. de Grandmaison

Dec 4, 2014, 4:34:44 AM12/4/14
to Reid Kleckner, Vadim Chugunov, LLVM Developers Mailing List

 

 

From: Reid Kleckner [mailto:r...@google.com]
Sent: 04 December 2014 00:17
To: Vadim Chugunov; Arnaud De Grandmaison
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] RFC: How to represent SEH (__try / __except) in LLVM IR

 

> On Wed, Dec 3, 2014 at 2:26 PM, Vadim Chugunov <vad...@gmail.com> wrote:

>> So what's to become of llvm.lifetime.start/end?   Are they going to be removed or fixed?

> Arnaud is looking at them. They are only an optimization hint, though, so it's OK if analysis fails and a variable is considered live across the entire function. It's not OK if we think that a landing pad is active over the entire function.

 

As far as I know, there is no plan to remove lifetime.start/end or change their specification. I am working on improving how they are used in clang’s IR codegen, which is proving to be tricky. As they become more widely used, there will probably be some corner cases to fix here and there in llvm’s passes.

Cheers,

Arnaud
