[llvm-dev] [RFC] Improving compact x86-64 compact unwind descriptors

>Hi John & Ron, I read through the proposal and had a couple of quick
observations.

    Hi Jason. Thanks for the interesting observations and questions.

>1. The proposed encoding assumes that the epilogue instructions always
come at the end of the  function -- or rather, just before the next
function.  If there is a stack protector __stack_chk_fail sequence, or
there is NOP padding between functions, then the epilogue cannot be
expressed.  The proposed encoding allows for instructions scheduled
before the prologue, as long as they're guaranteed to never throw an
exception or spill registers, but there's no similar affordance for
after the epilogue.

    There are at least two alternatives for this situation:
    a.  The main group can be ended at the RET instruction and
        a new group started to cover the following bytes.
        Interestingly, both NOPs and a __stack_chk_fail call
        sequence are not really unwindable instructions.
        Perhaps a new NOUNWIND mode should be added for such
        cases; this would protect against an asynchronous
        exception attempting what is likely a dangerous stack
        walk.
    b.  Starting a new group is admittedly a rather large hammer
        that is perhaps OK if rare but not if it is common. If
        common, an alternative might be to widen the E (epilogue)
        field to allow more idiomatic alternatives. 4 bits might
        be enough: 8 codes for a RET followed by 0 to 7 NOPs +
        1 code for RET + __stack_chk_fail call + 1 code for
        just a __stack_chk_fail call + 1 code for no RET and
        nothing following. For anything else, fall back to a.
    Personally I rather like the combination of a. + b.

>2. In section 3.1, "A null frame (MODE = 8) is the simplest possible
frame, with no allocated stack of either kind (hence no saved
registers). A null frame might be considered just a special case of a
RSP-based frame with a stack size of zero. But unlike other frames,
its frame pointer is usually not 16-byte aligned." Did you meant stack
pointer here?  The x86_64 SysV abi requires 16-byte alignment of the
stack pointer before a CALL instruction, but I don't know of any
alignment requirements on the frame pointer.

    Yes, it would be clearer to say the stack pointer is not
    16-byte aligned.

    Aside: in my vocabulary, the frame pointer for a RSP-based
    or null frame function is the stack pointer. (And I find it
    an abuse of the English language to refer to a RSP-based
    frame as "frameless".) So what I wrote strictly is correct,
    but it does tend to confuse. Sorry...

3. In the same section, you propose an alternative "default null frame
group" be defined,

>"If the first attempt to lookup an unwind group for an exception
address fails, then it is (tentatively) assumed to have occurred
within a null frame function or in a part of a function that is
adequately described by a null frame. The presumed return address is
(virtually or actually) popped from the top of stack and looked up.
This second attempted lookup must succeed, in which case processing
continues normally. A failure is a fatal error."

I suspect I missed a part of the proposal covering this, but if
entries are described by only start address, is there a problem of the
previous non-null function's range extending & picking up these
functions if they don't have entries?

    For normal entries, I admit I sorta hand waved in Section 3.4
    over how to determine the end address of the last group. There
    are lots of alternatives. Most involve some kind of dummy
    extended parameter group (MODE =12) where the STARTING-ADDRESS
    is one more than the last address of the last "real" group.
    There might also be linker assistance of some kind. In any
    case I think it a problem that need not be solved upfront but
    can wait for broader issues to settle.

    The default unwind group has no starting address, of course.
    It is defined by what is left over after all other normal
    groups have been considered.

>4. For frameless functions -- small-stack RSP functions -- you're
assuming that the compiler saves and restores spilled registers with
MOV instructions instead of PUSH/POP instructions.  I think this is
the normal behavior of the compiler, something like

  subq    $0x98, %rsp
  movq    %rsi, 0x10(%rsp)
  movq    %rdi, 0x18(%rsp)

or whatever, but the proposed encoding can only allow for one change
in stack size.  The current encoding requires this for large-stack RSP
functions (we get the offset to the RSP instruction where we read the
amount being decremented in the compact unwind encoding), but for
small stack frameless functions the compiler can today express in
compact unwind a combination of sub and push instructions as long as
it's a fixed amount within the range that the small-stack RSP encoding
can express.

    You are quite right, with the understanding that "the range
    that the small-stack RSP encoding can express" is in fact the
    body of the function and excludes the prologue and epilogue.
    Our proposal, of course, is trying to describe the prologue and
    epilogue as well.

    My understanding of modern x86 processors is that there is
    no meaningful performance difference between using pushes/pops
    versus moves. That is, using only moves in prologues and
    epilogues is a negligible price to pay in our quest for
    a simple and compact unwind representation.
    But I do take the point that this should be mentioned in
    Section 3.5 regarding code generation implications.

Thanks again,
Ron

[llvm-dev] [RFC] Improving compact x86-64 compact unwind descriptors

John Reagan via llvm-dev

John Reagan via llvm-dev

Jason Molenda via llvm-dev

Nick Kledzik via llvm-dev

Ron Brender via llvm-dev

Ron Brender via llvm-dev

Jason Molenda via llvm-dev

Nick Kledzik via llvm-dev