[llvm-dev] [RFC] Improving compact x86-64 compact unwind descriptors

32 views
Skip to first unread message

John Reagan via llvm-dev

unread,
Jan 26, 2018, 10:55:33 AM1/26/18
to via llvm-dev, Ron Brender
Here is our proposal to extend/enhance the x86-64 compact unwind
descriptors to fully describe the prologue/epilogue for asynchronous
unwinding.  I believe there are missing/lacking CFI directives as well,
but I'll save that for another thread.


Asynchronous Compact Unwind Descriptors
Ron Brender, VMS Software, Inc.
Revised January 25, 2018

1  Introduction
This document proposes means to extend so-called compact unwind
descriptors to support fully
asynchronous exception handling. This will make extended compact unwind
descriptors an
alternative to DWARF CFI (call frame information) for achieving
asynchronous exception
support.

Compact unwind descriptors can and have been used in both 64- and 32-bit
environments.
However, this proposal addresses only 64-bit environments. While the
ideas presented here can
be readily adapted for use in a 32-bit environment, for simplicity this
document makes no
attempt to do so.

There are generally three kinds of information that together form the
heart of modern software
exception handling systems:

 1. Information that is used to divide the remaining unwind information
into groups that are
    specific to particular regions of memory (often, but not
necessarily, associated with a
    single function) as well as provide a way to efficiently search for
and identify the
    grouping that is associated with a particular address in memory.

 2. Information that can be used to virtually or actually unwind from
the call frame of an
    executing function to the call frame of its caller (at the point of
the call).

 3. Identification of an associated personality routine that is invoked
by the general exception
    handling mechanization to guide how processing should proceed for a
given function, as
    well as additional “language specific data” needed for the
personality routine to do its
    job. Note that the personality routine and its related data are
specified as an adjunct to
    compiled code and totally opaque to the general mechanism (other
than the specified
    interface).

Note in particular that C++ exception handling is built on top of a
personality routine and
language specific data area ABI that itself can be implemented using
either DWARE CFI or
extended compact unwind information as described here. The choice
between the two is
transparent to C++.


2  Compact Unwind Overview
This section provides a brief overview of key features of the LLVM
compact unwind design. It
does not attempt a comprehensive re-statement of all aspects of the
design except to the extent
necessary to motivate and understand later proposed changes and
enhancements.

2.1  Compact Unwind Group Description
A compact unwind group consist of five fields, as follows:

| 63                             32 31                            0 |
|-------------------------------------------------------------------|
| STARTING-ADDRESS                                                  |
| LENGTH                           | COMPACT-UNWIND-DESCRIPTION     |
| PERSONALITY-FUNCTION-POINTER                                      |
| LANGUAGE-SPECIFIC-DATA-ADDRESS                                    |
|-------------------------------------------------------------------|

STARTING-ADDRESS (64-bits) is the lowest address of a region of memory
occupied by some
code, typically the entry point of a function.

LENGTH (32-bits) is the number of bytes included in this group,
typically including all and only
the code of a function.

COMPACT-UNWIND-DESCRIPTION (32-bits) is a description of the fully
formed frame of a
function and how to unwind it. This is described further following.

PERSONALITY-FUNCTION-POINTER (64-bit) is a pointer to the personality
routine.

LANGUAGE-SPECIFIC-DATA-ADDRESS (64-bits, sometimes abbreviated LSDA) is a
pointer to some data to be passed to the personality routine when it is
called.

A key observation is that the starting address plus length way of
describing a group means that
the set of groups for a compilation unit need not describe all of the
code in that unit. In
particular, it appears to be expected that no unwind information need be
generated for leaf
functions.

On the other hand, it is reasonable to expect that the groups that are
emitted are ordered by the
starting address. This means that a simple and fast binary search can be
used to map an address
to the group that applies to that address, if any.

It is useful to note that the run-time representation of unwind
information can vary from little
more than a simple concatenation of the compile-time information to a
substantial rewriting of
unwind information by the linker. The proposal favors simple
concatenation while maintaining
the same ordering of groups as their associated code.

2.2  Compact Unwind Frame Description
A compact unwind frame description describes a frame in sufficient
detail to be able to unwind
that frame to the frame of its caller.
 
| 31    28 27    24 23                                            0 |
|-------------------------------------------------------------------|
| FLAGS   | MODE   |                                                |
|-------------------------------------------------------------------|

At the top most level, there are four bits that are not of further
interest here. Interpretation of
these bits is neither used nor changed.

Also at the top-level is a 4-bit mode field. This is the tag of a
discriminated (tagged, variant)
union that selects the interpretation of the remaining 24 bits.

Of the 16 possible modes, only 5 are defined :

Code    Meaning      Description
0       Old          “Old” is presumed to refer to some historical
design that is no longer of interest.
                     It is treated here as Reserved.
1       RBP-based    The frame uses the RBP register as a frame pointer.
The size of the frame can
        frame        vary during execution.
2       RSP-based    The frame uses RSP as the frame pointer. The size
of the frame is fixed (at
        frame        compilation time).
3       Large RSP-   The frame uses RSP as the frame pointer, The size
of the frame is fixed (at
        based frame  compilation time); however, that size is too large
to express within this 32-bit
                     descriptor encoding.
4       DWARF        The frame, for whatever reason, cannot be
adequately described using the
        escape       compact unwind frame description. The remaining
24-bits are an index into
                     what the DWARF standard calls the .debug_frame
section (__eh_frame in
                     LLVM).

2.2.1  RBP-based Frame (MODE=1)
For a RBP-based frame, the remaining 24 bits are encoded as follows:

| 23        16 | 15 | 14                                          0 |
|-------------------------------------------------------------------|
| OFFSET       | 0  | REGS                                          |
|-------------------------------------------------------------------|

In a RBP-based frame the RBP register is pushed on the stack immediately
after the return
address, then RSP is moved to RBP. To unwind, RSP is restored with the
current RPB value,
then RBP is restored by popping off the stack, and the return is done by
popping the stack once
more into the instruction pointer.

All preserved registers are saved in a small range in the stack that
starts at RBP-8 to RBP-2040. 
The offset/8 relative to RBP is encoded in the 8-bit OFFSET field. The
registers saved are
encoded in the 15-bit REGS field as five 3-bit entries.

2.2.2  RSP-Based Frame (MODE=2)
For a RSP-based frame, the remaining 24 bits are encoded as follows:

|23        16 | 15      13 | 12      10 | 9                       0 |
|-------------------------------------------------------------------|
| SIZE        |            | CNT        | REG_PERM                  |
|-------------------------------------------------------------------|

In a RSP-based frame the stack pointer serves directly as the frame
pointer and RBP is available
for use as a general register. Upon entry, the stack pointer is
decremented by 8*SIZE bytes (the
maximum stack allocation is thus 2040 bytes). To unwind, the stack size
is added to the stack
pointer, and completed by popping the stack once more into the
instruction pointer.

All preserved registers are saved on the stack immediately after the
return address.  The number
of registers saved (up to 6) is encoded in the 3-bit CNT field. The
11-bit REG_PERM field
encodes which registers were saved and in what order.

2.2.3  Large RSP-Based Frame (MODE=3)
For a large RSP-based frame, the remaining 24 bits are encoded as follows:

|23        16 | 15      13 | 12      10 | 9                       0 |
|-------------------------------------------------------------------|
| SIZE        | ADJ        | CNT        | REG_PERM                  |
|-------------------------------------------------------------------|

This case is like the previous, except the stack size is too large to
encode in the compact unwind
encoding.  Instead, the function must include a "subq $nnnnnnnn, RSP"
instruction in its
prologue to allocate the stack.  The offset from the entry point of the
function to the nnnnnnnn
value in the function is given in the SIZE field.

Depending on the exact instructions used to save registers (PUSH versus
MOV), the nnnnnnnn
value in the instruction stream may not be quite the full stack size.
ADJ * 8 is the additional
adjustment needed to get the actual size.

2.2.4  DWARF Escape (MODE=4)
The frame, for whatever reason, cannot be adequately described using a
compact unwind frame
description. The remaining 24-bits are an index into what the DWARF
standard calls the
.debug_frame section (called __eh_frame in LLVM).


3  Asynchronous Changes and Enhancements
It is immediately obvious that omission of unwind information for leaf
functions (with any kind
of frame) precludes handling an exception that might occur during its
execution. It follows that
unwind information must cover all of the code of a module (with one
exception discussed
below). But if successive unwind groups are ordered (as previously
assumed) and also leave no
gaps, then the LENGTH field is redundant and can be omitted. The
beginning address of a
following group is always one byte past the end of the predecessor
group. There remains only the
question of how to identify the last group of a set.

It should also be clear that the unwind representation described in the
prior section is not
sufficient to unwind from an asynchronous exception that might occur in
either the prologue or
epilogue of a function. To see this consider what would happen if an
exception occurred at either
the entry point or the return instruction of either a RBP- or RSP-frame
function. To be able to
handle asynchronous exceptions at any point during function execution,
it is necessary to add
additional information to each unwind group.

These two considerations can be combined. The result is simply to
repurpose the LENGTH field
to encode prologue and epilogue information.

3.1  Extended MODEs
To preserve backward compatibility and to allow intermixing of
traditional and extended
compact unwind groups, new MODEs are defined as follows:

Code        Meaning                   Description
----------------------------------------------------------------------------------------------
8           Null frame                See below.
9           Extended RBP-based frame  Like mode 1 but combined with
extended prologue/epilogue
                                      information in place of a LENGTH field
10          Extended RSP-based frame  Like mode 2 but combined with
extended prologue/epilogue
                                      information in place of a LENGTH field
11          Extended Large RSP-based  Like mode 3 but combined with
extended prologue/epilogue
            frame                     information in place of a LENGTH field
12          Extended parameter(s)     See below.
 
A null frame (MODE = 8) is the simplest possible frame, with no
allocated stack of either kind
(hence no saved registers).  A null frame might be considered just a
special case of a RSP-based
frame with a stack size of zero. But unlike other frames, its frame
pointer is usually not 16-byte
aligned.
 
It appears technically feasible for a null frame function to have a
personality routine. However,
the utility of such a capability seems too meager to justify allowing
this. We propose to not
support this.

There remains the question of whether it is necessary or at least
desirable to strictly apply the
requirement that “all” code be covered by an unwind group. Based on
successful experience in
OpenVMS on the Itanium architecture (as well as Windows systems we
think), this alternative is
proposed:

    If the first attempt to lookup an unwind group for an exception
address fails, then it is
    (tentatively) assumed to have occurred within a null frame function
or in a part of a function
    that is adequately described by a null frame. The presumed return
address is (virtually or
    actually) popped from the top of stack and looked up. This second
attempted lookup must
    succeed, in which case processing continues normally. A failure is a
fatal error.

This concept of a “default null frame group” is especially convenient
for dealing with
disconnected code sequences such as trampolines, PLTs, and the like.

The extended RBP-based and RSP-based frame modes (MODEs = 9 and 10) are
simple
supersets of their more traditional predecessors.

The large RSP-based frame mode (MODE = 11) is also a superset of mode 3
except that instead
of finding the stack size in the instruction stream, it is found in a
preceding extended parameter
unwind group (MODE = 12). This difference is essential for support of
execute-only code.

To understand the extended parameter group, suppose that two groups
occur one after the other
but have the same STARTING-ADDRESS value. A binary search using the
STARTING-
ADDRESS field will ignore the first of the pair because the address
being sought can never be at
least as large as the first but less than the second. Such a group can
be used as an escape
convention that allows inserting additional information into the
sequence of groups that
otherwise cannot be easily included.

The extended modes are described more fully in the following sections.

3.2  Function Parts
Functions are generally considered to consist of three distinct parts:

1. A prologue, which allocates local storage, saves registers that must
be preserved for the
   benefit of callers, and possibly other housekeeping (such as
initializing any local state
   necessary for the correct management of the function’s execution).

2. A body, which performs the main work of the function.

3. An epilogue, which makes the result available to the caller, and
undoes the effects of the
   prologue (restores preserved registers, deallocates local storage,
and so on).

Here a word of caution is in order. This vocabulary tends to be applied
at multiple levels in
software architecture and each level has its own set of issues to
consider and resulting
requirements to be observed.

Note that a null frame function has no distinct prologue, body or
epilogue. Every instruction can
be viewed as simultaneously in all three parts, or in none of them.

This leads to the following proposal for the upper part of the extended
compact unwind
quadword for use in combination with MODE = 8 in the lower part.

| 63                            48 | 47                          32 |
|-------------------------------------------------------------------|
| RESERVED                         | 0            ...             0 |
|-------------------------------------------------------------------|


| 31                            16 | 15                           0 |
|-------------------------------------------------------------------|
| 0 0 0 0 | 1 0 0 0 | 0   ...    0 | 0            ...             0 |
|-------------------------------------------------------------------|

3.2.1  Function Prologue
For the purposes of exception handling, the key steps are:

 1. Allocate space on the stack in which to save preserved registers and
for other purposes
 2. Save the registers that must be preserved.

The DWARF call frame information (CFI) provides a description that is
precise and accurate at
each and every instruction in a function (not limited to just the
prologue). But experience shows
that this representation is bulky in space and expensive in time to
decode and interpret. Indeed,
compact unwind descriptors as described in Section 2 were created to
avoid these issues.

Experience with OpenVMS on the Alpha and Itanium architectures shows
that a key constraint
can greatly simplify the unwind description: require that no preserved
register can be changed
until all of them have been saved. The last such save ends the prologue
and the next instruction
begins the body. An unwind does not need (must not) restore any
preserved registers because
they are still valid. A simple length from the entry point suffices to
describe this boundary.
[The body does not necessarily need to begin until at least one of the
preserved registers is
actually changed.]
 
3.2.1.1  RSP-Based Frame (MODE = 10)
For a RSP-based frame, it is also necessary to know which instruction
adjusts the stack pointer:
before this instruction an unwind must only (virtually or actually) pop
the return address; and
after this instruction, the stack pointer must first be incremented to
deallocate the stack frame,
then the instruction pointer restored.

This leads to the following proposal for the upper part of the extended
compact unwind
quadword for use in combination with MODE = 10 in the lower part.

| 63                       48 47 46             40 39            32 |
|-------------------------------------------------------------------|
| RESERVED                   |  |  PROLOGUE-SIZE2 |  PROLOGUE-SIZE1 |
|-------------------------------------------------------------------|


Except for the MODE, the lower part is exactly like it would be for MODE
= 2.

PROLOGUE-SIZE1 is the length in bytes of the first part of the prologue
relative to the
STARTING-ADDRESS, up to and including the instruction that allocates the
stack.

PROLOGUE-SIZE2 is the length in bytes of the second part of the prologue
relative to the end
of the first part, up to a point after the last preserved register is
saved and before the first
preserved register is changed. (Recall, this point may not be unique.)

PROLOGUE-SIZE1 + PROLOGUE-SIZE2 gives the total size of the prologue.

The maximum prologue size allowed here is much greater than will be
typical. This is deliberate
to allow compilers freedom to use shrink-wrap type optimizations in
which safe operations that
need no body state are moved from the body to the beginning of the
prologue. This avoids the
cost of stack frame setup and teardown for simple special case handling
that often leads to an
early exit.

3.2.1.2  Large RSP-based Frame (MODE = 11)
This case is like the previous, except the stack size is too large to
encode in the compact unwind
encoding.  The SIZE field is interpreted as an offset relative to the
beginning of the containing
unwind group to (what is necessarily) an earlier extended parameter group.

3.2.1.3  RBP-Based Frame (MODE = 9)
The RBP-based frame is the same as for a RSP-based frame except that
there are two instructions
that mark the transition between the two parts of the prologue: the push
of the prior RBP value
onto the stack followed by the copy of RSP to RBP to establish the new
frame pointer.

There appears to be no reason to not make this simplifying requirement:
the push and copy
always occur together.

This leads to the following proposal for the upper part of the extended
compact unwind
quadword for use in combination with MODE = 9 in the lower part.

| 63                       48 47 46             40 39            32 |
|-------------------------------------------------------------------|
| RESERVED                   |  |  PROLOGUE-SIZE2 |  PROLOGUE-SIZE1 |
|-------------------------------------------------------------------|

Except for the MODE, the lower part is exactly like it would be for MODE
= 1.

PROLOGUE-SIZE1 is the length in bytes of the first part of the prologue
relative to the
STARTING-ADDRESS, up to and including the instruction that pushes the
prior RBP contents.

PROLOGUE-SIZE2 is the length in bytes of the second part of the prologue
relative to the end
of the first part, up to a point after the last preserved register is
saved and before the first
preserved register is changed. (Recall, this point may not be unique.)

PROLOGUE-SIZE1 + PROLOGUE-SIZE2 gives the total size of the prologue.

3.2.2  Function Epilogue
For the purposes of exception handling, the key steps in an epilogue are:

 1. Restore the registers that were preserved
 2. Deallocate space on the stack that was used to preserve registers
and for other purposes

A key observation is that restoring registers is idempotent. That is, if
in the midst of restoring
registers when an exception occurs, it will do no harm if all of the
preserved registers are
restored again when unwinding. In effect, restoration of registers need
not be distinguished from
the body of a function. It is not until reaching the first instruction
that deallocates stack is
executed that the “real” epilogue begins.

3.2.2.1  RSP-Based Frame (MODE = 10 or 11)
For a RSP-based frame, whether small or large, the only unwind action
remaining after stack
deallocation is to pop the return address into the instruction pointer
(RIP). In the vast majority of
cases this means that the epilogue of the code described by an unwind
group is just a 1-byte RET
instruction that occurs at the end of the unwind group. There are other
possibilities but they are
rare and con be dealt with separately (see below).

It suffices to include a single 1-bit field, E, in the upper part of the
extended compact unwind
description:

| 63                       48 47 46             40 39            32 |
|-------------------------------------------------------------------|
| RESERVED                  | E |  PROLOGUE-SIZE2 |  PROLOGUE-SIZE1 |
|-------------------------------------------------------------------|


The E flag indicates that the unwind group ends with a standard 1-byte
RET instruction.

3.2.2.2  RBP-Based Frame (MODE = 9)
The RBP-based frame is much the same as for a RSP-based frame except
that there are two
instructions that mark the transition between the two parts of the
prologue: the copy of RBP to
RSP to cut back the stack, followed by the pop of RBP to restore what
may be the callers frame
pointer. Thus the epilogue, for the purposes of unwinding, begins
immediately after the copy of
RBP to RSP. In the vast majority of cases, the immediately following two
instructions will
consist of POP RBP and RET. The POP is not idempotent because it changes
the stack pointer,
but the unwind processing code can distinguish whether the stack has
been popped or not based
on the address (3 versus 1 byte less than the highest address of the group).

It suffices to include a single 1-bit field, E, in the upper part of the
extended compact unwind
description:

| 63                       48 47 46             40 39            32 |
|-------------------------------------------------------------------|
| RESERVED                  | E |  PROLOGUE-SIZE2 |  PROLOGUE-SIZE1 |
|-------------------------------------------------------------------|

The E flag indicates that the unwind group ends with a standard
3-instruction return sequence.

3.3  Special Issues
Up until this point, it might appear that each unwind group corresponds
one-for-one to the code
for a single function. While common, this is not required. Important
variations are illustrated
here.

Issue/Problem                                        Possible Resolution
-------------------------------------------------------------------------------------------------
The first part of the prologue is too large to be    Use two unwind groups:
described by the 8-bit RxP-FRAME-SIZE1 field.        1) a null frame
group of any appropriate size,
                                                     2) a suitable
RxP-based frame.

The second part of the prologue is too large to be   Use three unwind
groups:
described by the 8-bit RxP-FRAME-SIZE2 field.        1) a RSP-frame
group to describe the first part of
                                                     the prologue only
(PROLOGUE-SIZE1 = the
                                                     length of the
entire group and PROLOGUE-SIZE2
                                                     = 0,
                                                     2) a RSP-frame
group to describe the second part
                                                     of the prologue
(PROLOGUE-SIZE1 & 2 both = 0
                                                     and no registers
saved),
                                                     3) a suitable
RxP-based frame group (with
                                                     PROLOGUE-SIZE1 & 2
both = 0).

There is more than one return location (without a    Generally use one
unwind group for each sequence
shared epilogue sequence).                           of code ending with
a RET. A group following a
                                                     RET will have no
prologue (PROLOGUE-SIZE1
                                                     & 2 both = 0).

There is more than one entry point to a function     Generally use one
unwind group for each sequence
(for example, a FORTRAN multiple entry point         of code beginning
at an entry point. Each group
subroutine or analogous assembly language            may have a normal
prologue and all but the last
equivalents)                                         might have NO
epilogue (E flag clear).
 

These examples are not exhaustive, of course. But they illustrate the
flexibility of the
mechanism, which should be suitable for the vast majority of cases in
practice.

3.4  Number of Unwind Groups
A simple concatenation of unwind groups by the linker that combines
unwind group sections
from multiple modules does not, of itself, provide direct information to
the unwind software
regarding how many unwind groups are present. However, the image
information (ELF
segments) should suffice to determine this number based on the size of
the segment and the
known size of each group. Alternatively, the linker might create special
symbols that can be used
to help determine the number of groups.

This proposal leaves this detail to target-specific linker, image loader
and exception handling
system to specify.

3.5  Interaction with Code Generation
The relatively simple and fixed size nature of the extended compact
unwind information
proposed here depends on observing certain restrictions on optimization
that might affect code in
prologues and epilogues. These restrictions were noted where relevant
throughout this section.
Here is a summary of those requirements:

 1. Use of any preserved register must be delayed until all of the
preserved registers have
    been saved.
 2. In the prologue of a function with a RBP-based frame, the two
instructions that save the
    prior RBP contents and copy the stack pointer to RBP must be kept
adjacent. Similarly,
    in the epilogue of a function with a RBP-based frame, the two
instructions that cut back
    the stack and restore the prior RBP contents must be kept adjacent.
[This adjacency
    requirement could be relaxed by introducing an additional MODE that
corresponds to having
    the RBP saved in memory but not yet updated from RSP. This does not
seem worthwhile.]
 3. For shrink wrap optimization in the presence of a
handler/personality routine, care must
    be taken to not move instructions into the prologue that might cause
an exception.
    Floating point exceptions and access violations are of particular
concern in this regard.

4  System Specific Extensions for OpenVMS
Omitted since nobody other than OpenVMS should care but it describes
adding an additional
four byte extension to the extended compact unwind group to pass along
some information from
the VAX Macro-32 compiler about VAX register usage.

signature.asc

John Reagan via llvm-dev

unread,
Jan 26, 2018, 12:24:24 PM1/26/18
to via llvm-dev
Apologies for the readability/formatting of the document.  It started
out as Word doc.  I've put the PDF in my DropBox

https://www.dropbox.com/s/hyvit0tnfao19v3/Asynchronous%20Compact%20Unwind%20Descriptors%202018-01-25.pdf?dl=0

John


signature.asc

Jason Molenda via llvm-dev

unread,
Jan 26, 2018, 7:20:08 PM1/26/18
to John Reagan, Ron Brender, via llvm-dev, Francis Visoiu Mistrih
Hi John & Ron, I read through the proposal and had a couple of quick observations.

1. The proposed encoding assumes that the epilogue instructions always come at the end of the function -- or rather, just before the next function. If there is a stack protector __stack_chk_fail sequence, or there is NOP padding between functions, then the epilogue cannot be expressed. The proposed encoding allows for instructions scheduled before the prologue, as long as they're guaranteed to never throw an exception or spill registers, but there's no similar affordance for after the epilogue.

2. In section 3.1, "A null frame (MODE = 8) is the simplest possible frame, with no allocated stack of either kind (hence no saved registers). A null frame might be considered just a special case of a RSP-based frame with a stack size of zero. But unlike other frames, its frame pointer is usually not 16-byte aligned." Did you meant stack pointer here? The x86_64 SysV abi requires 16-byte alignment of the stack pointer before a CALL instruction, but I don't know of any alignment requirements on the frame pointer.

3. In the same section, you propose an alternative "default null frame group" be defined,

"If the first attempt to lookup an unwind group for an exception address fails, then it is (tentatively) assumed to have occurred within a null frame function or in a part of a function that is adequately described by a null frame. The presumed return address is (virtually or actually) popped from the top of stack and looked up. This second attempted lookup must succeed, in which case processing continues normally. A failure is a fatal error."

I suspect I missed a part of the proposal covering this, but if entries are described by only start address, is there a problem of the previous non-null function's range extending & picking up these functions if they don't have entries?

4. For frameless functions -- small-stack RSP functions -- you're assuming that the compiler saves and restores spilled registers with MOV instructions instead of PUSH/POP instructions. I think this is the normal behavior of the compiler, something like

subq $0x98, %rsp
movq %rsi, 0x10(%rsp)
movq %rdi, 0x18(%rsp)

or whatever, but the proposed encoding can only allow for one change in stack size. The current encoding requires this for large-stack RSP functions (we get the offset to the RSP instruction where we read the amount being decremented in the compact unwind encoding), but for small stack frameless functions the compiler can today express in compact unwind a combination of sub and push instructions as long as it's a fixed amount within the range that the small-stack RSP encoding can express.

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Nick Kledzik via llvm-dev

unread,
Jan 26, 2018, 9:54:32 PM1/26/18
to John Reagan, Ron Brender, via llvm-dev
John and Ron,

I developed the original compact unwind implementation for macOS 10.6 back in 2009. I tried to leave space in the design to support finer grain exception handling such as for asynchronous or for the shrink wrap optimization. The idea I had at the time was instead of having just one 32-bit compact unwind info per function, there could be an array of them each covering a different range. All but the first would have the UNWIND_IS_NOT_FUNCTION_START bit set.

On macOS, the linker takes the 32-byte “compact unwind group description” records and folds them down into a complete different (read only) runtime table which basically maps an offset in the DSO into a 32-bit compact info for that address.

It seems like from your proposal that your linker keeps the “compact unwind group description” records as is and just concatenates them, and the runtime would need to understand the prolog/epilog encoding. I would think that just widening the group record (as you suggest in 5.1) would be much simpler (like SHT_REL was widened to SHT_RELA).

-Nick

Ron Brender via llvm-dev

unread,
Jan 28, 2018, 6:14:32 PM1/28/18
to llvm...@lists.llvm.org
>Hi John & Ron, I read through the proposal and had a couple of quick
observations.

    Hi Jason. Thanks for the interesting observations and questions.

>1. The proposed encoding assumes that the epilogue instructions always
come at the end of the  function -- or rather, just before the next
function.  If there is a stack protector __stack_chk_fail sequence, or
there is NOP padding between functions, then the epilogue cannot be
expressed.  The proposed encoding allows for instructions scheduled
before the prologue, as long as they're guaranteed to never throw an
exception or spill registers, but there's no similar affordance for
after the epilogue.

    There are at least two alternatives for this situation:
    a.  The main group can be ended at the RET instruction and
        a new group started to cover the following bytes.
        Interestingly, both NOPs and a __stack_chk_fail call
        sequence are not really unwindable instructions.
        Perhaps a new NOUNWIND mode should be added for such
        cases; this would protect against an asynchronous
        exception attempting what is likely a dangerous stack
        walk.
    b.  Starting a new group is admittedly a rather large hammer
        that is perhaps OK if rare but not if it is common. If
        common, an alternative might be to widen the E (epilogue)
        field to allow more idiomatic alternatives. 4 bits might
        be enough: 8 codes for a RET followed by 0 to 7 NOPs +
        1 code for RET + __stack_chk_fail call + 1 code for
        just a __stack_chk_fail call + 1 code for no RET and
        nothing following. For anything else, fall back to a.
    Personally I rather like the combination of a. + b.

>2. In section 3.1, "A null frame (MODE = 8) is the simplest possible
frame, with no allocated stack of either kind (hence no saved
registers). A null frame might be considered just a special case of a
RSP-based frame with a stack size of zero. But unlike other frames,
its frame pointer is usually not 16-byte aligned." Did you meant stack
pointer here?  The x86_64 SysV abi requires 16-byte alignment of the
stack pointer before a CALL instruction, but I don't know of any
alignment requirements on the frame pointer.

    Yes, it would be clearer to say the stack pointer is not
    16-byte aligned.

    Aside: in my vocabulary, the frame pointer for a RSP-based
    or null frame function is the stack pointer. (And I find it
    an abuse of the English language to refer to a RSP-based
    frame as "frameless".) So what I wrote strictly is correct,
    but it does tend to confuse. Sorry...

3. In the same section, you propose an alternative "default null frame
group" be defined,

>"If the first attempt to lookup an unwind group for an exception
address fails, then it is (tentatively) assumed to have occurred
within a null frame function or in a part of a function that is
adequately described by a null frame. The presumed return address is
(virtually or actually) popped from the top of stack and looked up.
This second attempted lookup must succeed, in which case processing
continues normally. A failure is a fatal error."

I suspect I missed a part of the proposal covering this, but if
entries are described by only start address, is there a problem of the
previous non-null function's range extending & picking up these
functions if they don't have entries?

    For normal entries, I admit I sorta hand waved in Section 3.4
    over how to determine the end address of the last group. There
    are lots of alternatives. Most involve some kind of dummy
    extended parameter group (MODE =12) where the STARTING-ADDRESS
    is one more than the last address of the last "real" group.
    There might also be linker assistance of some kind. In any
    case I think it a problem that need not be solved upfront but
    can wait for broader issues to settle.

    The default unwind group has no starting address, of course.
    It is defined by what is left over after all other normal
    groups have been considered.

>4. For frameless functions -- small-stack RSP functions -- you're
assuming that the compiler saves and restores spilled registers with
MOV instructions instead of PUSH/POP instructions.  I think this is
the normal behavior of the compiler, something like

  subq    $0x98, %rsp
  movq    %rsi, 0x10(%rsp)
  movq    %rdi, 0x18(%rsp)

or whatever, but the proposed encoding can only allow for one change
in stack size.  The current encoding requires this for large-stack RSP
functions (we get the offset to the RSP instruction where we read the
amount being decremented in the compact unwind encoding), but for
small stack frameless functions the compiler can today express in
compact unwind a combination of sub and push instructions as long as
it's a fixed amount within the range that the small-stack RSP encoding
can express.

    You are quite right, with the understanding that "the range
    that the small-stack RSP encoding can express" is in fact the
    body of the function and excludes the prologue and epilogue.
    Our proposal, of course, is trying to describe the prologue and
    epilogue as well.

    My understanding of modern x86 processors is that there is
    no meaningful performance difference between using pushes/pops
    versus moves. That is, using only moves in prologues and
    epilogues is a negligible price to pay in our quest for
    a simple and compact unwind representation.
    But I do take the point that this should be mentioned in
    Section 3.5 regarding code generation implications.

Thanks again,
Ron


On Fri, Jan 26, 2018 at 7:19 PM, Jason Molenda via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi John & Ron, I read through the proposal and had a couple of quick observations.
...


Ron Brender via llvm-dev

unread,
Jan 29, 2018, 4:37:09 PM1/29/18
to Nick Kledzik, via llvm-dev, John Reagan
Hi Nick,

It is a pleasure to be in contact with the creator of the compact unwind approach!

I can see how an array of 32-bit unwind blocks could be used to describe each distinct point within a function (within a prolog in particular). But then you end up with six or seven or more such blocks for a large percentage of functions, don't you? Seems like a lot of additional space for something that is usually simple, compact (sic) and idiomatic and dilutes the original benefit. We are seeking a summary description in a small fixed size to supplement the base unwind block.

Since you mention it, why have the UNWIND_IS_NOT_FUNCTION_START flag? Seems like you don't need it for unwinding purposes. Especially in a function that has a code layout where the entry point is not the first/lowest addressed instruction of the function (which I have seen in some GEM produced code for Alpha). An aside, but just curious.

Not being a MAC guy the macOS two-level lookup scheme has always been rather a mystery. What little I know is gleaned from the compact_unwind_encoding.h file found around the net. But I have never seen the .c/.cxx/.cpp that goes with it to better understand how the mapping works. Always figured the was considered proprietary but maybe I just don't know where/how to look. Is there public code and/or design note information to look at?

You are correct that this paper tends toward trying to create a scheme with little or no linker involvement. There are good reasons to ask more of the linker for better space economy, so this is still under discussion.

You are also correct that the unwinder needs to interpret the additional prolog/epilog information. But the rest of your comment has me confused: the extra 2-bytes is just a widening as you suggest, right? And there is no Section 5.1--did you mean 3.1?

Thanks again,
Ron

Jason Molenda via llvm-dev

unread,
Jan 29, 2018, 9:48:35 PM1/29/18
to Ron Brender, via llvm-dev, John Reagan
For what it's worth, there are two implementations of consuming linked compact unwind in lldb http://lldb.llvm.org . In tools/compact-unwind there is a standalone reader that I wrote in 2014 to make sure I understood the format from the compact_unwind_encoding.h header when I first read it. I should really go back and make sure that it is correct all the way through or remove it, I have a vague memory that it may not interpret something correctly in how it decodes the unwind state information.

I can more strongly recommend the reader in source/Symbol/CompactUnwindInfo.cpp -- lldb has been using this information for unwinds for a few years now. This is written as a part of lldb so it won't be quite as simple to understand, but it's all there. lldb has been using this for for a few years now. I don't read/use the LSDA or personality information yet.

Nick Kledzik via llvm-dev

unread,
Jan 29, 2018, 10:42:41 PM1/29/18
to Ron Brender, via llvm-dev, John Reagan

On Jan 29, 2018, at 1:36 PM, Ron Brender <ron.b...@gmail.com> wrote:

Hi Nick,

It is a pleasure to be in contact with the creator of the compact unwind approach!

I can see how an array of 32-bit unwind blocks could be used to describe each distinct point within a function (within a prolog in particular). But then you end up with six or seven or more such blocks for a large percentage of functions, don't you?
You would normally just need one region for the prolog and one for the epilog.  In the compact unwind info you can have mode that describes which kind of prolog/epilog is used for all the common cases.  Only if instructions are scheduled into the middle of a prolog would not need more ranges.

Seems like a lot of additional space for something that is usually simple, compact (sic) and idiomatic and dilutes the original benefit. We are seeking a summary description in a small fixed size to supplement the base unwind block.

Since you mention it, why have the UNWIND_IS_NOT_FUNCTION_START flag? Seems like you don't need it for unwinding purposes.
The LSDA info for C++ encodes the landing pads (e.g. catch clause inside a function) as offsets from the start of the function. If each compact info range was independent, the runtime could not the function start to pass to the personality function, which the personality function adds to the offset and computes the landing pad. By marking the non-start compact infos this way, the runtime can walk backwards until to it finds the first range without that bit and know it is the actual start of the function.


Especially in a function that has a code layout where the entry point is not the first/lowest addressed instruction of the function (which I have seen in some GEM produced code for Alpha). An aside, but just curious.

Not being a MAC guy the macOS two-level lookup scheme has always been rather a mystery. What little I know is gleaned from the compact_unwind_encoding.h file found around the net. But I have never seen the .c/.cxx/.cpp that goes with it to better understand how the mapping works.
You can see the runtime unwinder that parses the compact unwind section in UnwindCursor<>::getInfoFromCompactEncodingSection() in 

The goal is to make the section read-only (no load time fixups), small, and fast


Always figured the was considered proprietary but maybe I just don't know where/how to look. Is there public code and/or design note information to look at?

You are correct that this paper tends toward trying to create a scheme with little or no linker involvement. There are good reasons to ask more of the linker for better space economy, so this is still under discussion.
If there is no linker optimization, does that mean at runtime the array of group entries in unsorted and you have to do a linear search? 


You are also correct that the unwinder needs to interpret the additional prolog/epilog information. But the rest of your comment has me confused: the extra 2-bytes is just a widening as you suggest, right? And there is no Section 5.1--did you mean 3.1?
In the PDF I see section 5 “Other OpenVMS Refinements”.  I read that as maybe changing the layout of the group record (widening).

-Nick
Reply all
Reply to author
Forward
0 new messages