[llvm-dev] [RFC] Asynchronous unwind tables attribute

Momchil Velikov via llvm-dev

Nov 17, 2021, 6:19:06 AM
to LLVM Mailing list, Momchil Velikov
On one hand, we have the `uwtable` attribute in LLVM IR, which tells
whether to emit CFI directives. On the other hand, we have the `clang
-cc1` command-line option `-funwind-tables=1|2` and the codegen
option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or
asynchronous unwind tables (2)`.
Thus, on the way from clang to LLVM IR, we lose the information whether
we want just some unwind tables or asynchronous unwind tables.

Asynchronous unwind tables take more space in the runtime image; I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE): in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.
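
To put a rough number on the `DW_CFA_advance_loc` cost, here is a minimal
sketch of the DWARF encoding sizes for a location advance. The function
names and the `code_align` parameter (standing in for the CIE code alignment
factor) are illustrative assumptions, not anything taken from LLVM.

```python
def advance_loc_size(delta, code_align=1):
    """Return the encoded size in bytes of one DWARF location advance.

    `delta` is the distance in bytes of machine code; `code_align` stands in
    for the CIE code alignment factor (an assumed parameter in this sketch).
    """
    if delta <= 0 or delta % code_align != 0:
        raise ValueError("delta must be a positive multiple of code_align")
    factored = delta // code_align
    if factored < 0x40:
        return 1  # DW_CFA_advance_loc: 6-bit delta packed into the opcode
    if factored <= 0xFF:
        return 2  # DW_CFA_advance_loc1: opcode + 1-byte delta
    if factored <= 0xFFFF:
        return 3  # DW_CFA_advance_loc2: opcode + 2-byte delta
    return 5      # DW_CFA_advance_loc4: opcode + 4-byte delta


def total_advance_bytes(deltas, code_align=1):
    """Sum the advance bytes for a sequence of gaps between CFI points."""
    return sum(advance_loc_size(d, code_align) for d in deltas)


# Bundled: one advance past a 3-instruction prologue, then all CFI at once.
bundled = total_advance_bytes([12])
# Instruction precise: one advance after each of the 3 instructions.
precise = total_advance_bytes([4, 4, 4])
print(bundled, precise)  # prints: 1 3
```

This counts only the advance commands; the CFI instructions themselves
(`.cfi_offset`, `.cfi_restore` and so on) add their own bytes on top.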

That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.

We could, for example, extend the `uwtable` attribute with an optional
value, e.g.
- `uwtable` (default to 2)
- `uwtable(1)`, sync unwind tables
- `uwtable(2)`, async unwind tables
- `uwtable(3)`, async unwind tables, but tracking only a subset of
registers (e.g. CFA and return address)

Or add a new attribute `async_uwtable`.

Other suggestions? Comments?

~chill

--
Compiler scrub, Arm
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Fāng-ruì Sòng via llvm-dev

Nov 20, 2021, 3:26:28 AM
to Momchil Velikov, LLVM Mailing list, Momchil Velikov
On Wed, Nov 17, 2021 at 3:19 AM Momchil Velikov via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> On one hand, we have the `uwtable` attribute in LLVM IR, which tells
> whether to emit CFI directives. On the other hand, we have the `clang
> -cc1` command-line option `-funwind-tables=1|2 ` and the codegen
> option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or
> asynchronous unwind tables (2)`.
> Thus we lose along the way the information whether we want just some
> unwind tables or asynchronous unwind tables.

Thanks for starting the topic. I am very interested in the topic and
would like to see that CFI gets improved.

I have looked into -funwind-tables/-fasynchronous-unwind-tables and
made some relatively simple changes (defaulting to
-fasynchronous-unwind-tables for aarch64/ppc, fixing
-f(no-)unwind-tables/-f(no-)asynchronous-unwind-tables, making
-fno-asynchronous-unwind-tables work with instrumentation, and adding
`-funwind-tables=1|2`), but I haven't done anything at the IR level.
It's good to see that someone is picking up the heavy lifting so that I
don't need to do it :)
That said, if you need a reviewer or help with some work items, feel
free to offload some to me.

> Asynchronous unwind tables take more space in the runtime image, I'd
> estimate something like 80-90% more, as the difference is adding
> roughly the same number of CFI directives as for prologues, only a bit
> simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
> more, if you consider tail duplication of epilogue blocks.
> Asynchronous unwind tables could also restrict code generation to
> having only a finite number of frame pointer adjustments (an example
> of *not* having a finite number of `SP` adjustments is on AArch64 when
> untagging the stack (MTE) in some cases the compiler can modify `SP`
> in a loop).

The restriction on MTE is new to me as I don't know much about MTE yet.

>
> Having the CFI precise up to an instruction generally also means one
> cannot bundle together CFI instructions once the prologue is done,
> they need to be interspersed with ordinary instructions, which means
> extra `DW_CFA_advance_loc` commands, further increasing the unwind
> tables size.
>
> That is to say, async unwind tables impose a non-negligible overhead,
> yet for the most common use cases (like C++ exceptions), they are not
> even needed.
>
> We could, for example, extend the `uwtable` attribute with an optional
> value, e.g.
> - `uwtable` (default to 2)
> - `uwtable(1)`, sync unwind tables
> - `uwtable(2)`, async unwind tables
> - `uwtable(3)`, async unwind tables, but tracking only a subset of
> registers (e.g. CFA and return address)
>
> Or add a new attribute `async_uwtable`.
>
> Other suggestions? Comments?

I have thought about extending uwtable as well. In spirit the idea
looks great to me.
The mode removing most callee-saved registers is useful.
For example, I think linux-perf just uses pc/sp/fp (as how its ORC
unwinder is designed).

My slight concern with uwtable(3) is that the amount of unwind
information is not monotonic.
Since sync->async and the number of registers are two dimensions,
perhaps we should use two function attributes?

>
> ~chill

BTW, are you working on improving the general CFI problems for aarch64?
I tried to understand the implementation limitation in September (in
https://reviews.llvm.org/D109253) but then stopped.
If you have patches, I'll be happy to study them:)

I know there are quite a few problems, such as:

(a) .cfi_* directives in prologue are less precise

% cat a.c
void foo() {
asm("" ::: "x23", "x24", "x25");
}
% clang --target=aarch64-linux-gnu a.c -S -o -
...
foo: // @foo
.cfi_startproc
// %bb.0: // %entry
str x25, [sp, #-32]! // 8-byte Folded Spill
stp x24, x23, [sp, #16] // 16-byte Folded Spill
.cfi_def_cfa_offset 32 ////// should be immediately after the pre-increment str
.cfi_offset w23, -8
.cfi_offset w24, -16
.cfi_offset w25, -32
//APP
//NO_APP

(b) .cfi_* directives (for MachineInstr::FrameDestroy) in epilogue are
generally missing

(c) A basic block following an exit block may have wrong CFI
information (this can be fixed with .cfi_restore)

Most problems apply to all non-x86 targets.

---

Since we are discussing asynchronous unwind tables, may I ask two
slightly off-topic things?

(1) What's your opinion on ld --no-ld-generated-unwind-info?
Mine is https://maskray.me/blog/2020-11-15-explain-gnu-linker-options#no-ld-generated-unwind-info

(2) How should future stack unwinding strategy evolve?
Hardware assisted approach like leveraging shadow call stack?
Making FP more efficient so that user code can leverage
-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer and drop
inefficient (both size and run-time performance) .eh_frame?

Last year I wrote a post
https://maskray.me/blog/2020-11-08-stack-unwinding while I was learning
about stack unwinding.
I am going to amend it to include my recent thoughts.

Momchil Velikov via llvm-dev

Nov 20, 2021, 10:57:22 AM
to Fāng-ruì Sòng, LLVM Mailing list, Momchil Velikov
On Sat, 20 Nov 2021 at 08:26, Fāng-ruì Sòng <mas...@google.com> wrote:
> > Asynchronous unwind tables could also restrict code generation to
> > having only a finite number of frame pointer adjustments (an example
> > of *not* having a finite number of `SP` adjustments is on AArch64 when
> > untagging the stack (MTE) in some cases the compiler can modify `SP`
> > in a loop).
>
> The restriction on MTE is new to me as I don't know much about MTE yet.

It has nothing to do with MTE per se, I just noticed it in an MTE test
(`llvm/test/CodeGen/AArch64/settag.ll:stg_alloca17()`).
I've got a patch for this, that just uses an extra scratch register (that's in
the epilogue before popping CSRs and we have plenty of registers) and a constant
number (usually one) of SP adjustments by a constant.


> > We could, for example, extend the `uwtable` attribute with an optional
> > value, e.g.
> >   -  `uwtable` (default to 2)
> >   -  `uwtable(1)`, sync unwind tables
> >   -  `uwtable(2)`, async unwind tables
> >   -  `uwtable(3)`, async unwind tables, but tracking only a subset of
> > registers (e.g. CFA and return address)
> >
> > Or add a new attribute `async_uwtable`.
> >
> > Other suggestions? Comments?
>
> I have thought about extending uwtable as well. In spirit the idea
> looks great to me.
> The mode removing most callee-saved registers is useful.
> For example, I think linux-perf just uses pc/sp/fp (as how its ORC
> unwinder is designed).
>
> My slight concern with uwtable(3) is that the amount of unwind
> information is not monotonic.
> Since sync->async and the number of registers are two dimensions,
> perhaps we should use two function attributes?

I reckon this matters when combining (for whatever reasons) multiple `uwtable` attributes?
Indeed, in my first version, I dropped the encoding 3 and then I was able to synthesize
the attribute for an outlined function by simply taking the max of the attribute
in the outlined-from functions - it was just simpler.
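
As a minimal sketch of that "take the max" rule, assuming the linear
encoding without the value 3 (0 = none, 1 = sync, 2 = async): the names
`UWTableKind` and `merge_uwtable` below are made up for illustration and
are not meant to describe the actual patch.

```python
from enum import IntEnum


class UWTableKind(IntEnum):
    """Hypothetical linear encoding: a larger value means more unwind info."""
    NONE = 0
    SYNC = 1
    ASYNC = 2


def merge_uwtable(kinds):
    """Pick the attribute for an outlined function from its source functions.

    Because the encoding is monotonic, the strongest request among the
    outlined-from functions satisfies all of them.
    """
    return max(kinds, default=UWTableKind.NONE)


# An outlined function shared by a sync-only and an async function.
merged = merge_uwtable([UWTableKind.SYNC, UWTableKind.ASYNC])
print(merged.name)  # prints: ASYNC
```

With a non-monotonic encoding, a plain `max` would no longer be safe, which
is why the ordering of the values matters here.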

How about instead we exchange the meaning of 2 and 3 so we get
  - 1, sync unwind tables
  - 2, "minimal" async unwind tables
  - 3, full async unwind tables
Then on the principle that we should always emit CFI information that the `uwtable` requested
(as it may be an ABI mandate), possibly optimised, depending on the `nounwind` attribute, we would get:

          | nounwind 0            | nounwind 1
----------+-----------------------+-------------
uwtable 0 | sync, full            | no CFI
----------+-----------------------+-------------
uwtable 1 | sync, full            | sync, full
----------+-----------------------+-------------
uwtable 2 | async, full prologue, | async, min
          | minimal epilogue      |
----------+-----------------------+-------------
uwtable 3 | async, full           | async, full

as a starting point, and then backends may choose any of the entries
in the following rows of the same column, as a QOI decision.

All that said, I'm not even entirely convinced we need it as a separate `uwtable` option.
The decision to skip some of the CFI instructions can be made during final object encoding.
It probably has to be made during the final encoding, e.g. no point including epilogue CFI instructions
in `.eh_frame`, or an ORC generator would naturally ignore most CFI instructions anyway.

> BTW, are you working on improving the general CFI problems for aarch64?
Yeah, I'm implementing support for `-fasynchronous-unwind-tables`. A slightly outdated
series of patches starts at https://reviews.llvm.org/D112330

The full list I have right now is:
* [AArch64] Async unwind - Fix MTE codegen emitting frame adjustments in a loop
  - this fixes the issue described above
* [AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer
  - this fixes some case(s) where load/store optimiser moves an SP inc/dec after the matching CFI instruction
* [CodeGen] Async unwind - add a pass to fix CFI information
  - this is a pass that inserts `.cfi_remember_state`/`.cfi_restore_state`, ideally should work
    for all targets and replace `CFIInstrInserter`
* [AArch64] Async unwind - function epilogues
* [AArch64] Async unwind - function prologues
  - these are the core functionality
* [AArch64] Async unwind - Refactor generation of shadow call stack prologue/epilogue
* [AArch64] Async unwind - Always place the first LDP at the end when ReverseCSRRestoreSeq is true
* [AArch64] Async unwind - helper functions to decide on CFI emission
  - the three above: preparation/refactoring/simplification, `emitEpilogue` especially is a big mess
* [AArch64] Async unwind - do not schedule frame setup/destroy
* Extend the `uwtable` attribute with unwind table kind
(I have been meaning to update it for a few days now, but something else always pops up ...)

> Since we are discussing asynchronous unwind tables, may I ask two
> slightly off-topic things?
>
> (1) What's your opinion on ld --no-ld-generated-unwind-info?

I would say, from a design point of view, an unwinder of any kind should not analyse and interpret machine
instructions, as that is fragile in the general case. That's been my experience from developing and maintaining
an unwinder that analysed prologues/epilogues over a period of 10+ years: each new compiler version required adjustments.

Then, PLT entries are likely to be a special case as they are both tiny and extremely unlikely to change between
different compilers or different compiler versions. In a sense, one can treat them as having implicit identical unwind
table entries (of any unwind table kind) associated with their address range, therefore explicit entries in the regular
unwind tables are superfluous.

> (2) How should future stack unwinding strategy evolve?
Well, that's a good question ... :D

Fāng-ruì Sòng via llvm-dev

Nov 20, 2021, 2:12:45 PM
to Momchil Velikov, LLVM Mailing list, Momchil Velikov

I have trouble understanding "min" in the uwtable 2 row.
What does it mean?

As IR attributes, I'd hope the behavior is strictly monotonic in every
dimension. I.e. if uwtable A provides some information, I'd expect that
uwtable B provides at least the same amount of information if A < B.

We may end up with too many dimensions.
For such rare ones (e.g. the number of registers), I think it is
entirely fine to omit it from IR attributes and make it an
llvm::TargetOptions::* option, like -ffunction-sections. We just lose
fine-grained LTO merge behavior which is probably not needed at all.

Thanks:)

>> Since we are discussing asynchronous unwind tables, may I ask two
>> slightly off-topic things?
>>
>> (1) What's your opinion on ld --no-ld-generated-unwind-info?
>
>I would say, from a design point of view, an unwinder of any kind should
>not analyse and interpret machine
>instructions as it's, in the general case, fragile - that's been my
>experience from developing and maintaining
>an unwinder that analysed prologues/epilogues, over a period of 10+ years,
>each new compiler version required adjustments.
>
>Then, PLT entries are likely to be a special case as they are both tiny and
>extremely unlikely to change between
>different compilers or different compiler versions. In a sense, one can
>treat them as having implicit identical unwind
>table entries (of any unwind table kind) associated with their address
>range, therefore explicit entries in the regular
>unwind tables are superfluous.

Thanks for the comments.

>> (2) How should future stack unwinding strategy evolve?
>Well, that's a good question ... :D

* compact unwind scheme.
* hardware assisted. Piggybacking on security hardening features like shadow call stack. However, this is unlikely to provide more information about callee-saved registers.
* mainly FP-based. People don't use FP due to performance loss. If `-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer` doesn't hurt performance that much, it may be better than some information in `.eh_frame`. We can use unwind information to fill the gap, e.g. for shrink wrapping.

If we go with the FP route, the expectation from .eh_frame may be lower.
We just need it to fill the gap.
We can probably indicate the intention with a codegen only llvm::TargetOptions option as well.

>~chill
>
>--
>Compiler scrub, Arm

Momchil Velikov via llvm-dev

Nov 20, 2021, 7:07:19 PM
to Fāng-ruì Sòng, LLVM Mailing list, Momchil Velikov
On Sat, 20 Nov 2021 at 19:12, Fāng-ruì Sòng <mas...@google.com> wrote:
>
> On 2021-11-20, Momchil Velikov wrote:
> >How about instead we exchange the meaning of 2 and 3 so we get
> >  - 1, sync unwind tables
> >  - 2, "minimal" async unwind tables
> >  - 3, full async unwind tables
> >Then on the principle that we should always emit CFI information that the
> >`uwtable` requested
> >(as it may be an ABI mandate), possibly optimised, depending on the
> >`nounwind` attribute, we would get:
> >
> >          | nounwind 0           |  nounwind 1
> >----------+----------------------+--------------
> >uwtable 0 | sync, full           |  no CFI
> >----------+----------------------+--------------
> >uwtable 1 | sync, full           |  sync, full
> >----------+----------------------+--------------
> >uwtable 2 | async, full prologue,|
> >          | minimal epilogue     |  async, min
> >----------+----------------------+--------------
> >uwtable 3 | async, full          |  async, full
> >
> >as a starting point, and then backends may choose any of the entries
> >in the following rows of the same column, as a QOI decision.
>
> I have trouble understanding "min" in the uwtable 2 row.
> What does it mean?

"min" would be minimal unwind information, suitable just for getting the list of callers.
Thus, if we request "minimal" asynchronous unwind tables, and a function has the
"nounwind" attribute, then we can fully honour that request and emit CFI instructions
for CFA and PC only, both in the prologue and the epilogue, whereas if we don't have
the "nounwind" attribute, there's no option other than to also include CFI instructions
for CSRs, but only in the prologue; the epilogue stays the same.
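
To make the difference concrete, here is a small sketch of what each kind
of unwind info would describe, reading "min" as CFA plus return address and
"full" as additionally covering the callee-saved registers. The function
name `described_items` and the register list are illustrative assumptions.

```python
def described_items(kind, callee_saved):
    """List what the unwind info describes for a given kind.

    `kind` is one of "no", "min" or "full"; `callee_saved` is the list of
    callee-saved registers (CSRs) the function actually spills.
    """
    if kind not in ("no", "min", "full"):
        raise ValueError(f"unknown unwind info kind: {kind!r}")
    if kind == "no":
        return []
    items = ["CFA", "return address"]  # enough to find the caller
    if kind == "full":
        items.extend(callee_saved)     # needed to restore caller state
    return items


csrs = ["x23", "x24", "x25"]
print(described_items("min", csrs))   # prints: ['CFA', 'return address']
print(described_items("full", csrs))
```

The "min" list does not grow with the number of spilled registers, which is
where the size saving for profiling-only use would come from.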


> As IR attributes, I'd hope the behavior is strictly monotonic in every
> dimensions. I.e. If uwtable A provides some information, I'd expect that
> uwtable B provides at least the same amount of information if A < B.

I'm not sure how useful that would be, as the sensible end result would also
depend on the "nounwind" attribute.

Also, I'm basically equating "asynchronous unwind info" with generating CFI
instructions for epilogues. In principle, CFI for prologues could be different
and slightly more compact if it does not need to be instruction precise, but
as a practical matter, I doubt extra complexity in implementation warrants
supporting anything but instruction precise unwinding, once a backend implements it.

So the table above is no good; it would be better presented as:

          | nounwind 0  | nounwind 1
----------+-------------+-------------
uwtable 0 | <full,no>   | <no,no>
----------+-------------+-------------
uwtable 1 | <full,no>   | <full,no>
----------+-------------+-------------
uwtable 2 | <full,min>  | <min,min>
----------+-------------+-------------
uwtable 3 | <full,full> | <full,full>

where 
 - "full" means full unwind info - CFA, CSRs, return address
 - "min" is minimal, sans CSRs,
 - "no" is, well, no unwind info and
 - "<p,e>" is the kind generated for prologues and epilogues, respectively.
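
Written out as a lookup, the table reads as follows. This is only a sketch
of the proposal as stated in this message; `CFI_TABLE` and `cfi_kinds` are
hypothetical names, not part of any patch.

```python
# (uwtable level, nounwind) -> (prologue kind, epilogue kind)
CFI_TABLE = {
    (0, False): ("full", "no"),
    (0, True): ("no", "no"),
    (1, False): ("full", "no"),
    (1, True): ("full", "no"),
    (2, False): ("full", "min"),
    (2, True): ("min", "min"),
    (3, False): ("full", "full"),
    (3, True): ("full", "full"),
}


def cfi_kinds(uwtable, nounwind):
    """Return the <p,e> pair for a function with the given attributes."""
    return CFI_TABLE[(uwtable, bool(nounwind))]


prologue, epilogue = cfi_kinds(2, nounwind=True)
print(prologue, epilogue)  # prints: min min
```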

Fāng-ruì Sòng via llvm-dev

Nov 20, 2021, 7:33:17 PM
to Momchil Velikov, LLVM Mailing list, Momchil Velikov

uwtable 1/nounwind 1: <full,no>
uwtable 2/nounwind 1: <min,min>

Why is there a full->min transition for the generated prologue?

Momchil Velikov via llvm-dev

Nov 20, 2021, 7:53:01 PM
to Fāng-ruì Sòng, LLVM Mailing list, Momchil Velikov
On Sun, 21 Nov 2021 at 00:33, Fāng-ruì Sòng <mas...@google.com> wrote:
On 2021-11-21, Momchil Velikov wrote:
>         | nounwind 0  |  nounwind 1
>----------+-------------+--------------
>uwtable 0 | <full,no>   |  <no,no>
>----------+-------------+--------------
>uwtable 1 | <full,no>   |  <full,no>
>----------+-------------+--------------
>uwtable 2 | <full,min> | <min, min>
>----------+-------------+--------------
>uwtable 3 | <full,full> |  <full,full>
>
>where
> - "full" means full unwind info - CFA, CSRs, return address
> - "min" is minimal, sans CSRs,
> - "no" is, well, no unwind info and
> - "<p,e>" is the kind generated for prologues and epilogues, respectively.

> uwtable 1/nounwind 1: <full,no>
> uwtable 2/nounwind 1: <min,min>
>
> Why is there a full->min transition for the generated prologue?

Because for a synchronous unwind table it only makes sense for the prologue to be full: <min,no> is an
unusable combination, whereas <full,no> is usable for a debugger (it's basically what we have now for most backends).

--
Compiler scrub, Arm

Fāng-ruì Sòng via llvm-dev

Nov 20, 2021, 8:02:26 PM
to Momchil Velikov, LLVM Mailing list, Momchil Velikov

I wanted to ask why the prologue information degrades from full to
min when transitioning from uwtable 1 to uwtable 2.

I do not understand why moving from uwtable 1 to uwtable 2 is not monotonic.
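
The monotonicity question can be checked mechanically. The sketch below
ranks the kinds as no < min < full and reports every step where a higher
uwtable level gives less information than the level below it; the table is
the four-level proposal from this thread, and all names are illustrative.

```python
RANK = {"no": 0, "min": 1, "full": 2}

# (uwtable level, nounwind) -> (prologue kind, epilogue kind)
TABLE = {
    (0, False): ("full", "no"), (0, True): ("no", "no"),
    (1, False): ("full", "no"), (1, True): ("full", "no"),
    (2, False): ("full", "min"), (2, True): ("min", "min"),
    (3, False): ("full", "full"), (3, True): ("full", "full"),
}


def non_monotonic_steps(table):
    """Return (nounwind, from_level, to_level, part) for each regression."""
    steps = []
    levels = sorted({level for level, _ in table})
    for nounwind in (False, True):
        for lower, higher in zip(levels, levels[1:]):
            for idx, part in enumerate(("prologue", "epilogue")):
                before = table[(lower, nounwind)][idx]
                after = table[(higher, nounwind)][idx]
                if RANK[after] < RANK[before]:
                    steps.append((nounwind, lower, higher, part))
    return steps


print(non_monotonic_steps(TABLE))  # prints: [(True, 1, 2, 'prologue')]
```

The only regression is the prologue for nounwind functions when going from
uwtable 1 to uwtable 2, which is the transition being asked about.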

Momchil Velikov via llvm-dev

Nov 21, 2021, 5:11:42 AM
to Fāng-ruì Sòng, LLVM Mailing list, Momchil Velikov
It's not that it was degraded in the case for "uwtable=2,nounwind=1", but that it was
"too much" for "uwtable=1,nounwind=1".  One could generate "<min,no>" there, but that
serves no purpose - it's unusable for debugging, and for profiling, one would
be better off with the "<min,min>" variant. Also, this is the current state, and
degrading *that* could be viewed as a regression.

--
Compiler scrub, Arm

Fāng-ruì Sòng via llvm-dev

Nov 21, 2021, 1:22:15 PM
to Momchil Velikov, LLVM Mailing list, Momchil Velikov

The argument with keeping <full,no> for "uwtable=1,nounwind=1" as the
current state is fine.
But then why is <min,min> for "uwtable=2,nounwind=1" not a problem
for debugging?

Momchil Velikov via llvm-dev

Nov 21, 2021, 1:47:05 PM
to Fāng-ruì Sòng, LLVM Mailing list, Momchil Velikov
On Sun, 21 Nov 2021 at 18:22, Fāng-ruì Sòng <mas...@google.com> wrote:
> But then why is <min,min> for "uwtable=2,nounwind=1" not a problem
> for debugging?

Well, it is, but it's a different use case. A user chooses whatever
suits their needs.
If they need debugging they can choose either "uwtable=1" or
"uwtable=3". If they need profiling they can choose "uwtable=2" or
"uwtable=3". If they don't need either, they can choose "uwtable=0".

--
Compiler scrub, Arm

Fāng-ruì Sòng via llvm-dev

Nov 21, 2021, 2:02:03 PM
to Momchil Velikov, LLVM Mailing list, Momchil Velikov
On Sun, Nov 21, 2021 at 10:47 AM Momchil Velikov
<momchil...@gmail.com> wrote:
>
> On Sun, 21 Nov 2021 at 18:22, Fāng-ruì Sòng <mas...@google.com> wrote:
> > But then why is <min,min> for "uwtable=2,nounwind=1" not a problem
> > for debugging?
>
> Well, it is, but it's a different use case. A user chooses whatever
> suits their needs.
> If they need debugging they can choose either "uwtable=1" or
> "uwtable=3". If they need
> profiling they can choose "uwtable=2" or "uwtable=3". If they don't
> need either, they can choose "uwtable=0".
>
> --
> Compiler scrub, Arm

Got it. So there are indeed 3 dimensions, as I thought.

(a) nounwind: raise exceptions or not
(b) uwtable: generate additional information even if nounwind is
specified: none, sync, async
(c) number of registers: pc(or link register)/sp/(maybe fp), or full
(most changed callee-saved registers)

The uwtable=0,1,2,3 scale combines (b) and (c), but the (b)x(c)
possibilities cannot be linearized.
What I wondered is whether we can make (c) an llvm::TargetOptions::*
option like FunctionSections/DataSections, then (b) can be linearized
(none,sync,async).
The downside is that in LTO builds, one cannot indicate the intent
that a.o wants a subset of registers while b.o wants a full set.
My feeling is that users wanting mix-and-match (c) behavior is rare,
therefore an llvm::TargetOptions::* option can serve them well.
If the requirement actually becomes real, we can introduce new
function attributes.

If we start with the non-linear uwtable=0,1,2,3, then we cannot fix it
in the future.

John Reagan via llvm-dev

Nov 23, 2021, 4:15:19 PM
to via llvm-dev

>
> On one hand, we have the `uwtable` attribute in LLVM IR, which tells
> whether to emit CFI directives. On the other hand, we have the `clang
> -cc1` command-line option `-funwind-tables=1|2 ` and the codegen
> option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or
> asynchronous unwind tables (2)`.
> Thus we lose along the way the information whether we want just some
> unwind tables or asynchronous unwind tables.
>
> Asynchronous unwind tables take more space in the runtime image, I'd
> estimate something like 80-90% more, as the difference is adding
> roughly the same number of CFI directives as for prologues, only a bit
> simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
> more, if you consider tail duplication of epilogue blocks.
> Asynchronous unwind tables could also restrict code generation to
> having only a finite number of frame pointer adjustments (an example
> of *not* having a finite number of `SP` adjustments is on AArch64 when
> untagging the stack (MTE) in some cases the compiler can modify `SP`
> in a loop).
> Having the CFI precise up to an instruction generally also means one
> cannot bundle together CFI instructions once the prologue is done,
> they need to be interspersed with ordinary instructions, which means
> extra `DW_CFA_advance_loc` commands, further increasing the unwind
> tables size.
>
> That is to say, async unwind tables impose a non-negligible overhead,
> yet for the most common use cases (like C++ exceptions), they are not
> even needed.
>
> We could, for example, extend the `uwtable` attribute with an optional
> value, e.g.
> - `uwtable` (default to 2)
> - `uwtable(1)`, sync unwind tables
> - `uwtable(2)`, async unwind tables
> - `uwtable(3)`, async unwind tables, but tracking only a subset of
> registers (e.g. CFA and return address)
>
> Or add a new attribute `async_uwtable`.
>
> Other suggestions? Comments?
>

Yes, thanks for bringing this up. OpenVMS on x86-64 needs full async
unwind tables as our system allows for exceptions to be handled during
both the prologue and epilogue. We are currently avoiding the poor
quality of prologue/epilogue CFI information until we can
move forward to a recent LLVM version (long story) where I think the
support is better.

I like just having a uwtable level (1,2,3,etc).

John

Momchil Velikov via llvm-dev

Nov 24, 2021, 5:23:59 AM
to Fāng-ruì Sòng, LLVM Mailing list, Momchil Velikov
On Sun, 21 Nov 2021 at 19:01, Fāng-ruì Sòng <mas...@google.com> wrote:
> What I wondered is whether we can make (c) an llvm::TargetOptions::*
> option like FunctionSections/DataSections, then (b) can be linearized
> (none,sync,async).

Yes, that would be my preferred option and is what my current change does:
- `uwtable` (default to 2)
- `uwtable(1)` sync unwind tables
- `uwtable(2)` async unwind tables

Eric Christopher via llvm-dev

Nov 24, 2021, 1:18:50 PM
to Momchil Velikov, LLVM Mailing list, Momchil Velikov
Hi Momchil,

So, I think to elaborate from the thread you're looking at separating out:

no tables,
exception handling,
instruction level unwind accuracy

for unwind tables? Some examples of cases you expect to work and explicitly not work in each of these would be fairly motivating. Going down the use cases for each.

Thanks!

-eric

Momchil Velikov via llvm-dev

Dec 9, 2021, 9:57:28 AM
to Eric Christopher, LLVM Mailing list, Momchil Velikov
On Wed, 24 Nov 2021 at 18:18, Eric Christopher <echr...@gmail.com> wrote:
>
> Hi Momchil,
>
> So, I think to elaborate from the thread you're looking at separating out:
>
> no tables,
> exception handling,
> instruction level unwind accuracy
>
> for unwind tables? Some examples of cases you expect to work and explicitly not work in each of these would be fairly motivating. Going down the use cases for each.

Not really. What I'm looking for is to convey the value of the CodeGen option `UnwindTables` from clang to LLVM.


          | nounwind 0  | nounwind 1
----------+-------------+-------------
uwtable 0 | <full,no>   | <no,no>
----------+-------------+-------------
uwtable 1 | <full,no>   | <full,no>
----------+-------------+-------------
uwtable 2 | <full,full> | <full,full>


Lacking that, a backend can choose to generate unwind tables according to either the
second or the third row, but the user has no control over it. As different kinds of unwind
tables have different functionality and trade-offs, that should be something under user control.

~chill

Eric Christopher via llvm-dev

Dec 10, 2021, 7:47:56 PM
to Momchil Velikov, LLVM Mailing list, Momchil Velikov
Ultimately I think I'd like to know why you think you should do this, hence the request for use cases :)

Thanks!

-eric

Momchil Velikov via llvm-dev

Dec 11, 2021, 4:04:24 AM
to Eric Christopher, LLVM Mailing list, Momchil Velikov
On Sat, 11 Dec 2021 at 00:47, Eric Christopher <echr...@gmail.com> wrote:
> Ultimately I think I'd like to know why you think you should do this, hence the request for use cases :)

Right, so the use cases I have in mind are:
* synchronous exceptions: these need non-instruction-precise unwind
  tables, correct at function calls; generally all the callee-saved
  registers need to be described.
  No `uwtable` and `uwtable(1)` support this case; `uwtable(2)` does too,
  but is needlessly precise and verbose.
* debugging: the best debugging experience would be instruction-precise
  unwind tables over the whole function; however, there is a bit of
  wiggle room:
  - unwind tables may not be instruction precise, essentially at the
    same level of precision as the case for exceptions
  - unwind tables for epilogues might be missing
  This is what we have in most backends now. `uwtable(2)` supports
  this case; `uwtable(1)` results in smaller unwind tables, at the cost
  of a degraded user experience.
* sampling profilers: for best precision these need instruction-precise
  unwind tables both in the prologue and in the epilogue (which makes
  for comparatively large unwind tables).
  `uwtable(2)` supports this case.
  In principle, there is no need to describe all the callee-saved
  registers, just those needed to get a function's caller, so, whatever
  passes for "return address", "SP", "FP", and others. In the
  course of the discussion, we agreed that's best addressed by a
  separate mechanism.

~chill
