thanks,
adrian
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
But hard to say without data, and getting both modes in at least
as a temporary thing sounds like a good plan.
--paulr
I don't totally follow the proposed encoding change & would appreciate a small example. Is the idea to replace e.g. an 'AT_low_pc (<direct address>) + relocation for <direct address>' with an 'AT_low_pc (<indirection into a pool of addresses> + offset)',
s.t. the cost of a relocation for the address is paid down the more it's used?
How do you figure the offset out?
On Jan 12, 2020, at 11:44 AM, David Blaikie via llvm-dev <llvm...@lists.llvm.org> wrote:
On Fri, Jan 10, 2020 at 12:57 PM Vedant Kumar <vedant...@apple.com> wrote:
> I don't totally follow the proposed encoding change & would appreciate a small example. Is the idea to replace e.g. an 'AT_low_pc (<direct address>) + relocation for <direct address>' with an 'AT_low_pc (<indirection into a pool of addresses> + offset)',
With Split DWARF or with DWARFv5 in LLVM at the moment, all addresses are indirected already. So it's:
Replace "AT_low_pc (<indirection into a pool of addresses>)" with an "AT_low_pc (<indirection into a pool of addresses> + offset)".
> s.t. the cost of a relocation for the address is paid down the more it's used?
Right - specifically to reduce the pool of addresses down to, ideally, one address per section/indivisible chunk of machine code (per subsection in MachO, for instance), whereas currently there are many addresses per section.
> How do you figure the offset out?
Label difference - same as is done for DW_AT_high_pc today in DWARFv4 and DWARFv5 in LLVM. high_pc currently uses the low_pc address as the thing it's relative to; in this proposed situation, we'd use a symbol that's in the first bit of debug info in the section (or subsection in MachO). So the low_pc of the subprogram/function, for instance, or if there are two functions in the same section with debug info for both, the low_pc of the first of those functions, etc...
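To make the pool-size effect concrete, here is a rough Python sketch of the counting only. The function names, section names, and offsets are invented for illustration, and nothing here reflects how LLVM actually emits the attributes.

```python
# Hypothetical layout: (function, section, offset of the function in its section).
functions = [
    ("f1", ".text", 0x00),
    ("f2", ".text", 0x40),
    ("f3", ".text", 0x90),
    ("g1", ".text.g1", 0x00),  # e.g. an inline function in its own section
]

def pool_per_function(funcs):
    """Current scheme: every low_pc gets its own pool entry (and relocation)."""
    pool = [name for name, _, _ in funcs]
    attrs = {name: (pool.index(name), 0) for name, _, _ in funcs}
    return pool, attrs

def pool_per_section(funcs):
    """Proposed scheme: one pool entry per section, low_pc = (index, offset)."""
    pool, attrs, base = [], {}, {}
    for name, section, offset in funcs:
        if section not in base:
            # The first function with debug info in a section is the base symbol.
            base[section] = (len(pool), offset)
            pool.append(name)
        index, base_offset = base[section]
        attrs[name] = (index, offset - base_offset)  # the label difference
    return pool, attrs

old_pool, old_attrs = pool_per_function(functions)
new_pool, new_attrs = pool_per_section(functions)
print(len(old_pool), len(new_pool))  # pool entries before and after
print(new_attrs["f3"])               # (pool index, offset) for f3
```

In this toy layout the pool shrinks from four entries to two, and f3 is described as the first pool entry plus 0x90.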
I think I get it now, thanks for explaining!
If the label difference in a low_pc attribute is relative to the start of a section, could a linker orderfile pass break the dwarf unless it updates the offset?
Ditto, I suppose, for an intra-function offset when something like Propeller is used to reorder basic blocks (I’m thinking of AT_call_return_pc now).
On Jan 13, 2020, at 9:20 AM, David Blaikie via llvm-dev <llvm...@lists.llvm.org> wrote:
On Mon, Jan 13, 2020 at 9:03 AM Vedant Kumar <vedant...@apple.com> wrote:
> If the label difference in a low_pc attribute is relative to the start of a section, could a linker orderfile pass break the dwarf unless it updates the offset?
Nah - terminologically, ELF sections are indivisible - more akin to MachO subsections. ELF files can have multiple sections with the same name, as is used for comdat sections for inline functions and for -ffunction-sections (roughly equivalent to MachO's "subsections via symbols", as I understand it). Alternatively they can use ".text.suffix" naming to give each separate .text section its own name, but the linker strips the suffixes and concatenates all these together into the final linked .text section.
> Ditto, I suppose, for an intra-function offset when something like Propeller is used to reorder basic blocks (I’m thinking of AT_call_return_pc now).
Yeah - currently the "base address" for each section is determined by the first function with debug info being emitted in that section ( https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1787 ) - with PROPELLER we'd need to add similar code when function fragments are emitted. I'm planning to check the PROPELLER work-in-progress tree soon and do another sanity pass over the debug info emitted to check this is working as intended. That's in part because this base address selection, coupled with DWARFv5 and maybe with the changes I'm suggesting in this thread, might make PROPELLER less expensive in terms of debug info size, or more expensive relative to the significant improvements this provides. (I'll commit those changes under flags "soon" - might take me a week or two judging by my review/bug/investigation load right now... *fingers crossed*.)
Because MachO debug info distribution works differently and, if I understand correctly, doesn't need relocations in many cases due to DWARF-aware parsing/linking, it's quite possible MachO would have different tradeoffs in this space. (If it does use relocations, I've no knowledge of when/how, or of how big they are compared to the ELF relocations I've been measuring.)
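The point about indivisible ELF sections can be illustrated with a toy "linker" in Python. This is only a sketch under made-up assumptions: the section names, sizes, offsets, and load address are all hypothetical, and real linkers do far more than concatenate.

```python
# Hypothetical sections, each an indivisible chunk: name -> (size, {function: offset}).
sections = {
    ".text.a": (0x100, {"a1": 0x00, "a2": 0x60}),
    ".text.b": (0x080, {"b1": 0x00}),
}

def link(order, load_address=0x1000):
    """Lay sections out in the given order; return each function's final address."""
    addresses, cursor = {}, load_address
    for name in order:
        size, funcs = sections[name]
        for func, offset in funcs.items():
            addresses[func] = cursor + offset
        cursor += size
    return addresses

def resolve(addresses, base_func, offset):
    """Debug-info view: relocated base address from the pool plus a fixed offset."""
    return addresses[base_func] + offset

for order in ([".text.a", ".text.b"], [".text.b", ".text.a"]):
    final = link(order)
    # a2 is described as "a1 + 0x60"; only a1 needs a relocated pool entry.
    assert resolve(final, "a1", 0x60) == final["a2"]
    print(order, hex(final["a2"]))
```

Reordering whole sections moves a1 and a2 together, so the fixed offset between them stays valid. Reordering pieces within a section (as with function fragments) would need a base address per piece.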
I see, so an ELF linker may reorder sections relative to each other, but not the contents of a section. (That matches up with what I've read elsewhere - you'd use -ffunction-sections to reorder function symbols, IIRC.)
And in this proposal to increase address pool reuse, label differences in a MachO would be relative to the subsection.
In Propeller, is basic block reordering done after a .o is emitted?
If so, I suppose I don't yet see how the proposed scheme is resilient to this reordering.
OTOH if block reordering is done just before the label difference is evaluated, then there shouldn't be any issue.
Thanks for investigating!
A linked .dSYM (analogous to an ELF .dwp, IIUC) doesn't contain relocations for AT_low_pc or AT_call_return_pc in the simple examples I tried out. We do emit relocations for those attributes in MachO object files (there isn't something analogous to a .dwo on MachO, the debug info just goes into a different set of sections in the .o). My understanding (based on the definition of `macho_relocation_info` in the ld64 sources) is that MachO relocations are 8 bytes in size. It looks like ELF rel/rela relocations are 16/24 bytes in size, but I'm not sure why (perhaps they're more extensible / encode more information).
Would a vanilla DWARFv4 .dwp (without your patches applied) contain a relocation for each 'AT_low_pc (<direct address>)'?
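For reference, the relocation sizes quoted above can be reproduced from the record layouts. The sketch below uses Python's `struct` module with the field layouts of the 64-bit ELF relocation records and the Mach-O `relocation_info` record as I understand them; the 16/24 figures correspond to 64-bit ELF (the 32-bit records are 8 and 12 bytes).

```python
import struct

# Field layouts (little-endian, no padding), as in the usual ELF64 and Mach-O headers.
ELF64_REL = "<QQ"    # r_offset, r_info
ELF64_RELA = "<QQq"  # r_offset, r_info, r_addend
MACHO_RELOC = "<iI"  # r_address, then one 32-bit word of packed bit-fields

sizes = {
    "Elf64_Rel": struct.calcsize(ELF64_REL),
    "Elf64_Rela": struct.calcsize(ELF64_RELA),
    "Mach-O relocation_info": struct.calcsize(MACHO_RELOC),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

The ELF64 records are larger mainly because the offset and info fields are each 64 bits wide, and rela carries an explicit 64-bit addend.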
On 2021-02-10, David Blaikie via llvm-dev wrote:
>All 3 options are now implemented & I've tidied up a flag name (still an
>-mllvm flag - I don't think this should ever be a user-visible flag).
>
>-mllvm -minimize-addr-in-v5=Ranges
> Uses debug_rnglists even for contiguous ranges if doing so would avoid
>adding another entry to .debug_addr eg: a CU with 3 functions, two in the
>same section. The first function in each section uses low/high, the CU has
>a rnglist, and can share/reuse the low_pc of those two functions. But for a
>function that is later in a section that already has another function in it
>- that one would use the low_pc of the first function in the section as its
>base address, and an offset pair - avoiding the need for a 3rd debug_addr
>entry and associated relocation
>
>-mllvm -minimize-addr-in-v5=Expressions
> This uses the exprloc idea - using a non-trivial expression for a
>DW_AT_low_pc or other address classed attribute. This reduces the overhead
>compared to the 'Ranges' technique, and allows more cases - including
>DW_TAG_labels and DW_TAG_call_sites.
This option emits: DW_OP_addrx 0, DW_OP_const4u 9, DW_OP_plus.
DW_OP_const4u is a bit wasteful. This could be changed to DW_OP_addrx 0,
DW_OP_plus_uconst 9. However, the current implementation requires the size of the
DWARF expression, and we don't know the size of the ULEB128 addend of DW_OP_plus_uconst.
.byte size_of_exprloc # This would be dependent on the size of .uleb128
...
.byte 35
.long .Ltmp1-.Lfunc_begin0
# it'd be nice if we can use .uleb128 .Ltmp1-.Lfunc_begin0
size_of_exprloc could be changed to a subtraction of two labels.
When .uleb128 is used, we should be careful about assembler convergence.
* GNU as hacked around the problem specifically for .gcc_except_table by inserting additional .align https://sourceware.org/bugzilla/show_bug.cgi?id=4029 It works for .gcc_except_table but can be a problem for our .uleb128 + .byte scheme.
* LLVM MC's solution is generic.
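A small Python sketch of the two expression encodings discussed above, to show where the bytes go. The opcode values are taken from the DWARF v5 opcode table; note that the operation with opcode 35 (0x23) is named DW_OP_plus_uconst in the standard. The helper names are my own, and the offset 9 is just the example value from the message.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Opcode values from the DWARF v5 specification.
DW_OP_const4u, DW_OP_plus, DW_OP_plus_uconst, DW_OP_addrx = 0x0C, 0x22, 0x23, 0xA1

def expr_const4u(index, offset):
    """DW_OP_addrx index, DW_OP_const4u offset, DW_OP_plus."""
    return (bytes([DW_OP_addrx]) + uleb128(index)
            + bytes([DW_OP_const4u]) + offset.to_bytes(4, "little")
            + bytes([DW_OP_plus]))

def expr_plus_uconst(index, offset):
    """DW_OP_addrx index, DW_OP_plus_uconst offset."""
    return (bytes([DW_OP_addrx]) + uleb128(index)
            + bytes([DW_OP_plus_uconst]) + uleb128(offset))

def exprloc(expr):
    """DW_FORM_exprloc: ULEB128 length followed by the expression bytes."""
    return uleb128(len(expr)) + expr

print(len(exprloc(expr_const4u(0, 9))))      # total attribute bytes, fixed-width addend
print(len(exprloc(expr_plus_uconst(0, 9))))  # total attribute bytes, ULEB128 addend
```

The second form's length depends on the value of the offset, which is exactly why the emitter cannot know the exprloc size before the label difference is resolved.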
>-mllvm -minimize-addr-in-v5=Form
> Similar to Expressions, but using a custom form to make things a bit
>more compact (has the drawback that consumers who don't recognize the form
>can't parse any of the DWARF because they can't skip over the attribute due
>to not knowing its size)
This option emits a new form: DW_FORM_LLVM_addrx_offset, which is the composite
of DW_FORM_addrx and DW_FORM_data4. This is superior to Expressions because the
bytes for the exprloc size and the plus operation can be saved.
Similar to Expressions, there is a question whether DW_FORM_udata would be better.
It could save 3 bytes, comparable to the DW_OP_plus_uconst case.
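The size arithmetic for the custom form can be sketched in a few lines of Python. The function names are mine, and the ULEB128-offset variant is a hypothetical alternative, not something the patch emits.

```python
def uleb128_size(value):
    """Number of bytes in the unsigned LEB128 encoding of value."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

def addrx_data4_size(index, offset):
    # Composite of DW_FORM_addrx (ULEB128 index) and DW_FORM_data4 (fixed 4 bytes).
    return uleb128_size(index) + 4

def addrx_udata_size(index, offset):
    # Hypothetical variant: both the index and the offset are ULEB128-encoded.
    return uleb128_size(index) + uleb128_size(offset)

for index, offset in [(0, 9), (0, 300), (200, 70000)]:
    print(index, offset,
          addrx_data4_size(index, offset),
          addrx_udata_size(index, offset))
```

For a small offset such as 9 the ULEB128 variant is 2 bytes against 5, which is the 3-byte saving mentioned above; the gap narrows as offsets grow.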
In short, this is consistent with how we encode instruction sequence
lengths in other places in LLVM today. (eg: DW_AT_high_pc could be
DW_FORM_udata, but we use DW_FORM_data4).
There's been some argument that using fixed-width forms improves DWARF
parsing performance significantly, but that idea's probably gone out
the window lately with exprloc (well, I guess we used 'blockN' before
that, which is also variable length, even if it might have a
fixed-length length field to start with) and addrx forms (though we do
use fixed-width strx forms, which would mean more abbreviations: a
DW_TAG_subprogram with a low-indexed name would have a different form
for the DW_AT_name than one with a high-indexed name that needed more
bytes to encode).
> This could be changed to DW_OP_addrx 0,
> DW_OP_plus_udata 9. However, the current implementation requires the size of the
> DWARF expression, and we don't know the addend size of DW_OP_plus_udata.
Right - and that requirement for the current implementation is pretty
deeply embedded - we need to know the length of attributes so we know
the length of the DIEs that contain them so we know the offsets of
those DIEs so we can encode those offsets when doing DIE-to-DIE
references, etc. Pavel had a proposal a year or two ago about
potentially moving away from this and using symbolic references, label
differences, etc to do DIE offsets - as it'd make the resulting DWARF
assembly more legible and modifiable, but no one's taken that up as
yet (not sure if Pavel/others tried and hit any fundamental blockers)
and might have some performance tradeoffs, etc.
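As a toy illustration of why attribute sizes need to be known up front, the Python sketch below computes DIE offsets by summing sizes. The DIE names and attribute sizes are invented, and the one-byte abbreviation code is an assumption; the 12-byte header is the 32-bit DWARF v5 compile unit header.

```python
# Hypothetical DIEs in emission order: (name, sizes in bytes of each attribute).
dies = [
    ("compile_unit", [1, 4, 1]),
    ("subprogram_f", [1, 1, 4]),
    ("subprogram_g", [1, 1, 4]),
]
ABBREV_CODE_SIZE = 1   # assuming every abbreviation code fits in one ULEB128 byte
UNIT_HEADER_SIZE = 12  # 32-bit DWARF v5 compile unit header

def die_offsets(dies):
    """Offset of each DIE from the start of the unit, given all attribute sizes."""
    offsets, cursor = {}, UNIT_HEADER_SIZE
    for name, attr_sizes in dies:
        offsets[name] = cursor
        cursor += ABBREV_CODE_SIZE + sum(attr_sizes)
    return offsets

before = die_offsets(dies)
# Grow one attribute of subprogram_f by a byte, as a longer ULEB128 would.
dies[1] = ("subprogram_f", [1, 1, 5])
after = die_offsets(dies)
print(before["subprogram_g"], after["subprogram_g"])
```

Every DIE after the changed attribute shifts, so any DIE-to-DIE reference encoded as a numeric offset would need to be recomputed.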
>
> .byte size_of_exprloc # This would be dependent on the size of .uleb128
> ...
> .byte 35
> .long .Ltmp1-.Lfunc_begin0
> # it'd be nice if we can use .uleb128 .Ltmp1-.Lfunc_begin0
>
> size_of_exprloc could be changed to a subtraction of two labels.
>
> When .uleb128 is used, we should be careful about assembler convergence.
>
> * GNU as hacked around the problem specifically for .gcc_except_table by inserting additional .align https://sourceware.org/bugzilla/show_bug.cgi?id=4029 It works for .gcc_except_table but can be a problem for our .uleb128 + .byte scheme.
> * LLVM MC's solution is generic.
>
> >-mllvm -minimize-addr-in-v5=Form
> > Similar to Expressions, but using a custom form to make things a bit
> >more compact (has the drawback that consumers who don't recognize the form
> >can't parse any of the DWARF because they can't skip over the attribute due
> >to not knowing its size)
>
> This option emits a new form: DW_FORM_LLVM_addrx_offset, which is the composite
> of DW_FORM_addrx and DW_FORM_data4. This is superior to Expressions because the
> bytes for the exprloc size and the plus operation can be saved.
>
> Similar to Expressions, there is a question whether DW_FORM_udata would be better.
> It could save 3 bytes compare with DW_OP_plus_udata.
Yep, see the general discussion on that above.
Though there is some question here about what FORM we could/would
actually propose to standardize in DWARF. GCC looks like they use
address sized instruction sequence lengths like DW_AT_high_pc (data4
in 32bit builds, data8 in 64 bit builds) and LLVM always uses data4 (I
implemented that based on GCC's behavior - guess I didn't look too
closely at the 32/64 bit aspect, or perhaps GCC's behavior changed
since I implemented LLVM's).
GCC always uses addrx (doesn't use any addrxN encodings), like LLVM.
(interestingly GCC also always uses strx, never a strxN encoding)
So at least for LLVM and GCC's current behavior, having a
DW_FORM_addrx_data4 and DW_FORM_addrx_data8 would be consistent. But
do we end up proposing a full matrix of
DW_FORM_addrx{,1,2,3,4}_{udata,data{1,2,4,8,16}} ? That'd be
unfortunate for the DW_FORM space, there aren't any other instances of
such combinatorial explosion in form types.
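To put a number on that matrix, a quick Python enumeration (the combined form names are hypothetical, following the pattern above; none of them exist in DWARF today):

```python
from itertools import product

index_forms = ["addrx", "addrx1", "addrx2", "addrx3", "addrx4"]
offset_forms = ["udata", "data1", "data2", "data4", "data8", "data16"]

# Every pairing of an address-index encoding with an offset encoding.
combined = [f"DW_FORM_{a}_{b}" for a, b in product(index_forms, offset_forms)]
print(len(combined))
print(combined[0], combined[-1])
```

That is 30 candidate forms for the full matrix, against the two or three that current GCC and LLVM behavior would actually use.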
It'd probably be good to have the hypothetical ideal
DW_FORM_addrx_udata standardized, in addition to at least the
addrx_data4 and addrx_data8 form even if no one's going to use it
right away. But it'll probably be an open discussion with the DWARF
committee about what/how they might see this being standardized.
- Dave