GC-ability of SHT_INIT_ARRAY

Fangrui Song

Jul 13, 2021, 2:34:21 AM
to Generic System V Application Binary Interface
In PE/COFF, .CRT$XCU is the global initializer section (similar to ELF SHT_INIT_ARRAY).
A .CRT$XCU in an IMAGE_COMDAT_SELECT_ASSOCIATIVE section referencing an IMAGE_SCN_LNK_COMDAT text section can be garbage collected.

In ELF, SHT_INIT_ARRAY is a GC root in ld.bfd/gold/ld.lld, regardless of SHF_GROUP/SHF_LINK_ORDER.
How do other implementations perform GC with SHT_INIT_ARRAY? What do you think the sensible behavior would be?

cat > a.s <<eof
.globl _start
_start:
  ret

.section .text.module_ctor,"axG",@progbits,module_ctor,comdat
module_ctor:
  ret

# In a GRP_COMDAT section group with signature 'module_ctor'
.section .init_array,"aGw",@init_array,module_ctor,comdat
.quad module_ctor

# SHF_LINK_ORDER, sh_link references .text.module_ctor
.section .fini_array,"awo",@fini_array,module_ctor
.quad module_ctor
eof
cc -c a.s
ld.bfd --gc-sections --print-gc-sections a.o # .init_array/.fini_array are not printed



There has always been a feature request to mark an arbitrary section as a GC root.
In the GNU ABI, before the introduction of SHF_GNU_RETAIN, placing a section into a group with an SHT_INIT_ARRAY member was the best emulation.
(I listed some GC roots at https://maskray.me/blog/2021-02-28-linker-garbage-collection and no other solution is elegant.)
In llvm-project, asan/hwasan/SanitizerCoverage leverage this property to force a section group live.
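
For readers who want to see the SHF_GNU_RETAIN route from C: recent GCC and Clang releases accept a `retain` attribute that asks for SHF_GNU_RETAIN on the variable's section. The sketch below is only an illustration under that assumption, not code from any project mentioned here. The macro `KEEP_IN_LINK`, the section name `build_tags`, and the variable `build_tag` are made-up names, and the fallback branch only protects the object from the compiler, not from --gc-sections.

```c
/* Use the retain attribute when the compiler knows it, else fall back to used only. */
#if defined(__has_attribute)
#  if __has_attribute(retain)
#    define KEEP_IN_LINK __attribute__((used, retain))
#  endif
#endif
#ifndef KEEP_IN_LINK
#  define KEEP_IN_LINK __attribute__((used))  /* fallback: no linker-level guarantee */
#endif

/* Hypothetical metadata object placed in its own, otherwise unreferenced section. */
KEEP_IN_LINK static const char build_tag[]
    __attribute__((section("build_tags"))) = "example";
```

With a toolchain that supports the flag, `readelf -S` on the object file should show an `R` among the flags of `build_tags`.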

If ld.lld/ld.bfd/gold keep the current SHT_INIT_ARRAY semantics within section groups, for consistency it seems that we shouldn't change the SHF_LINK_ORDER case either.
In the end, we don't have a mechanism to make an SHT_INIT_ARRAY GCable.
The D language compiler seems to want a GCable SHT_INIT_ARRAY (https://github.com/ldc-developers/ldc/issues/3786).
And we don't have a mechanism to do it.

Michael Matz

Jul 13, 2021, 9:48:31 AM
to 'Fangrui Song' via Generic System V Application Binary Interface
Hello,

On Mon, 12 Jul 2021, 'Fangrui Song' via Generic System V Application
Binary Interface wrote:

> If ld.lld/ld.bfd/gold keep the current SHT_INIT_ARRAY semantics within
> section groups, for similarity it seems that we shouldn't change
> SHF_LINK_ORDER as well.
> In the end, we don't have a mechanism to make a SHT_INIT_ARRAY GCable.
> The D language compiler seems to want a GCable SHT_INIT_ARRAY
> (https://github.com/ldc-developers/ldc/issues/3786).
> And we don't have a mechanism to do it.

If one puts an entry into a section twice, why would anyone be surprised
that these two entries are then in fact in the output? Especially with
the above ldc 'issue': it even puts _different_ entries into the section
(them being pointers to different functions, that just so happen to have
the same body). Why should the linker only include one of them?

Assuming you want to devise a mechanism to make SHT_INIT_ARRAY GCable, you
need to think about the implication for these examples:

---- file1.c ----
#include <stdio.h>

static void init () { printf ("init\n"); }
static void init2 () { printf ("init\n"); }

static void (*const init_array []) ()
__attribute__ ((used, section (".init_array")))
= { init };

static void (*const init_array2 []) ()
__attribute__ ((used, section (".init_array")))
= { init2 };

int main() { return 0; }
-----------------

It seems clear enough that here the linker should _not_ remove one of the
entries from .init_array, just because it happens to point to a
semantically equivalent function.

So, why would the linker remove one of the entries if the second
initializer would be literally the same (i.e. 'init')? Why would it
change the picture if the two init_array definitions would be put in
different files?

I think that no matter where the above array definitions are put,
and what entries they contain, the end result should always be the same
(and result in two calls to printf).

So, first we need to ask if we have a problem that we want to fix. Have
we? Basically shouldn't the answer to "but my ctors are called twice" be
"then don't put them twice into .init_array"?

If there are usecases where a GCable init_array makes sense (which ones?),
then it seems sensible to use section groups for that purpose, I guess.
As you mentioned that will result in some problems along the way (we can't
misuse init_array anymore to implement GC roots, but meanwhile GNU_RETAIN
exists, so maybe that's no issue anymore), so is it then still worthwhile
to change anything?

(FWIW: right now I don't see a compelling enough reason to go down the
rabbit hole)


Ciao,
Michael.

Fangrui Song

Jul 13, 2021, 4:38:09 PM
to gener...@googlegroups.com
The original LDC report has some misunderstanding of LLVM's provided
semantics for @llvm.global_ctors. Let's put it aside.

I think an SHT_INIT_ARRAY section not in a section group should be a GC
root. (Retroactively, if we had SHF_GNU_RETAIN before the HP-UX style
SHT_INIT_ARRAY we could avoid the special case for SHT_INIT_ARRAY.)

>I think that no matter where the above array definitions are put,
>and what entries they contain, the end result should always be the same
>(and result in two calls to printf).
>
>So, first we need to ask if we have a problem that we want to fix. Have
>we? Basically shouldn't the answer to "but my ctors are called twice" be
>"then don't put them twice into .init_array"?
>
>If there are usecases where a GCable init_array makes sense (which ones?),
>then it seems sensible to use section groups for that purpose, I guess.
>
>As you mentioned that will result in some problems along the way (we can't
>misuse init_array anymore to implement GC roots, but meanwhile GNU_RETAIN
>exists, so maybe that's no issue anymore), so is it then still worthwhile
>to change anything?

Yeah, I am thinking of this: if a GCable SHT_INIT_ARRAY makes sense,
llvm-project and GNU ld may need to do some preparation. I don't know a
use case yet but want to know what other implementations do.

In GNU ld, an SHT_NOTE section within a section group is GCable.
An SHT_NOTE section without a section group is a GC root.
I ported the behavior to lld/ELF (https://reviews.llvm.org/D70146)
for Red Hat annobin.

So arguably making SHT_INIT_ARRAY within a section group GCable makes the
rule more consistent. llvm-project needs some preparation: adding
SHF_GNU_RETAIN in the places where it currently abuses .init_array for
GC root semantics. There are some minor but surmountable portability issues.
(For example, msan has disabled comdat because of a gold bug. asan
enables comdat despite a fixed bug:))
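
The two rules being compared can be written down as a pair of predicates. This is a toy model of the behavior described in this thread, not code from any linker. All `demo_` names are invented, only the section types discussed here are covered, and real linkers have further GC roots (the entry symbol, for instance).

```c
#include <stdbool.h>
#include <stdint.h>

/* ELF constants, redefined locally so the sketch does not depend on <elf.h>. */
enum { DEMO_SHT_NOTE = 7, DEMO_SHT_INIT_ARRAY = 14, DEMO_SHT_FINI_ARRAY = 15 };
#define DEMO_SHF_GROUP      0x200u
#define DEMO_SHF_GNU_RETAIN 0x200000u

struct demo_section {
    uint32_t type;   /* sh_type */
    uint64_t flags;  /* sh_flags */
};

static bool demo_in_group(const struct demo_section *s) {
    return (s->flags & DEMO_SHF_GROUP) != 0;
}

static bool demo_is_init_fini(const struct demo_section *s) {
    return s->type == DEMO_SHT_INIT_ARRAY || s->type == DEMO_SHT_FINI_ARRAY;
}

/* Rule as described for ld.bfd/gold/ld.lld at the time of this thread. */
static bool demo_is_root_today(const struct demo_section *s) {
    if (s->flags & DEMO_SHF_GNU_RETAIN) return true;
    if (demo_is_init_fini(s)) return true;       /* group membership is ignored */
    if (s->type == DEMO_SHT_NOTE) return !demo_in_group(s);
    return false;
}

/* Hypothetical rule: treat init/fini arrays the way SHT_NOTE is treated. */
static bool demo_is_root_if_changed(const struct demo_section *s) {
    if (s->flags & DEMO_SHF_GNU_RETAIN) return true;
    if (demo_is_init_fini(s) || s->type == DEMO_SHT_NOTE)
        return !demo_in_group(s);
    return false;
}
```

Under this model, a grouped SHT_INIT_ARRAY section is a root today and would stop being one after the change, unless it also carries SHF_GNU_RETAIN. That is exactly why the sanitizers would need the flag first.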

Roland McGrath

Jul 13, 2021, 4:47:34 PM
to gener...@googlegroups.com
If someone wants deduplication for their init_array entries, then they can already do that by putting those entries into COMDAT groups.

If someone wants actual GC-ability for "unreferenced" entries, I agree that in theory an .init_array section in a non-COMDAT group makes sense for that. I think we might squeak through on compatibility issues generally there. If Clang and/or GCC already used non-COMDAT groups more thoroughly, it might be an issue, but they don't. What I mean by that is that under `-ffunction-sections -fdata-sections` today, both C++ static ctors and `__attribute__((constructor))` will generate `.init_array` entries in an .init_array section outside any group even when what it points to is in a GC-able section. So I don't think there are widespread uses of .init_array in a non-COMDAT section group.
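
For concreteness, the `__attribute__((constructor))` case looks like the sketch below; `ctor_ran` and `mark_ctor_ran` are illustrative names. On typical current ELF targets (an assumption; older configurations may use .ctors instead) the compiler emits a pointer to the function into .init_array, and with `-ffunction-sections` the function body itself goes into its own .text.* section.

```c
static int ctor_ran;  /* set by the constructor below */

/* The compiler registers this function to run before main. */
__attribute__((constructor))
static void mark_ctor_ran(void) { ctor_ran = 1; }
```

Because that .init_array entry sits outside any group, linking with --gc-sections keeps both the entry and the function it points to, and `ctor_ran` is 1 when main starts.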



Michael Matz

Jul 14, 2021, 9:36:45 AM
to 'Roland McGrath' via Generic System V Application Binary Interface
Hello,

On Tue, 13 Jul 2021, 'Roland McGrath' via Generic System V Application Binary Interface wrote:

> If someone wants deduplication for their init_array entries, then they
> can already do that by putting those entries into COMDAT groups.

Uhm, I thought Fangrui was saying that exactly this doesn't work and asked
if it does work (and how) in some other implementations. I haven't
checked myself, though.


Ciao,
Michael.

Fangrui Song

Jul 18, 2021, 6:45:11 PM
to Generic System V Application Binary Interface
On Wednesday, July 14, 2021 at 6:36:45 AM UTC-7 Michael Matz wrote:
> Hello,
>
> On Tue, 13 Jul 2021, 'Roland McGrath' via Generic System V Application Binary Interface wrote:
>
> > If someone wants deduplication for their init_array entries, then they
> > can already do that by putting those entries into COMDAT groups.
>
> Uhm, I thought Fangrui was saying that exactly this doesn't work and asked
> if it does work (and how) in some other implementations. I haven't
> checked myself, though.

An SHT_INIT_ARRAY member in a GRP_COMDAT section group allows deduplication (expected) but forces the group to be retained (unexpected).

I agree that "an SHT_INIT_ARRAY in a section group retains the whole group" is an abuse.
I created https://reviews.llvm.org/D106246 to drop the Clang sanitizers' reliance on this behavior by using SHF_GNU_RETAIN instead.
This will allow ld.lld to enable GC on SHT_INIT_ARRAY in a section group, if such a need ever arises.
(This means old sanitizer object files linked by a future ld.lld will not work, but we don't promise that such flexible cross-version usage works anyway.)

GNU ld can make a similar change (if such a need ever arises) once compatibility with old sanitizer object files is no longer needed.