GRP_COMDAT group with STB_LOCAL signature

Fangrui Song

May 24, 2021, 4:27:45 AM
to Generic System V Application Binary Interface
The spec says

> GRP_COMDAT
>
> This is a COMDAT group. It may duplicate another COMDAT group in
> another object file, where duplication is defined as having the
> same group signature. In such cases, only one of the duplicate
> groups may be retained by the linker, and the members of the
> remaining groups must be discarded.

I think "having the same group signature" does not make clear whether
deduplication is based purely on the symbol name or also takes STB_LOCAL
into account. Under the latter reading, a COMDAT with a STB_LOCAL
signature would not be deduplicated against another COMDAT with a
STB_LOCAL signature of the same name.

From some experiments, I believe all of GNU ld, gold, and ld.lld
simply use the symbol name and ignore the semantics of STB_LOCAL.
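
A minimal sketch of that name-only behavior, written as a toy Python model rather than taken from any linker's source. The `Group` class and `dedup_by_name` function are hypothetical names introduced for illustration; only the `GRP_COMDAT` and `STB_*` constant values come from the ELF specification.

```python
from dataclasses import dataclass

GRP_COMDAT = 0x1               # group flag value from the gABI
STB_LOCAL, STB_GLOBAL = 0, 1   # symbol binding values from the gABI


@dataclass(frozen=True)
class Group:
    obj: str        # object file containing the group (illustrative)
    signature: str  # name of the signature symbol
    binding: int    # binding of the signature symbol
    flags: int = GRP_COMDAT


def dedup_by_name(groups):
    """Keep the first COMDAT group per signature name; the binding is never consulted."""
    seen, kept = set(), []
    for g in groups:
        if g.flags & GRP_COMDAT:
            if g.signature in seen:
                continue  # duplicate name: discard this group's members
            seen.add(g.signature)
        kept.append(g)
    return kept


groups = [
    Group("a.o", "foo", STB_LOCAL),
    Group("b.o", "foo", STB_LOCAL),
]
kept = dedup_by_name(groups)
print([g.obj for g in kept])
```

In this model the group from b.o is discarded even though both signature symbols are STB_LOCAL, which is the behavior the question is about.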

The issue was initially reported on https://bugs.llvm.org/show_bug.cgi?id=43094

Ali Bahrami

May 24, 2021, 9:37:30 AM
to gener...@googlegroups.com
I don't believe that the type of the signature
symbol matters, and local symbols are allowed. It's the
name of the symbol that acts as the signature, not the
symbol itself.

Years ago, we found that the Solaris ld was failing on
objects created by the GNU as, because gas was using section
symbols (which are STB_LOCAL) as the COMDAT signature, and section
symbols often do not have a name. We followed the lead of the
GNU linkers on this, and substituted the use of the section
name referenced by section symbols in this case. I mention
this anecdote now because it shows that there is existing precedent
for local symbols being used successfully as COMDAT signatures.

There are rules about how references to COMDAT data must
be through global or weak symbols, but that's different than
the signature symbol itself. Of course, I think that it often
makes sense to use one of these referencing symbols as the
signature as well, in which case it would naturally be global
(or weak).

- Ali

Fangrui Song

May 24, 2021, 1:37:47 PM
to Generic System V Application Binary Interface
Suppressing deduplication for a GRP_COMDAT group with a STB_LOCAL non-STT_SECTION signature symbol can be useful.
Code may use GRP_COMDAT to establish a dependency relationship among a group of sections.
(AIUI most implementations interpret "Therefore, such groups must be included or omitted from the linked object as a unit." in a way to facilitate section based garbage collection.)

Now let's think of internalization (example: https://reviews.llvm.org/D53234), which is a common way to avoid leaking symbols to other libraries/applications.
The intention can be that the symbols don't interact with others from unrelated object/archive files.

.section .text.foo,"axG",@progbits,_Z3foov,comdat
.globl _Z3foov
_Z3foov:

.section __llvm_prf_cnts,"aG",@progbits,_Z3foov,comdat
.section __llvm_prf_data,"aG",@progbits,_Z3foov,comdat

In this example, even if we internalize _Z3foov, .text.foo, __llvm_prf_cnts, and __llvm_prf_data are still in the same group.
The signature symbol _Z3foov can cause undesired deduplication with another group (_Z3foov) in an unrelated archive/object file.

Now to make this example work, it seems that the compiler has to rename _Z3foov to a random string.
With the STB_LOCAL semantics we could keep using _Z3foov.
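
As a rough illustration of the renaming workaround, here is a toy Python sketch. The `internalized_signature` scheme (suffixing a digest of the module path) and the file names are assumptions made up for this example; they are not claimed to match what any compiler actually emits.

```python
import hashlib


def internalized_signature(name: str, module_path: str) -> str:
    """Hypothetical renaming: suffix the name with a digest of the module path."""
    digest = hashlib.sha256(module_path.encode()).hexdigest()[:16]
    return f"{name}.{digest}"


def surviving_groups(signatures):
    """Name-only COMDAT deduplication: the first occurrence of each name wins."""
    return list(dict.fromkeys(signatures))


# Two unrelated libraries both internalize a function named _Z3foov.
plain = surviving_groups(["_Z3foov", "_Z3foov"])
renamed = surviving_groups([
    internalized_signature("_Z3foov", "liba/foo.cc"),
    internalized_signature("_Z3foov", "libb/foo.cc"),
])
print(len(plain), len(renamed))
```

With the original name, one of the two groups is dropped; with per-module renamed signatures, both survive a name-only comparison.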

Another example is Windows PE-COFF, which inspired ELF's section groups.
In PE-COFF, with both link.exe and lld-link, a COMDAT (IMAGE_SCN_LNK_COMDAT) with a non-external (i.e. not IMAGE_SYM_CLASS_EXTERNAL) selection symbol is not deduplicated.
I.e. if we disable deduplication for COMDAT groups with a STB_LOCAL signature, compilers can treat ELF and PE-COFF in the same way.
Otherwise, compiler internalization needs to rename the signature specifically for ELF.

       .section        .text,"xr",discard,"?foo@@YAHXZ"
        .globl  "?foo@@YAHXZ"                   # -- Begin function ?foo@@YAHXZ
"?foo@@YAHXZ":                          # @"?foo@@YAHXZ"
        retq

        .section        .lprfc$M,"dw",discard,"__profc_?foo@@YAHXZ.742261418966908927"
        .globl  "__profc_?foo@@YAHXZ.742261418966908927" # @"__profc_?foo@@YAHXZ.742261418966908927"
        .p2align        3
"__profc_?foo@@YAHXZ.742261418966908927":
        .zero   8

        .section        .lprfd$M,"dw",discard,"__profd_?foo@@YAHXZ.742261418966908927"
        .globl  meow0
        meow0:
        .p2align        3
#### not-external, no deduplication
"__profd_?foo@@YAHXZ.742261418966908927":
        .quad   2709792123250749187             # 0x259b1d9439832f03
        .quad   742261418966908927              # 0xa4d0ad3efffffff
        .quad   "__profc_?foo@@YAHXZ.742261418966908927"
        .quad   "?foo@@YAHXZ"
        .quad   0
        .long   1                               # 0x1
        .zero   4

        .section        .drectve,"yn"
        .ascii  " /INCLUDE:\"meow0\""


Fangrui Song

May 25, 2021, 6:17:49 PM
to Generic System V Application Binary Interface
I pushed https://reviews.llvm.org/D103043 to use "renaming the comdat signature symbol" as a workaround for ELF.
(PE-COFF works perfectly with a non-external selection symbol.)

Ali Bahrami

May 28, 2021, 7:21:22 PM
to gener...@googlegroups.com
On 5/24/21 11:37 AM, 'Fangrui Song' via Generic System V Application Binary Interface wrote:
> Suppressing deduplication for a GRP_COMDAT group with a STB_LOCAL non-STT_SECTION signature symbol can be useful.
> Code may use GRP_COMDAT to establish dependency relationship among a group of sections.
> (AIUI most implementations interpret "Therefore, such groups must be included or omitted from the linked object as a unit." in a way to facilitate section based garbage collection.)
>

Hi,

Sorry for the delay, but I needed to think about this.

I guess I agree with the goal, and that GROUP sections
are the right tool, but I'm confused about the
other details.

I think that the natural purpose of SHF_GROUP is to
define a group of sections that are "all or nothing".
I don't understand why it's necessary to set the COMDAT
flag, and then ignore it, in order to get this effect.

I also think that the rule in the gABI for symbols that
reference groups needing to be global is really COMDAT
specific, and that it might be OK to say that it
doesn't apply if GRP_COMDAT isn't set.

Finally, I think that treating STB_LOCAL non-STT_SECTION
symbols differently than STB_LOCAL STT_SECTION symbols
feels overly complicated, and unnecessary.

To recap, I think you might apply groups to this
as follows:

- The signature symbol just provides a unique name.
The local/global status is not part of the
equation.

- SHT_GROUP defines the dependency relationship
you want for garbage collection already.

- If this isn't a COMDAT situation, leave off the
COMDAT flag.

- The rule in the gABI about symbols accessing groups
would need to be reworded to make it clear that it
only applies when GRP_COMDAT is set.
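
The "all or nothing" dependency that a plain group gives to garbage collection can be sketched as follows. This is a toy Python model under simplifying assumptions (sections are plain strings, references are an adjacency map); `live_sections` is a hypothetical helper, not any linker's API.

```python
def live_sections(roots, edges, groups):
    """Mark sections reachable from roots; one live member keeps its whole group."""
    group_of = {s: members for members in groups for s in members}
    live, work = set(), list(roots)
    while work:
        s = work.pop()
        if s in live:
            continue
        live.add(s)
        work.extend(edges.get(s, ()))     # sections referenced via relocations
        work.extend(group_of.get(s, ()))  # fellow group members, with or without GRP_COMDAT
    return live


live = live_sections(
    roots=[".text.main"],
    edges={".text.main": [".text.foo"]},
    groups=[(".text.foo", "__llvm_prf_cnts", "__llvm_prf_data")],
)
print(sorted(live))
```

Note that the group flags never appear in this model: retention as a unit follows from group membership alone, while deduplication would be a separate decision keyed on GRP_COMDAT.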

I'd be open to alterations in the gABI text to accommodate
this, assuming that others agree with my interpretation.
Otherwise, I'm interested to hear what others think.

- Ali

Fangrui Song

May 28, 2021, 10:45:50 PM
to Generic System V Application Binary Interface
I forgot that in LLVM there is already a way to lower to a zero flag section group.
(`comdat noduplicates`; it has been implemented in PE-COFF for many years, but it is new in ELF).
All of GNU ld, gold, and ld.lld support this.

With this, using zero flag vs GRP_COMDAT to decide whether deduplication should be used does look better to me as this is orthogonal.
 
> - SHT_GROUP defines the dependency relationship
> you want for garbage collection already.
>
> - If this isn't a COMDAT situation, leave off the
> COMDAT flag.

Perhaps this means tools like objcopy --localize-hidden may need to drop the GRP_COMDAT flag if all members in a group are internal.
 
> - The rule in the gABI about symbols accessing groups
> would need to be reworded to make it clear that it
> only applies when GRP_COMDAT is set.

+1

Roland McGrath

Jun 1, 2021, 6:26:04 PM
to gener...@googlegroups.com
It's sad to hear of that history of (IMHO) bug compatibility with old GNU implementations.  I was not aware of that.  I think cases using nameless symbols as GRP_COMDAT signature symbols were clearly wrong to begin with.  IMHO, if anything the STT_SECTION case should be considered the "weird" one that is treated differently and that done only for bug compatibility with those ancient buggy compilers (if that is even still required anywhere).

I agree that the constraints on access into groups should only apply to GRP_COMDAT groups.

The scenario that I think you may have failed to account for is this:

1. compiler generates a COMDAT group for the usual reasons; signature symbol is STB_GLOBAL/STV_HIDDEN "foo"
    This happens in multiple translation units, resulting in multiple ET_REL files that contain mergeable COMDAT groups with signature symbol "foo".

2. ld -r combines those multiple ET_REL files into one ET_REL file.

3. objcopy --localize-hidden makes the STB_GLOBAL/STV_HIDDEN signature symbol into an STB_LOCAL symbol

4. ld (final link) combines the output from #3 with another ET_REL file that has its own COMDAT group using a signature symbol that is STB_GLOBAL "foo" (with whatever visibility)

The intended semantics here is that the various "foo" COMDAT groups input to step #2 may be combined with each other, but not with the separate "foo" COMDAT group that was in the other input to step #4.

Apparently current implementations actually do the COMDAT merging in step #2.  For the end user in this scenario, it would be fine if this were done in step #2 or if it's done in a later step.

So there are a few different approaches that come to mind to achieve the intended semantics:

A. In step #2, merge COMDAT groups so all the desired merging from the step #1 translation units is complete before step #3.
    In step #3, have objcopy notice that it's localizing a symbol that's used as a COMDAT signature symbol and clear the GRP_COMDAT bit for that group.
    Now step #4 will not be doing any merging of the #1->#2 inputs but will do all appropriate merging among the other inputs to step #4 (and any non-localized symbols left after step #3).

B. Rephrase or reconstrue the spec so COMDAT signatures are symbols rather than names.  That is, the signature symbol referenced in the group section header is treated like any other symbol reference in linking.  That reference is resolved to a final definition in the final link.  Then the final defining symbol table entry is the key to match up COMDAT groups, rather than the name string.
    Step #2 can either merge or leave the multiple input groups as they are: they'll all refer to the same symbol table entry in the step #2 output.
    When that's made STB_LOCAL in the step #3 output, they'll still all refer to the same symbol table entry.
    In step #4, any merging not done in step #2 can be done because the input groups use the same symbol table entry.
    The "foo" from step #3's output will be STB_LOCAL and so not the same as any other "foo" in other inputs to step #4, so those groups won't be merged.

Since current implementations already do the merging in step #2 (ld -r), then implementing A would only require the change to objcopy's --localize-hidden behavior.

IMHO, changing the spec to be consistent with B would be consistent with the general spirit and approach of ELF.  I think the "symbol as string" use here is inconsistent with other aspects of ELF and was a mistake if that was the specific intent of the spec originally (IMHO no part of the ELF "spec" is worded sufficiently clearly or rigorously that it's possible to discern the true intent from the wording directly as one does with formal standards documents).  I think the historical use of STT_SECTION/STB_LOCAL symbols as COMDAT signature symbols was clearly a bug that's in violation of any plausible interpretation of the ELF spec.

However, given the checkered history and the current behavior of existing implementations, the objcopy change in A might be the simplest practical solution at this point. 
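
A toy Python sketch of what approach B could look like at the final link, under the simplifying assumption that a STB_LOCAL signature resolves only within its own input file while any other binding resolves by name across all inputs. `Group`, `signature_key`, and `dedup_by_symbol` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass

STB_LOCAL, STB_GLOBAL = 0, 1   # symbol binding values from the gABI


@dataclass(frozen=True)
class Group:
    obj: str        # input file that contains the group
    signature: str  # name of the signature symbol
    binding: int    # binding of the signature symbol


def signature_key(g):
    """Stand-in for 'the final defining symbol table entry' of the signature."""
    if g.binding == STB_LOCAL:
        return (g.obj, g.signature)   # local: visible only inside its own file
    return (None, g.signature)        # global/weak: resolved by name across inputs


def dedup_by_symbol(groups):
    """Keep the first group for each resolved signature symbol."""
    kept = {}
    for g in groups:
        kept.setdefault(signature_key(g), g)
    return list(kept.values())


inputs = [
    Group("merged.o", "foo", STB_LOCAL),   # output of steps 1-3
    Group("merged.o", "foo", STB_LOCAL),   # a copy that ld -r left unmerged
    Group("other.o", "foo", STB_GLOBAL),
    Group("third.o", "foo", STB_GLOBAL),
]
kept = dedup_by_symbol(inputs)
print([(g.obj, g.binding) for g in kept])
```

The two local "foo" groups from merged.o collapse into one, the two global "foo" groups collapse into one, and the local and global groups stay separate.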


Ali Bahrami

Jun 1, 2021, 10:53:15 PM
to gener...@googlegroups.com
Hi Roland,

Thanks for the concrete description. Saying that
I "failed to account" gives me too much credit --- I
didn't really understand the "why" underneath this
question at all, so this is very helpful.

I'm not sure that I agree that there's necessarily a bug
in the ELF spec regarding the handling of the signature symbol.
The spec says nothing about the signature symbol itself --- it
is merely a name donor. That, and the presence of the flags
field (with unassigned bits as well as GRP_COMDAT) suggests
that the intent is to allow flexibility for possible future uses,
which may or may not involve comdat. One can at least imagine
some scenarios where a local symbol might be appropriate for one
of these unknown future uses. At the same time, substituting the
section name for the nameless STT_SECTION symbol is something
that happens in other contexts, and if the section symbol provides a
sufficiently unique name, I don't know why a compiler should
have to create some other symbol (probably with the same name)
just to avoid using it.

So it's arguable. Maybe you're right, but another way to view
this is that the design isn't wrong, but simply provides a great
deal of flexibility by intentionally not over specifying (which
is very ELF-like, IMO). In any event, changing it would be way
too painful now, so let's try to assume that it's right, and see
if there isn't a "correct" solution that fits the existing rules.

I think your solution A is it. To expand, I think that
there are logically 2 distinct groups involved: an initial
COMDAT group, and then later, a new non-COMDAT one. Confusingly,
they have the same name, and contain the same sections, but
they serve different purposes.

When a link-editor creates a "final" object (executable
or shared object), one of the things it does, after processing
groups, is to discard the group section, and the SHF_GROUP flags,
from the result. At this point, their purpose has been served,
and their continued presence would be confusing at best.
It seems to me that the step (3) in your description below is a
sort of finalization, in which the COMDAT processing has been
done. It would make sense for objcopy to remove the group section,
and flags, after doing this.

However, we also want to indicate to the next linking stage, that
garbage collection on these sections is indivisible. A group
is also the answer to that problem, but not a COMDAT group,
just a plain group. Logically, you might view this as finalizing
away the COMDAT group, and then creating a new non-COMDAT group
with the same name, and sections, for this distinct new purpose. Of
course, given that only the GRP_COMDAT flag differs between the old
group and the new one, we don't really have to do all that. Removing
the GRP_COMDAT flag amounts to the same thing.
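
The four-step scenario under solution A can be walked through with a toy Python model. It assumes that localization clears GRP_COMDAT, which is the proposed behavior and not a statement about what objcopy currently does; `localize` and `link` are hypothetical helpers, and groups are represented as plain dicts.

```python
GRP_COMDAT = 0x1               # group flag value from the gABI
STB_LOCAL, STB_GLOBAL = 0, 1   # symbol binding values from the gABI


def localize(group):
    """Model of step 3 under solution A: localize the signature and clear GRP_COMDAT."""
    return {**group, "binding": STB_LOCAL, "flags": group["flags"] & ~GRP_COMDAT}


def link(groups):
    """Name-only deduplication, applied only to groups that still carry GRP_COMDAT."""
    seen, kept = set(), []
    for g in groups:
        if g["flags"] & GRP_COMDAT:
            if g["signature"] in seen:
                continue
            seen.add(g["signature"])
        kept.append(g)
    return kept


# Step 1: three translation units each emit a COMDAT group "foo".
units = [
    {"obj": f"tu{i}.o", "signature": "foo", "binding": STB_GLOBAL, "flags": GRP_COMDAT}
    for i in range(3)
]
merged = link(units)                       # step 2: ld -r keeps one "foo" group
localized = [localize(g) for g in merged]  # step 3: now a plain, non-COMDAT group
other = {"obj": "other.o", "signature": "foo", "binding": STB_GLOBAL, "flags": GRP_COMDAT}
final = link(localized + [other])          # step 4: final link
print([g["obj"] for g in final])
```

Both "foo" groups reach the output: the localized one is no longer a deduplication candidate, while it remains a group for garbage-collection purposes.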

To sum up, it sounds to me like objcopy failing to remove the
GRP_COMDAT as part of step 3 in this process was the bug, and
that your solution A is the right fix. And of course, it's *way*
easier and quicker than solution B. And, far more straightforward
than requiring future link-edits to second guess the intent
behind mislabeled GRP_COMDAT groups based on the type of their
signature symbol.

- Ali

Cary Coutant

Nov 2, 2021, 1:31:51 AM
to Generic System V Application Binary Interface
> I'm not sure that I agree that there's necessarily a bug
> in the ELF spec regarding the handling of the signature symbol.
> The spec says nothing about the signature symbol itself --- it
> is merely a name donor.

For some historical context, we first implemented COMDAT groups on
HP-UX before pushing the idea to the gABI committee. In my initial
design of the feature, I let the section name of the SHT_GROUP section
serve as the comdat signature. Since our motivation for COMDAT groups
was C++ template instantiation and out-of-line inlines, that comdat
signature was always a mangled C++ name. We soon observed that the
section name table was blowing up with all those mangled names that
were already contained in the symbol string table, and we modified the
design to use a symbol as the signature. So the original intent was
that the signature symbol was, in fact, merely a name donor. It didn't
matter at all whether it was a local or global symbol, but in the
context of C++, of course the symbol was always global.

In retrospect, sure, it might have made sense if we had tied the
COMDAT group deduplication to the final resolution of the signature
symbol, but given the way we started, that wasn't the natural
endpoint.

-cary

Fangrui Song

Nov 2, 2021, 3:58:50 AM
to gener...@googlegroups.com
On 2021-11-01, Cary Coutant wrote:
>> I'm not sure that I agree that there's necessarily a bug
>> in the ELF spec regarding the handling of the signature symbol.
>> The spec says nothing about the signature symbol itself --- it
>> is merely a name donor.
>
>For some historical context, we first implemented COMDAT groups on
>HP-UX before pushing the idea to the gABI committee.

Thanks for the context. Added to my
https://maskray.me/blog/2021-07-25-comdat-and-section-group

>In my initial
>design of the feature, I let the section name of the SHT_GROUP section
>serve as the comdat signature. Since our motivation for COMDAT groups
>was C++ template instantiation and out-of-line inlines, that comdat
>signature was always a mangled C++ name. We soon observed that the
>section name table was blowing up with all those mangled names that
>were already contained in the symbol string table, and we modified the
>design to use a symbol as the signature.

.strtab and .shstrtab can be combined to leverage tail string merge to
save space:)
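
A small Python sketch of tail merging in a combined string table, to show why a section name and a symbol name can share storage. `build_strtab` is a hypothetical helper using a simple longest-first strategy; real tools may build their tables differently.

```python
def build_strtab(names):
    """Build an ELF-style string table where a name may reuse the tail of a longer one."""
    table = b"\0"   # offset 0 is the empty string
    offsets = {}
    # Longest names first, so shorter names can find a host whose tail they match.
    for name in sorted(set(names), key=len, reverse=True):
        needle = name.encode() + b"\0"   # including the NUL forces a match at a tail
        pos = table.find(needle)
        if pos < 0:
            pos = len(table)
            table += needle
        offsets[name] = pos
    return table, offsets


# A section name and the symbol name it embeds, stored in one table.
table, offsets = build_strtab([".text._Z3foov", "_Z3foov"])
print(len(table), offsets)
```

Here the symbol name costs no extra bytes because it points into the tail of the section name, which is only possible when both kinds of names live in the same table.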

GNU as feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=18599

I don't know of any linker that does the combination, though.

>
>So the original intent was
>that the signature symbol was, in fact, merely a name donor. It didn't
>matter at all whether it was a local or global symbol, but in the
>context of C++, of course the symbol was always global.
>
>In retrospect, sure, it might have made sense if we had tied the
>COMDAT group deduplication to the final resolution of the signature
>symbol, but given the way we started, that wasn't the natural
>endpoint.

OK, a STB_LOCAL signature being identical to a zero flag section group.