[llvm-dev] PGO GC inefficiency due to link.exe's IMAGE_COMDAT_SELECT_ASSOCIATIVE limitation

42 views
Skip to first unread message

Fāng-ruì Sòng via llvm-dev

unread,
May 26, 2021, 3:54:38 AM5/26/21
to LLVM Developers Mailing List
I have studied the GC state of Windows (IR/frontend) PGO a bit.
Below is my understanding. Hope folks can clarify it I make a mistake:)

For PGO, we need cnts/data/vals/names sections. cnts/data are main
metadata sections. vals is for value profiling.
For Windows, the cnts section is named .lprfc$M and the data section
is named .lprfd$M.
The garbage collection story is unfortunate.

If an IMAGE_COMDAT_SELECT_ASSOCIATIVE section defines an external
symbol, MSVC link.exe may report a spurious duplicate symbol error
(error LNK2005), even if the associative section would be discarded
after handling the leader symbol.
lld-link doesn't have this limitation.
However, a portable implementation needs to work around MSVC link.exe.

For a COMDAT .lprfd$M, its symbol must be external (linkonce_odr),
otherwise references to a non-prevailing symbol would cause an error.
Due to the limitation, .lprfd$M has to reside in its own COMDAT, no
sharing with .lprfc$M.

Different COMDAT groups mean that the liveness of one .lprfc$M does
not make its associative .lprfd$M live.
Since a .lprfd$M may be unreferenced, we have to conservatively assume
all COMDAT .lprfd$M live.
Since .lprfc$M input sections parallel .lprfd$M input sections, we
have to conservatively assume all COMDAT .lprfc$M live.
For an external symbol, we use a /INCLUDE: directive in .drectve to
mark it as a GC root.
As a result, .drectve may have many /INCLUDE: directives, just to work
around the link.exe limitation.
IIRC Chrome folks have found that the /INCLUDE: directives can cause
huge memory usage.

Note: for ELF we can use R_*_NONE to establish an artificial
dependency edge between two sections.
I don't think PE-COFF provides a similar feature.

So does link.exe accept feature requests? The external symbol
limitation in an IMAGE_COMDAT_SELECT_ASSOCIATIVE section looks quite
unfortunate...

It seems that /OPT:REF has another inefficiency - it cannot discard
non-COMDAT sections.
IIUC Fuchsia folks found that GC on non-COMDAT functions can save a
lot of space (40%? of PGO metadata sections). It is unfortunate that
link.exe doesn't discard non-COMDAT sections...

--
宋方睿
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Reid Kleckner via llvm-dev

unread,
May 26, 2021, 4:51:46 PM5/26/21
to Fāng-ruì Sòng, LLVM Developers Mailing List
On Wed, May 26, 2021 at 12:54 AM Fāng-ruì Sòng via llvm-dev <llvm...@lists.llvm.org> wrote:
I have studied the GC state of Windows (IR/frontend) PGO a bit.
Below is my understanding. Hope folks can clarify it I make a mistake:)

For PGO, we need cnts/data/vals/names sections. cnts/data are main
metadata sections. vals is for value profiling.
For Windows, the cnts section is named .lprfc$M and the data section
is named .lprfd$M.
The garbage collection story is unfortunate.

If an IMAGE_COMDAT_SELECT_ASSOCIATIVE section defines an external
symbol, MSVC link.exe may report a spurious duplicate symbol error
(error LNK2005), even if the associative section would be discarded
after handling the leader symbol.
lld-link doesn't have this limitation.
However, a portable implementation needs to work around MSVC link.exe.

Yes, that's pretty much how I remember this working.
 
For a COMDAT .lprfd$M, its symbol must be external (linkonce_odr),
otherwise references to a non-prevailing symbol would cause an error.
Due to the limitation, .lprfd$M has to reside in its own COMDAT, no
sharing with .lprfc$M.

Yes. I'm sure the instrumented code references the counters, so the counters must be external. However, if the __profd_ symbol is not referenced directly from anywhere, perhaps we could make it internal and have it join the __profc_* comdat to reverse the edge. If there are explicit __profd_ references from code, this won't work.

Different COMDAT groups mean that the liveness of one .lprfc$M does
not make its associative .lprfd$M live.
Since a .lprfd$M may be unreferenced, we have to conservatively assume
all COMDAT .lprfd$M live.
Since .lprfc$M input sections parallel .lprfd$M input sections, we
have to conservatively assume all COMDAT .lprfc$M live.
For an external symbol, we use a /INCLUDE: directive in .drectve to
mark it as a GC root.
As a result, .drectve may have many /INCLUDE: directives, just to work
around the link.exe limitation.
IIRC Chrome folks have found that the /INCLUDE: directives can cause
huge memory usage.

Yes, I have wanted to implement an LLD-specific extension for this feature. The general problem is that symbol names are long, and we should avoid repeating them in the object file if at all possible. I think llvm.used and __attribute__((used)) should be compatible with link.exe, but it would be reasonable to make PGO an LLD-only feature to get these object file size savings. Specifically, we can copy the design of .llvm_addrsig to mark symbols as GC roots.
 
Note: for ELF we can use R_*_NONE to establish an artificial
dependency edge between two sections.
I don't think PE-COFF provides a similar feature.

Yep, not so far as I am aware.
 
So does link.exe accept feature requests? The external symbol
limitation in an IMAGE_COMDAT_SELECT_ASSOCIATIVE section looks quite
unfortunate...

I have a few points of contact at Microsoft, but there isn't a good general way to submit these kinds of low-level ABI feature requests.

The way the feature is designed, it only really works well when the associative data is static, unreferenced, and discovered only by section concatenation and iteration. This supports very many use cases: debug info, unwind info, dynamic initializers, asan global metadata, control flow guard metadata, others. In the PGO use case, I think we would be violating a hidden invariant that all the associative data should be static and unreferenced.
 
It seems that /OPT:REF has another inefficiency - it cannot discard
non-COMDAT sections.
IIUC Fuchsia folks found that GC on non-COMDAT functions can save a
lot of space (40%? of PGO metadata sections). It is unfortunate that
link.exe doesn't discard non-COMDAT sections...

This is not a major issue in practice because with -ffunction-sections / -fdata-sections, everything is in a comdat section, they just all use IMAGE_COMDAT_SELECT_NODUPLICATES.

I guess I really only have one idea here: find a way to avoid __profd_* references, and make __profc_ the comdat leader.

----

Separately, I have noticed that PGO greatly inhibits GC for other reasons. See the example from my last patch: https://reviews.llvm.org/D102818
For most linkonce_odr functions (so basically everything), the __profd_ global references the function, and then __profd_ is added to llvm.used. This prevents GC from working on any linkonce_odr function, which is bad for code size, so I removed the feature from most coverage builds.

Fāng-ruì Sòng via llvm-dev

unread,
Jun 4, 2021, 6:50:40 PM6/4/21
to LLVM Developers Mailing List

Circling back on this. Two patches have landed, decreasing the object
file/linked image sizes for ELF and Windows a bit.

* https://reviews.llvm.org/D103355 [InstrProfiling] Delete
linkage/visibility toggling for Windows
* https://reviews.llvm.org/D103372 [InstrProfiling] If no value
profiling, make data variable private and (for Windows) use one comdat


Windows still suffers from /INCLUDE:__covrec_DB956436E78DD5FAu strings
in the .drectve section.
Since .lcovfun$M is in a comdat for deduplication, it is subject to
default -opt:ref.
We have to use /INCLUDE because there is no cheaper ELF-style R_*_NONE.

Reply all
Reply to author
Forward
0 new messages