__start_<section>/__stop_<section> symbols and GC

Fangrui Song

Feb 3, 2021, 11:06:16 PM
to Generic System V Application Binary Interface

If a section name consists of C locale isalnum characters or '_', some linkers
define __start_<section>/__stop_<section> symbols if they are undefined.
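
A minimal sketch of how these symbols are typically consumed, assuming an ELF target, GCC or Clang, and a linker (GNU ld or ld.lld) that provides the symbols. The section name my_set and the functions my_set_count/my_set_sum are hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Three entries placed in a section named "my_set" (hypothetical name).
   The name is a valid C identifier, so the linker can define the
   encapsulation symbols for it. */
__attribute__((used, section("my_set"))) static const int entry_a = 10;
__attribute__((used, section("my_set"))) static const int entry_b = 20;
__attribute__((used, section("my_set"))) static const int entry_c = 30;

/* Left undefined here; the linker defines them at the start and end of
   the output section built from all "my_set" input sections. */
extern const int __start_my_set[];
extern const int __stop_my_set[];

/* Number of int entries between the two linker-defined bounds. */
size_t my_set_count(void)
{
    assert(__start_my_set <= __stop_my_set);
    return (size_t)(__stop_my_set - __start_my_set);
}

/* Walk the section as an array and add up the entries. */
int my_set_sum(void)
{
    int sum = 0;
    for (const int *p = __start_my_set; p != __stop_my_set; ++p)
        sum += *p;
    return sum;
}
```

Calling my_set_count() and my_set_sum() from main is enough to exercise it. Nothing in the code names entry_a, entry_b, or entry_c directly; the only link between the readers and the data is the pair of linker-defined symbols, which is why their effect on garbage collection matters.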

GNU ld treats such sections as GC roots (arguably the sections do not convey
that intention). If users want to make such sections collectable, they
have to use some flags. Currently LLVM is moving toward a convention of using
SHF_LINK_ORDER to make them GCable.

(If SHF_GNU_RETAIN had been invented earlier, arguably the sections intended to
be retained should use that flag. Then no SHF_LINK_ORDER would be needed.)

The above is an overview of the matter in GNU and LLVM toolchains.  What do
other implementations do?


FWIW I summarized what has been done to SHF_LINK_ORDER in GNU and LLVM

Ali Bahrami

Feb 8, 2021, 12:34:40 PM
to gener...@googlegroups.com
I'll try to answer your question for Solaris. I'm not sure I
fully understand the question, but I think I understand enough
to do the experiment. I do understand that you're trying to use
SHF_LINK_ORDER to represent a relationship between a text section
and a related "meta data" section, and that you're trying to achieve
this without using the ELF comdat group mechanism.

From Alan's comments in the bug, I gather that the sections refer
to each other, or perhaps to themselves, using SHF_LINK_ORDER. I'm
not sure how to interpret "self-link", but I'm sure it's one of those:

> How exactly is the proposed SHF_LINK_ORDER trick supposed
> to work? A reference to start_foo/stop_foo symbols from
> a kept section currently marks all input sections named
> foo as kept. Are you suggesting that any section named
> foo that is SHF_LINK_ORDER with a self-link should not
> be marked? But those foo that do not have a self-link
> should be marked?

I can't easily create your exact scenario, but I did an experiment
using this little hello world variant:

#include <stdio.h>
static void f2(void);

static void
f1(void)
{
        printf("f1 \n");
        f2();
}

static void
f2(void)
{
        printf("f2 \n");
        f1();
}

int
main(int argc, char **argv)
{
        printf("hello \n");
}

If f1 or f2 were ever called, this program would fill the
stack and die, but here, they're unused, and just a way
to experiment with unused material processing.

Compiling this with the option to put each function in a separate
section yields sections for f1() and f2(), which are related to
each other, but not accessed by anything else. If I link this normally,
the sections are kept, but if I link with the option to discard
unused sections, f1() and f2() are discarded.

With that as a baseline, I took a copy of the relocatable object,
and modified the section headers for the f1 and f2 sections to add
SHF_LINK_ORDER and set sh_link. I did 2 versions of this experiment:

1) The sh_link for f1 references f2, and the sh_link for
f2 references f1.

2) The sh_link for each section references itself.

In both cases, the objects linked cleanly, with no infinite
loops, and turning on the option to discard unused sections
eliminated the f1 and f2 sections. In both cases, the apparent
behavior is as if none of these SHF_LINK_ORDER games were being
played.

I hope that helps. If I've misunderstood your question to the
point where the above isn't useful, let me know.

- Ali

Alan Modra

Feb 10, 2021, 6:16:46 AM
to gener...@googlegroups.com
On Mon, Feb 08, 2021 at 10:34:38AM -0700, Ali Bahrami wrote:
> From Alan's comments in the bug, I gather that the sections refer
> to each other, or perhaps to themselves, using SHF_LINK_ORDER. I'm
> not sure how to interpret "self-link", but I'm sure it's one of those:

Themselves. So it is a cunning use of SHF_LINK_ORDER that wouldn't
normally occur.

Some background. GNU ld defines __start_<section_name> and
__stop_<section_name> symbols for input sections that meet some naming
limitations. If those symbols are referenced but not defined by the
user then __start_<section_name> will be defined at the beginning of
the first input section so named and __stop_<section_name> at the end
of the last input section so named. A reference to either of those
symbols from a section that gc-sections keeps for any reason, will
cause all the <section_name> input sections to be kept. Fangrui Song
would like a way of modifying that gc-sections behaviour, so that not
all <section_name> sections are kept by the special effect of linker
defined __start_<section_name> and __stop_<section_name> symbols.
The proposal is that any section with SHF_LINK_ORDER and sh_link to
itself not be kept due to this special effect. Of course, such a
section may be kept from garbage collection for other reasons.
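
The rule and the proposed exemption can be sketched as a toy mark phase. This is only a model under stated assumptions, not how GNU ld is implemented: ToySection, toy_mark, and toy_live_count are hypothetical names, ordinary relocation edges between sections are left out, and the only edge modelled is a __start_/__stop_ reference from one section to a section name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of an input section (hypothetical type, illustration only). */
typedef struct {
    const char *name;       /* section name */
    bool root;              /* kept unconditionally, e.g. holds the entry point */
    bool self_link_order;   /* SHF_LINK_ORDER with sh_link pointing at itself */
    const char *encap_ref;  /* name referenced via __start_/__stop_, or NULL */
    bool live;              /* result of the mark phase */
} ToySection;

/* Mark phase: a live section that references __start_X/__stop_X keeps every
   section named X, except self-linked SHF_LINK_ORDER ones when the proposed
   exemption is honored. Iterates until nothing changes. */
static void toy_mark(ToySection *s, size_t n, bool honor_exemption)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s[i].live = s[i].root;

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (!s[i].live || s[i].encap_ref == NULL)
                continue;
            for (size_t j = 0; j < n; j++) {
                if (s[j].live || strcmp(s[j].name, s[i].encap_ref) != 0)
                    continue;
                if (honor_exemption && s[j].self_link_order)
                    continue;   /* not kept by the special effect */
                s[j].live = true;
                changed = true;
            }
        }
    }
}

/* Builds a fixed example and returns how many sections survive. */
size_t toy_live_count(bool honor_exemption)
{
    ToySection s[] = {
        { ".text.main",   true,  false, "foo", false },
        { "foo",          false, false, NULL,  false },
        { "foo",          false, true,  NULL,  false },  /* self-linked */
        { ".text.unused", false, false, "bar", false },
        { "bar",          false, false, NULL,  false },
    };
    size_t n = sizeof s / sizeof s[0];
    size_t live = 0;

    toy_mark(s, n, honor_exemption);
    for (size_t i = 0; i < n; i++)
        live += s[i].live ? 1 : 0;
    return live;
}
```

In this model, toy_live_count(false) keeps .text.main and both foo sections, while toy_live_count(true) drops the self-linked foo. The bar section is never kept, because the only reference to it comes from a section that is itself dead.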

--
Alan Modra
Australia Development Lab, IBM

Fangrui Song

Jul 18, 2021, 7:58:44 PM
to Generic System V Application Binary Interface
[Circling back to this thread to share some updates so that folks know what happened in the Linux world.]

I consider the rule "__start_/__stop_ references from a live input section retain all C identifier name sections" unfortunate.
I can sorta understand it, because at that time there wasn't a good mechanism for making an arbitrary section a GC root
without using linker options/scripts (the GNU ABI now has the section flag SHF_GNU_RETAIN).

GNU ld (since 2.37) and ld.lld (since 13) have -z start-stop-gc which drops this rule.
ld.lld 13.0.0 enables -z start-stop-gc by default.
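
With the rule dropped, a section that is reached only through __start_/__stop_ has to ask for retention explicitly when linking with --gc-sections. A minimal sketch, assuming GCC 11+ or Clang 13+ (where the retain attribute requests SHF_GNU_RETAIN; older compilers are expected to warn and ignore it) and an ELF target. The section name my_hooks and all function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*hook_fn)(void);

static int hook_one(void) { return 1; }
static int hook_two(void) { return 2; }

/* "used" keeps the compiler from dropping the unreferenced variables;
   "retain" asks for SHF_GNU_RETAIN so the linker keeps the section even
   when __start_/__stop_ references no longer act as GC roots. */
__attribute__((used, retain, section("my_hooks")))
static const hook_fn reg_one = hook_one;
__attribute__((used, retain, section("my_hooks")))
static const hook_fn reg_two = hook_two;

/* Defined by the linker at the bounds of the my_hooks output section. */
extern const hook_fn __start_my_hooks[];
extern const hook_fn __stop_my_hooks[];

/* Call every registered hook and add up the return values. */
int run_hooks(void)
{
    int total = 0;
    for (const hook_fn *p = __start_my_hooks; p != __stop_my_hooks; ++p) {
        assert(*p != NULL);
        total += (*p)();
    }
    return total;
}
```

Without --gc-sections the retain attribute makes no difference; it only matters once the linker is collecting sections and the encapsulation symbols are the sole reference.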

---

The encapsulation symbol behavior is entirely orthogonal to SHF_LINK_ORDER.
SHF_LINK_ORDER is kinda cumbersome and is not as useful as it could be.

Compilers doing instrumentation may find https://reviews.llvm.org/D104933 useful.
In short, SHF_LINK_ORDER is only useful when the metadata cannot be referenced by inlinable functions.
(Non-inlinable is nearly impossible to ensure, if you consider LTO and the GNU __attribute__((always_inline)) semantics.)

---

More GC refinement may need to happen for Linux linkers: section types/flags
should probably lose traditional GC root semantics, if in a section group or has
GC-ability of SHT_INIT_ARRAY.

---

This post is not editable. I try to keep the more detailed
up-to-date if I ever find new issues.