[llvm-dev] Support for zero flag ELF section groups in LLVM IR

Petr Hosek via llvm-dev

Feb 11, 2021, 3:00:45 AM
to llvm-dev
D95851 introduces support for zero flag ELF section groups to LLVM. LLVM already supports COMDAT sections, which in ELF are a special type of section group. Zero flag groups are useful for linker GC when you want a group of sections to always travel together, that is, to be either retained or discarded as a whole, but without the COMDAT deduplication semantics. Other ELF assemblers and linkers already support zero flag ELF section groups, and this change helps us reach feature parity.
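
As an illustration of the difference (not code from D95851), here is a minimal C++ sketch of how a linker might treat the two kinds of groups. The `SectionGroup` struct and the `resolveGroups` and `groupIsLive` functions are hypothetical names made up for this example, not LLVM or lld APIs. COMDAT groups with the same signature are deduplicated, with the first copy winning, while zero flag groups are never deduplicated. In both cases the members of a group are retained or discarded together.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an ELF section group (not an LLVM or lld type).
struct SectionGroup {
  std::string signature;             // name of the group signature symbol
  bool isComdat;                     // true: GRP_COMDAT, false: zero flag group
  std::vector<std::string> sections; // member sections, kept or dropped together
};

// Returns the groups that survive deduplication across all inputs.
std::vector<SectionGroup> resolveGroups(const std::vector<SectionGroup> &inputs) {
  std::set<std::string> seenComdats;
  std::vector<SectionGroup> kept;
  for (const SectionGroup &g : inputs) {
    // Only COMDAT groups are deduplicated by signature; the first copy wins.
    // A duplicate is dropped as a whole, never member by member.
    if (g.isComdat && !seenComdats.insert(g.signature).second)
      continue;
    kept.push_back(g);
  }
  return kept;
}

// Linker GC treats a group as a unit: if any member is reachable,
// every member of the group is retained.
bool groupIsLive(const SectionGroup &g, const std::set<std::string> &reachable) {
  for (const std::string &s : g.sections)
    if (reachable.count(s))
      return true;
  return false;
}
```

With two COMDAT groups named `foo` and two zero flag groups named `meta` as input, `resolveGroups` keeps one `foo` and both `meta` groups.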

An open question is how to best represent these in LLVM IR.

We represent COMDAT sections as global variables and other global variables can be included in COMDAT sections, see https://llvm.org/docs/LangRef.html#comdats for details.

We want to capture the fact that COMDAT sections are a special type of ELF section group. We also want to preserve the existing syntax and API, both for backwards compatibility and because other formats like COFF support COMDAT sections but not section groups.

Our proposal is to introduce ELF section groups as a new type of global variable akin to COMDAT sections. We would extend the language by changing:

  [, comdat[($name)]]

when declaring a global variable to:

  [, \(group[($name)] | [group] comdat[($name)]\)]

When it comes to the C++ API, we would introduce Group as a superclass of Comdat:

  class Group {
    StringRef getName() const;
  };
  class Comdat : public Group {
    ...
  };
  class GlobalObject : public GlobalValue {
    ...
    bool hasGroup();
    Group *getGroup();
    void setGroup(Group *G);
    // has/get/setComdat functions re-implemented in terms of has/get/setGroup
    ...
  };
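
To show how the has/get/setComdat functions could be layered on the group accessors, here is a self-contained sketch of the hierarchy above. It is an illustration under assumptions, not the actual patch: `std::string` stands in for `StringRef` so that it compiles without LLVM headers, `GlobalObject` omits its `GlobalValue` base, and the `isComdat()` discriminator is a hypothetical addition that the outline above does not contain.

```cpp
#include <string>
#include <utility>

// Stand-in for the proposed Group class; std::string replaces StringRef.
class Group {
public:
  explicit Group(std::string N) : Name(std::move(N)), IsComdat(false) {}
  const std::string &getName() const { return Name; }
  bool isComdat() const { return IsComdat; } // hypothetical discriminator

protected:
  // Only subclasses may mark themselves as COMDAT groups.
  Group(std::string N, bool C) : Name(std::move(N)), IsComdat(C) {}

private:
  std::string Name;
  bool IsComdat;
};

class Comdat : public Group {
public:
  explicit Comdat(std::string N) : Group(std::move(N), /*C=*/true) {}
};

// Simplified: the real class would derive from GlobalValue.
class GlobalObject {
public:
  bool hasGroup() const { return G != nullptr; }
  Group *getGroup() const { return G; }
  void setGroup(Group *NewG) { G = NewG; }

  // Comdat accessors re-implemented in terms of the group accessors.
  bool hasComdat() const { return G && G->isComdat(); }
  Comdat *getComdat() const {
    return hasComdat() ? static_cast<Comdat *>(G) : nullptr;
  }
  void setComdat(Comdat *C) { setGroup(C); }

private:
  Group *G = nullptr; // not owned by the global
};
```

In this sketch a global placed in a plain group reports `hasGroup()` but not `hasComdat()`, so existing callers of the Comdat accessors would keep seeing only real COMDATs.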

Does this make sense? Can anyone think of a better representation?

Reid Kleckner via llvm-dev

Feb 11, 2021, 5:04:31 PM
to Petr Hosek, llvm-dev
We are already using LLVM IR comdat groups for the same purpose, linker GC association, on COFF. I think we just need a flag to mark ELF comdat groups as, essentially, not actually being common data that the linker should deduplicate by name, aka a zero flag group. See how Windows ASan uses comdat groups on internal globals for metadata registration:

$ cat t.cpp
int f();
static int gv = f();

$ clang -S t.cpp  --target=x86_64-windows-msvc -o - -emit-llvm -fsanitize=address
...
$gv = comdat noduplicates
...
@gv = internal global { i32, [60 x i8] } zeroinitializer, comdat, align 32
...
@__asan_global_gv = private global { i64, i64, i64, i64, i64, i64, i64, i64 } { i64 ptrtoint ({ i32, [60 x i8] }* @gv to i64), i64 4, i64 64, i64 ptrtoint ([3 x i8]* @___asan_gen_.1 to i64), i64 ptrtoint ([6 x i8]* @___asan_gen_ to i64), i64 1, i64 ptrtoint ({ [6 x i8]*, i32, i32 }* @___asan_gen_.3 to i64), i64 -1 }, section ".ASAN$GL", comdat($gv), align 64, !associated !0


We are using the "noduplicates" comdat flag here, but @gv has internal linkage, and COFF linkers merge symbols, not section group names, so this code does what we want it to. Maybe it would make more sense if we used some kind of portable flag, like "internal" or "unique" on the comdat group to indicate that the group doesn't participate in merging. On COFF, we'd have the limitation that this feature only works for comdat groups named after internal linkage globals, but on ELF, the group could have any name.

You could rename the Comdat class to Group or SectionGroup or something, but I'm not sure there's much value in it. The terminology as it is makes sense for COFF, if not for ELF. ELF makes the distinction between comdat section groups and non-comdat section groups, but MSVC and clang-cl use the IMAGE_SCN_LNK_COMDAT section flag and the IMAGE_COMDAT_SELECT_ASSOCIATIVE selection flag to implement these types of groups.

Then, there's the cost of churning the textual IR spellings and method names. We have the freedom to change these things, but we should acknowledge that it does create work for ourselves and others. IMO, it is worth living with COFF-centric naming of an IR feature to avoid paying these costs. However, I am probably biased, as I have been calling this idea of a group of sections that travel together a "comdat" for a while now.

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Petr Hosek via llvm-dev

Feb 11, 2021, 6:45:55 PM
to Reid Kleckner, llvm-dev
Representing zero flag ELF section groups as a new type of IR COMDAT is something I considered. I was originally against it because it seems semantically backwards: in ELF, COMDAT is a special kind of group, not the other way round. However, I'm definitely biased in the other, ELF-centric, direction, and I agree with you that there's value in minimizing churn. Using a new COMDAT type is by far the easiest way to achieve what we want in IR.

Fangrui Song via llvm-dev

Feb 11, 2021, 11:43:14 PM
to Reid Kleckner, llvm-dev
On 2021-02-11, Reid Kleckner via llvm-dev wrote:
>We are already using LLVM IR comdat groups for the same purpose, linker GC
>association, on COFF. I think we just need a flag to mark ELF comdat groups
>as, essentially, not actually being common data that the linker should
>deduplicate by name, aka a zero flag group. See how Windows ASan uses
>comdat groups on internal globals for metadata registration:

PE-centric naming is fine. ELF GRP_COMDAT was inspired by PE/COFF so
modeling comdat with COFF is justified :)

>$ cat t.cpp
>int f();
>static int gv = f();
>
>$ clang -S t.cpp --target=x86_64-windows-msvc -o - -emit-llvm
>-fsanitize=address
>...
>$gv = comdat noduplicates
>...
>@gv = internal global { i32, [60 x i8] } zeroinitializer, comdat, align 32
>...
>@__asan_global_gv = private global { i64, i64, i64, i64, i64, i64, i64, i64
>} { i64 ptrtoint ({ i32, [60 x i8] }* @gv to i64), i64 4, i64 64, i64
>ptrtoint ([3 x i8]* @___asan_gen_.1 to i64), i64 ptrtoint ([6 x i8]*
>@___asan_gen_ to i64), i64 1, i64 ptrtoint ({ [6 x i8]*, i32, i32 }*
>@___asan_gen_.3 to i64), i64 -1 }, section ".ASAN$GL", comdat($gv), align
>64, !associated !0
>
>
>We are using the "noduplicates" comdat flag here, but @gv has internal
>linkage, and COFF linkers merge symbols, not section group names, so this
>code does what we want it to. Maybe it would make more sense if we used
>some kind of portable flag, like "internal" or "unique" on the comdat group
>to indicate that the group doesn't participate in merging. On COFF, we'd
>have the limitation that this feature only works for comdat groups named
>after internal linkage globals, but on ELF, the group could have any name.

Yes, it looks like ELF's zero flag section groups can just reuse `comdat noduplicates`.

>You could rename the Comdat class to Group or SectionGroup or something,
>but I'm not sure there's much value in it. The terminology as it is makes
>sense for COFF, if not for ELF. ELF makes the distinction between comdat
>section groups and non-comdat section groups, but MSVC and clang-cl use the
>IMAGE_SCN_LNK_COMDAT section flag and the IMAGE_COMDAT_SELECT_ASSOCIATIVE
>selection flag to implement these types of groups.

Start OT now.

It is somewhat awkward that the section header has the
IMAGE_SCN_LNK_COMDAT flag while the selection record is specified as an
auxiliary symbol record.

In ELF, encoding symbol table related information requires a new
section (64 bytes for the section header).
IIUC PE/COFF needs a section header (40 bytes). But to represent
association, where no new section/symbol is needed, an auxiliary symbol
record can be used (18 bytes). If the number of interconnected
sections is small, the cost can be smaller than adding a new section
header.
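
A rough back-of-the-envelope version of this comparison, as a C++ sketch. The function names are made up for illustration. The ELF side assumes ELF64, where a SHT_GROUP section holds one 4-byte flags word plus one 4-byte section index per member. The COFF side assumes one 18-byte auxiliary record per associated section, which is a simplification based on the figures above. Symbol table entries and the member sections' own headers are ignored on both sides.

```cpp
#include <cstddef>

// Sizes quoted in the message above.
constexpr std::size_t kElf64SectionHeader = 64; // sizeof(Elf64_Shdr)
constexpr std::size_t kCoffAuxRecord = 18;      // one COFF auxiliary symbol record

// ELF64: one SHT_GROUP section = section header + 4-byte flags word
// + one 4-byte section index per member.
constexpr std::size_t elfGroupBytes(std::size_t members) {
  return kElf64SectionHeader + 4 + 4 * members;
}

// Assumption: COFF spends one auxiliary record per associated section.
constexpr std::size_t coffAssociationBytes(std::size_t members) {
  return kCoffAuxRecord * members;
}
```

Under these assumptions the COFF encoding is smaller for up to four interconnected sections (72 vs. 84 bytes at four members), and the ELF group becomes smaller from five members on (88 vs. 90 bytes).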
