Proposal for a new section type SHT_RELR

1444 views
Skip to first unread message

Rahul Chaudhry

unread,
Dec 15, 2017, 5:21:00 PM12/15/17
to gener...@googlegroups.com, Sriraman Tallam
This is a proposal for defining a new section type "SHT_RELR" in the gABI. This
new section will be used to compactly encode R_*_RELATIVE relocations in shared
object files and position independent executables (PIE).

## Background

In position independent executables (PIE), R_*_RELATIVE relocations can account
for more than 99% of the entries in SHT_REL/SHT_RELA sections. This results in
more than 5% size bloat compared to non-PIE binaries. See this thread for more
details: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00000.html

Recent releases of Debian, Ubuntu, and several other distributions build
executables as PIE by default. Suprateeka Hegde posted some statistics in the
above thread on the prevalence of relative relocations in executables residing
in /usr/bin: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00013.html

This proposal is based on the 'experimental-relr' prototype from Cary Coutant
that is available at 'users/ccoutant/experimental-relr' branch in the binutils
repository, and was described in this post in the same thread on gnu-gabi:
https://sourceware.org/ml/gnu-gabi/2017-q2/msg00003.html


## Proposal

We propose defining a new section type and dynamic array tags in the
generic-abi for compactly encoding the R_*_RELATIVE relocations in a
special '.relr.dyn' section.

Defining a new section type:
Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
Figure 4-9: Section Types,sh_type

Name Value
-----------------
SHT_RELR 19

Description:

SHT_RELR: The section holds relative relocation entries without explicit
addends or info, such as type Elf32_Relr for the 32-bit class of
object files or type Elf64_Relr for the 64-bit class of object
files. An object file may have multiple relocation sections. See
``Relocation'' below for details.


Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
Figure 4-14: sh_link and sh_info Interpretation

sh_type: SHT_RELR (same as SHT_REL and SHT_RELA)
sh_link: The section header index of the associated symbol table.
sh_info: The section header index of the section to which the relocation
applies.


Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
Figure 4-16: Special Sections

Name: .relr<name>
Type: SHT_RELR
Attributes: (same as .rel<name> and .rela<name>)
These sections hold relocation information, as described in
``Relocation''. If the file has a loadable segment that includes
relocation, the sections' attributes will include the SHF_ALLOC bit;
otherwise, that bit will be off. Conventionally, name is supplied by the
section to which the relocations apply. Thus a relocation section for
.text normally would have the name .rel.text, .rela.text, or .relr.text.


Defining new dynamic array tags:
Chapter 5: http://www.sco.com/developers/gabi/latest/ch5.dynamic.html
Figure 5-10: Dynamic Array Tags, d_tag

Name Value d_un Executable Shared Object
---------------------------------------------------
DT_RELRSZ 35 d_val optional optional
DT_RELR 36 d_ptr optional optional
DT_RELRENT 37 d_val optional optional

Note: The use of d_un conforms to the odd-even rule in that section.

Description:

DT_RELR: This element is similar to DT_RELA, except its table has implicit
addends and info, such as Elf32_Relr for the 32-bit file class or
Elf64_Relr for the 64-bit file class. If this element is present,
the dynamic structure must also have DT_RELRSZ and DT_RELRENT
elements.

DT_RELRSZ: This element holds the total size, in bytes, of the DT_RELR
relocation table.

DT_RELRENT: This element holds the size, in bytes, of the DT_RELR relocation
entry.


This table will hold entries of type Elf32_Relr for the 32-bit class of object
files or type Elf64_Relr for the 64-bit class of object files:

Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.reloc.html
Figure 4-23: Relocation Entries

typedef Elf32_Word Elf32_Relr;
typedef Elf64_Xword Elf64_Relr;


Addends are stored implicitly in the location to be modified, similar to
Elf32_Rel and Elf64_Rel. Relocation type is implicit (R_*_RELATIVE). The
entries effectively store a sorted list of offsets only.

The encoding used in these entries is a simple combination of delta-encoding
and a bitmap of offsets. For the 64-bit entries, higher 8-bits contain delta
since last offset, and lower 56-bits contain a bitmap for which words to apply
the relocation to. This is best described by showing the code for decoding the
section entries:

#define ELF64_R_JUMP(val) ((val) >> 56)
#define ELF64_R_BITS(val) ((val) & 0xffffffffffffff)

#ifdef DO_RELR
{
ElfW(Addr) offset = 0;
for (; relative < end; ++relative)
{
ElfW(Addr) jump = ELFW(R_JUMP) (*relative);
ElfW(Addr) bits = ELFW(R_BITS) (*relative);
offset += jump * sizeof(ElfW(Addr));
if (jump == 0)
{
++relative;
offset = *relative;
}
ElfW(Addr) r_offset = offset;
for (; bits != 0; bits >>= 1)
{
if ((bits&1) != 0)
elf_machine_relr_relative (l_addr, (void *) (l_addr+r_offset));
r_offset += sizeof(ElfW(Addr));
}
}
}
#endif

Note that the 8-bit 'jump' encodes the number of ElfW(Addr) sized words since
last offset. The case where jump would not fit in 8-bits is handled by setting
jump to 0, and emitting the full offset for the next relocation in the
subsequent entry.

This encoding can represent up to 56 relative relocations in a single 64-bit
entry. The above code is the entirety of the implementation for decoding and
processing SHT_RELR sections in glibc dynamic loader.

For 32-bit targets, we use 32-bit entries: 8-bits for 'jump' and 24-bits for
the bitmap:

#define ELF32_R_JUMP(val) ((val) >> 24)
#define ELF32_R_BITS(val) ((val) & 0xffffff)


## Motivating examples

Here are three real world examples that demonstrate the savings:

1. Chrome browser (x86_64, built as PIE):
File size (stripped): 152265064 bytes (145.21MB)
605159 relocation entries (24 bytes each) in '.rela.dyn'
594542 are R_X86_64_RELATIVE relocations (98.25%)
14269008 bytes (13.61MB) in use in '.rela.dyn' section
109256 bytes (0.10MB) if moved to '.relr.dyn' section

Savings: 14159752 bytes, or 9.29% of original file size.


2. Go net/http test binary (x86_64, 'go test -buildmode=pie -c net/http')
File size (stripped): 10238168 bytes (9.76MB)
83810 relocation entries (24 bytes each) in '.rela.dyn'
83804 are R_X86_64_RELATIVE relocations (99.99%)
2011296 bytes (1.92MB) in use in .rela.dyn section
43744 bytes (0.04MB) if moved to .relr.dyn section

Savings: 1967552 bytes, or 19.21% of original file size.


3. Vim binary in /usr/bin on my workstation (Ubuntu, x86_64)
File size (stripped): 3030032 bytes (2.89MB)
6680 relocation entries (24 bytes each) in '.rela.dyn'
6272 are R_X86_64_RELATIVE relocations (93.89%)
150528 bytes (0.14MB) in use in .rela.dyn section
1992 bytes (0.00MB) if moved to .relr.dyn section

Savings: 148536 bytes, or 4.90% of original file size.


Recent releases of Debian, Ubuntu, and several other distributions build
executables as PIE by default. The third example shows that using SHT_RELR
sections to encode relative relocations can bring decent savings to executable
sizes in /usr/bin across many distributions.


## Topics for discussion

* Does this really belong in the generic-abi? The R_*_RELATIVE relocations can
be specified in the SHT_REL/SHT_RELA sections. This is just an optimization.

Our opinion (for seeding the discussion):

Yes, SHT_RELR is just an optimization for storing the R_*_RELATIVE relocations.
If this proposal is rejected, the relocations can continue to reside in the
SHT_REL/SHT_RELA sections like they do today. However, from that point of view,
SHT_REL is also just an optimization over SHT_RELA, so there is precedent for
doing this in the generic-abi.

Entries of type Elf32_Rel and Elf64_Rel store an implicit addend in the
location to be modified, saving 33% memory per entry over Elf32_Rela or
Elf64_Rela.

SHT_RELR takes this same approach a few steps further: removing the r_info
field (relocation type is implicit, and there is no need for symbol table
index), and compactly encoding the remaining data in the entries, which is
simply a sorted list of offsets.


* Assuming there is consensus that adding SHT_RELR to generic-abi is a good
idea, should the ABI really detail the encoding to be used or just define
SHT_RELR/DT_RELR* and leave the encoding unspecified?

Based on the specification of Elf32_Rel, Elf32_Rela, Elf64_Rel, and Elf64_Rela
in the ABI, it probably makes sense to at least define the Elf32_Relr and
Elf64_Relr types.


* Assuming there is consensus that the ABI should specify the encoding used in
SHT_RELR sections, is the encoding described here the best one possible?

Better encodings may be possible. Discovering a more compact *and* simpler
encoding would be a very pleasant surprise indeed. Ideas welcome.

We would be happy to provide details of encoding schemes we've experimented
with and how they compared.

There are two minor optimizations of the above scheme that are worth mentioning
here: (64-bit version described here, but they're applicable to 32-bit as well)

1. The jump field in an entry will always be >=56. Values 0-55 are not used,
since the corresponding offsets would have been covered in the previous
entry. We can bias jump by 55 (jump=0 is used as a special value) to raise
the upper bound for deltas that can be encoded without requiring an extra
word.

2. The least significant bit of the bitmap is always 1. There is an explicit
relocation at every new value of offset. The bitmap can be defined to mean
the *next* 56 words after the new offset, making the relocation at the new
offset implicit. This enables each entry to encode 57 relocations instead
of 56.


* What happens when an older dynamic loader that does not recognize DT_RELR*
tags meets a shared library or executable that contains a SHT_RELR section?

Mysterious crashes!!

Unknown DT_* tags are not fatal errors at program loading time. The older
dynamic loader will simply skip the relocations in SHT_RELR section and start
the executable. This is likely to result in mysterious crashes.

Here's one proposal to handle the situation:

While unknown DT_* tags are not fatal errors at program loading time, unknown
relocations are.

We propose adding a new R_*_RELR relocation on each architecture that adds
support for SHT_RELR sections. Whenever the linker emits a SHT_RELR section in
a shared object or executable, it must also include one entry of type R_*_RELR
in the SHT_REL/SHT_RELA section of the binary. When the dynamic loader adds
support on that architeccure for SHT_RELR sections, it must also add support
for the corresponding R_*_RELR relocation (by simply ignoring it instead of
aborting).

This scheme would ensure that older dynamic loaders error out with a useful
error message when trying to load newer binaries with SHT_RELR sections.

Rod Evans

unread,
Dec 19, 2017, 2:47:43 PM12/19/17
to gener...@googlegroups.com
On Fri, Dec 15, 2017 at 11:32 AM, 'Rahul Chaudhry' via Generic System V Application Binary Interface <gener...@googlegroups.com> wrote:
This is a proposal for defining a new section type "SHT_RELR" in the gABI. This
new section will be used to compactly encode R_*_RELATIVE relocations in shared
object files and position independent executables (PIE).


Do you have any data that indicates this new structure provides any benefit
to the runtime cost of establishing these relocations?  And is this benefit
worth creating a new structure that will require a number of tool chain
modifications?


FWIW, in Solaris we long ago made changes to the relocation layout with
the goal of minimizing runtime relocation overhead without having to invent
new record types, or teach the tools that create/consume/analyze these
records to cope with new types.

 i.   Relocation sections were originally concatenated together to form a
      contiguous relocation block, identified by DT_ entries.  We changed
      this to create one .SUNW_reloc section - although the PLT's are still
      in an adjacent relocation section to allow for lazy PLT binding.

 ii.  This single relocation section could then be sorted.  All relative
      relocations are grouped together, and identified by additional DT_
      entries, allowing the runtime linker to enter an optimized (simplified)
      relocation loop, that processes the relocations quickly.  I assume
      your new relocation table would provide for the same.

 iii. This single relocation section also has all symbolic relocations
      sorted by symbol name, which allows the runtime linker to reuse the
      last symbol it looked up for the next record if appropriate.  This
      reduced the number if symbols being searched for without requiring
      a complex caching mechanism.

The above changes had a significant affect in reducing the relocation
overhead for all types of dynamic objects loaded in a process.  I have
a feeling you guys are already doing these sorts of things too.

 iv.  The biggest savings came from reducing the exported symbols for
      dynamic objects to only the symbols needed by external objects.
      This interface definition reduced *many* global symbols to locals,
      which in turn transformed symbolic relocations into relatives.
      That 100'th dependency loaded by Firefox no longer needed to look
      in the 100 other objects to find the global "foobar" symbol it
      defined itself :-)  This interface definition had the biggest affect
      of all the relocation tricks we tried, pushing more work into the
      optimized relocation engine, and less work into symbol lookup.
      Plus, the greatly reduced .dynsym brought down the size of the
      .dynstr and .hash with it.

 v.   We then added direct-bindings, where objects searched "directly"
      in the dependencies established at link-edit time for the external
      definitions they required, rather than the expensive 100 dependency
      search that could take place in the Firefox object.  This again had
      a huge win in reducing symbol lookup and reducing overall relocation
      overhead.

Bottom line, we did quite a bit of stuff (and you might have too), without
having changed the relocation record structure, and I'm wondering if you'd
"feel" any improvements after making the changes you're suggesting.

Anyway, just my 10 cents.

--
Rod.

Cary Coutant

unread,
Dec 19, 2017, 7:24:25 PM12/19/17
to gener...@googlegroups.com, Sriraman Tallam
> Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
> Figure 4-16: Special Sections
>
> Name: .relr<name>
> Type: SHT_RELR
> Attributes: (same as .rel<name> and .rela<name>)
> These sections hold relocation information, as described in
> ``Relocation''. If the file has a loadable segment that includes
> relocation, the sections' attributes will include the SHF_ALLOC bit;
> otherwise, that bit will be off. Conventionally, name is supplied by the
> section to which the relocations apply. Thus a relocation section for
> .text normally would have the name .rel.text, .rela.text, or .relr.text.

Since these relocations will normally be found only in ET_EXEC and
ET_DYN files, the typical name will be .relr.dyn. I think it's worth
pointing out here that these are for dynamic linking.

> Addends are stored implicitly in the location to be modified, similar to
> Elf32_Rel and Elf64_Rel. Relocation type is implicit (R_*_RELATIVE). The
> entries effectively store a sorted list of offsets only.

I think the term "addend" is, while not incorrect per se, a bit
misleading. We really have a relocated value in the data -- an actual
value rather than just the addend part of an expression -- that may
need further relocation if the load module is loaded at a virtual
address different from its link-time base address.
I'm probably being a bit too OCD over this, but it bugs me that we
have this order of relocation entries:

0BBBBBBB AAAAAAAA JBBBBBBB JBBBBBBB 0BBBBBBB AAAAAAAA ...

Where the B's in the first, third, and fourth words apply to the
address (A's) in the second word, and the B's in the fifth word apply
to the address in the sixth, and so on.

I'd prefer a layout like this:

AAAAAAAA BBBBBBBJ BBBBBBBJ BBBBBBBX AAAAAAAA BBBBBBBJ ...

Where we start with an address, apply 56 (or 24) relocation bits from
there, then check the jump bits. If the jump bits are the escape value
(X), then read a new address before the next set of relocation bits.

I don't think this changes your encoding efficiency at all, but I do
think it will make the encoding a little clearer, and perhaps less
prone to misinterpretation.

> There are two minor optimizations of the above scheme that are worth mentioning
> here: (64-bit version described here, but they're applicable to 32-bit as well)
>
> 1. The jump field in an entry will always be >=56. Values 0-55 are not used,
> since the corresponding offsets would have been covered in the previous
> entry. We can bias jump by 55 (jump=0 is used as a special value) to raise
> the upper bound for deltas that can be encoded without requiring an extra
> word.

I'd argue for an implicit jump of 56, using jump = 255 as the escape
value, so that two consecutive runs of 56 words would use jump = 0.

> 2. The least significant bit of the bitmap is always 1. There is an explicit
> relocation at every new value of offset. The bitmap can be defined to mean
> the *next* 56 words after the new offset, making the relocation at the new
> offset implicit. This enables each entry to encode 57 relocations instead
> of 56.

I don't think this is worth the (minor) additional complication.

> * What happens when an older dynamic loader that does not recognize DT_RELR*
> tags meets a shared library or executable that contains a SHT_RELR section?
>
> Mysterious crashes!!
>
> Unknown DT_* tags are not fatal errors at program loading time. The older
> dynamic loader will simply skip the relocations in SHT_RELR section and start
> the executable. This is likely to result in mysterious crashes.
>
> Here's one proposal to handle the situation:
>
> While unknown DT_* tags are not fatal errors at program loading time, unknown
> relocations are.
>
> We propose adding a new R_*_RELR relocation on each architecture that adds
> support for SHT_RELR sections. Whenever the linker emits a SHT_RELR section in
> a shared object or executable, it must also include one entry of type R_*_RELR
> in the SHT_REL/SHT_RELA section of the binary. When the dynamic loader adds
> support on that architeccure for SHT_RELR sections, it must also add support
> for the corresponding R_*_RELR relocation (by simply ignoring it instead of
> aborting).
>
> This scheme would ensure that older dynamic loaders error out with a useful
> error message when trying to load newer binaries with SHT_RELR sections.

I'd suggest instead using a new program header table tag,
PT_DYNAMIC_2, to replace PT_DYNAMIC. In updated loaders, it would be
treated exactly the same as PT_DYNAMIC, but its support would imply
support for the DT_RELR* tags.

If legacy loaders wouldn't complain about PT_DYNAMIC_2, or about the
absence of PT_DYNAMIC, then maybe we need something more aggressive,
or a trick like the proposal above.

-cary

Cary Coutant

unread,
Dec 19, 2017, 7:52:05 PM12/19/17
to gener...@googlegroups.com
> Do you have any data that indicates this new structure provides any benefit
> to the runtime cost of establishing these relocations? And is this benefit
> worth creating a new structure that will require a number of tool chain
> modifications?

I'm curious to see some measurements, too, but I recall that Simon
Baldwin found some significant startup improvements in addition to the
significant space savings. It's kind of hard to imagine that you
wouldn't see some significant improvements in speed simply by reducing
the total size of the relocations by the factors given in the
proposal.

> FWIW, in Solaris we long ago made changes to the relocation layout with
> the goal of minimizing runtime relocation overhead without having to invent
> new record types, or teach the tools that create/consume/analyze these
> records to cope with new types.
>
> i. Relocation sections were originally concatenated together to form a
> contiguous relocation block, identified by DT_ entries. We changed
> this to create one .SUNW_reloc section - although the PLT's are still
> in an adjacent relocation section to allow for lazy PLT binding.
>
> ii. This single relocation section could then be sorted. All relative
> relocations are grouped together, and identified by additional DT_
> entries, allowing the runtime linker to enter an optimized
> (simplified)
> relocation loop, that processes the relocations quickly. I assume
> your new relocation table would provide for the same.

Yes, Linux does this. We sort all the relative relocations to the
front, and add DT_RELCOUNT/DT_RELACOUNT to indicate the break. This
proposal simply takes that idea to the next logical level.

> iii. This single relocation section also has all symbolic relocations
> sorted by symbol name, which allows the runtime linker to reuse the
> last symbol it looked up for the next record if appropriate. This
> reduced the number if symbols being searched for without requiring
> a complex caching mechanism.

Linux also does this. (Both optimizations are thanks, almost
certainly, to Sun's example.)

> iv. The biggest savings came from reducing the exported symbols for
> dynamic objects to only the symbols needed by external objects.
> This interface definition reduced *many* global symbols to locals,
> which in turn transformed symbolic relocations into relatives.
> That 100'th dependency loaded by Firefox no longer needed to look
> in the 100 other objects to find the global "foobar" symbol it
> defined itself :-) This interface definition had the biggest affect
> of all the relocation tricks we tried, pushing more work into the
> optimized relocation engine, and less work into symbol lookup.
> Plus, the greatly reduced .dynsym brought down the size of the
> .dynstr and .hash with it.

To this end, we try to encourage developers to use -fvisibility=hidden
when compiling, with explicit __attribute__((visibility("default")))
in the source for exported symbols, similar to the use of
__dllexport__ on Windows. I think some of these big apps that really
care are doing this.

> v. We then added direct-bindings, where objects searched "directly"
> in the dependencies established at link-edit time for the external
> definitions they required, rather than the expensive 100 dependency
> search that could take place in the Firefox object. This again had
> a huge win in reducing symbol lookup and reducing overall relocation
> overhead.

I'd like to see -Bdirect on Linux. I had a proposal for HP-UX to offer
a form of direct binding as an option before I left HP, but never
could drum up much interest -- at the time, it was considered too much
like Windows, and too little like Unix.

-cary

Florian Weimer

unread,
Dec 20, 2017, 1:33:16 PM12/20/17
to gener...@googlegroups.com, Cary Coutant, Sriraman Tallam
On 12/20/2017 01:24 AM, Cary Coutant wrote:
> Since these relocations will normally be found only in ET_EXEC and
> ET_DYN files, the typical name will be .relr.dyn. I think it's worth
> pointing out here that these are for dynamic linking.
Are they? Statically linked position-independent executables will need
them as well.

> I'd suggest instead using a new program header table tag,
> PT_DYNAMIC_2, to replace PT_DYNAMIC. In updated loaders, it would be
> treated exactly the same as PT_DYNAMIC, but its support would imply
> support for the DT_RELR* tags.

I believe that this would break many tools which would not have to
understand those relocations.

Maybe we can use an ABI note for this?

Thanks,
Florian

Rod Evans

unread,
Dec 20, 2017, 4:25:17 PM12/20/17
to gener...@googlegroups.com
On Tue, Dec 19, 2017 at 4:52 PM, Cary Coutant <ccou...@gmail.com> wrote:
I'm curious to see some measurements, too, but I recall that Simon
Baldwin found some significant startup improvements in addition to the
significant space savings. It's kind of hard to imagine that you
wouldn't see some significant improvements in speed simply by reducing
the total size of the relocations by the factors given in the
proposal.

Err, yeah.  Although I don't know how many times I've saved space and
reduced the amount of processing necessary to achieve a goal, only to
be unable to observe anything measurable with normal systems use.

We straddle a fence where there's a trade off between the complexity
of solving a problem and the effect of the solution.

The complexity with this proposal is how many tool chain components
might need enhancing to use this new data, or at least not fall over
because of it.

I'd like to see -Bdirect on Linux...

Ohh, and how did I forget to bullet list one of our greatest performers?

 vi.  Lazy loading, where objects are only loaded if a reference binds
      to them.  If you don't load objects in the first place, you don't
      relocate them, call their .init's, etc.  There's nothing faster
      that not doing something at all.

--
Rod.

Rahul Chaudhry

unread,
Dec 20, 2017, 5:27:59 PM12/20/17
to gener...@googlegroups.com, Sriraman Tallam
On Tue, Dec 19, 2017 at 4:24 PM, Cary Coutant <ccou...@gmail.com> wrote:
>> Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
>> Figure 4-16: Special Sections
>>
>> Name: .relr<name>
>> Type: SHT_RELR
>> Attributes: (same as .rel<name> and .rela<name>)
>> These sections hold relocation information, as described in
>> ``Relocation''. If the file has a loadable segment that includes
>> relocation, the sections' attributes will include the SHF_ALLOC bit;
>> otherwise, that bit will be off. Conventionally, name is supplied by the
>> section to which the relocations apply. Thus a relocation section for
>> .text normally would have the name .rel.text, .rela.text, or .relr.text.
>
> Since these relocations will normally be found only in ET_EXEC and
> ET_DYN files, the typical name will be .relr.dyn. I think it's worth
> pointing out here that these are for dynamic linking.

Ack. Making a note that this should probably be more SHT_RELR specific.


>> Addends are stored implicitly in the location to be modified, similar to
>> Elf32_Rel and Elf64_Rel. Relocation type is implicit (R_*_RELATIVE). The
>> entries effectively store a sorted list of offsets only.
>
> I think the term "addend" is, while not incorrect per se, a bit
> misleading. We really have a relocated value in the data -- an actual
> value rather than just the addend part of an expression -- that may
> need further relocation if the load module is loaded at a virtual
> address different from its link-time base address.

Ack. Making a note that the language here needs some work in the final version.


> I'm probably being a bit too OCD over this, but it bugs me that we
> have this order of relocation entries:
>
> 0BBBBBBB AAAAAAAA JBBBBBBB JBBBBBBB 0BBBBBBB AAAAAAAA ...
>
> Where the B's in the first, third, and fourth words apply to the
> address (A's) in the second word, and the B's in the fifth word apply
> to the address in the sixth, and so on.
>
> I'd prefer a layout like this:
>
> AAAAAAAA BBBBBBBJ BBBBBBBJ BBBBBBBX AAAAAAAA BBBBBBBJ ...
>
> Where we start with an address, apply 56 (or 24) relocation bits from
> there, then check the jump bits. If the jump bits are the escape value
> (X), then read a new address before the next set of relocation bits.
>
> I don't think this changes your encoding efficiency at all, but I do
> think it will make the encoding a little clearer, and perhaps less
> prone to misinterpretation.

Agree. This does look more regular and easier to understand.

Andrew Grieve suggested another encoding, which is arguably simpler *and* gives
better results. A very pleasant surprise indeed! Here's the gist of it:

The encoded sequence looks like:
[ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]

i.e. start with an address, followed by any number of bitmaps. The address
entry encodes 1 relocation. The subsequent bitmap entries encode up to 63
relocations each, at subsequent offsets following the last address entry.

The bitmap entries must have 1 in the least-significant-bit. The assumption
here is that an address cannot have 1 in lsb. Odd addresses are not supported.

This has a couple of interesting properties:

1. Looking at any entry, it is clear whether it's an address or a bitmap: even
means address, odd means bitmap.

2. Just a simple list of addresses is a valid encoding.


Here's how it compares for the examples in the proposal:

1. Chrome browser (x86_64, built as PIE):
109256 bytes using 8-bit jump/56-bit bitmap encoding
98024 bytes using this 63-bit bitmap encoding

2. Go net/http test binary (x86_64, 'go test -buildmode=pie -c net/http')
43744 bytes using 8-bit jump/56-bit bitmap encoding
43080 bytes using this 63-bit bitmap encoding

3. Vim binary in /usr/bin on my workstation (Ubuntu, x86_64)
1992 bytes using 8-bit jump/56-bit bitmap encoding
1792 bytes using this 63-bit bitmap encoding

Do we really care about relative relocations at odd offsets? In the rare cases
that these do exist, the odd ones can stay in the SHT_REL / SHT_RELA sections.


>> There are two minor optimizations of the above scheme that are worth mentioning
>> here: (64-bit version described here, but they're applicable to 32-bit as well)
>>
>> 1. The jump field in an entry will always be >=56. Values 0-55 are not used,
>> since the corresponding offsets would have been covered in the previous
>> entry. We can bias jump by 55 (jump=0 is used as a special value) to raise
>> the upper bound for deltas that can be encoded without requiring an extra
>> word.
>
> I'd argue for an implicit jump of 56, using jump = 255 as the escape
> value, so that two consecutive runs of 56 words would use jump = 0.

Sounds good to me.


>> 2. The least significant bit of the bitmap is always 1. There is an explicit
>> relocation at every new value of offset. The bitmap can be defined to mean
>> the *next* 56 words after the new offset, making the relocation at the new
>> offset implicit. This enables each entry to encode 57 relocations instead
>> of 56.
>
> I don't think this is worth the (minor) additional complication.

Ack.
Sounds good to me. While I realize that this is an important issue to resolve,
I don't have an informed opinion on the best way to handle this.

Rahul

Rahul Chaudhry

unread,
Dec 20, 2017, 6:29:38 PM12/20/17
to gener...@googlegroups.com
On Tue, Dec 19, 2017 at 11:47 AM, Rod Evans <rod.i...@gmail.com> wrote:
> Do you have any data that indicates this new structure provides any benefit
> to the runtime cost of establishing these relocations? And is this benefit
> worth creating a new structure that will require a number of tool chain
> modifications?
>
>
> Bottom line, we did quite a bit of stuff (and you might have too), without
> having changed the relocation record structure, and I'm wondering if you'd
> "feel" any improvements after making the changes you're suggesting.

I did some measurements on boot time for Chromebooks (which run Chrome browser
on boot). There was no measurable speedup in my limited experiments.

The main goal of this proposal is not speedups in program startup time. The
main goal is bringing the size of PIE binaries at par with regular executables.
The way end users will "feel" an improvement is savings in disk space, and more
importantly, network bandwidth for updates. Think 5-10% smaller packages and
quicker updates across Debian, Ubuntu, Android, etc., all of which use PIE
binaries by default, and distribute binary updates to a multitude of users.

Rahul

Alan Modra

unread,
Dec 21, 2017, 10:54:59 PM12/21/17
to 'Rahul Chaudhry' via Generic System V Application Binary Interface, Sriraman Tallam
I like the new encoding, and I think it is quite reasonable to not
support relative relocs at odd addresses.

An idea occurred to me when reading the proposal earlier this month at
https://sourceware.org/ml/binutils/2017-12/msg00067.html, but I
hesitated to mention it then because it's quite radical. I'm only
mentioning it now because it seems we don't have a solid proposal to
solve the problem of old dynamic loaders (or tools that examine
relocs) simply ignoring the new feature, except for emitting a new
R_*_RELR dynamic reloc. DT_DYNAMIC_2 is a cunning trick for the
dynamic loader problem, but I agree with Florian that DT_DYNAMIC_2 is
not ideal.

So if R_*_RELR must be used, you don't really need any of .relr.dyn,
SHT_RELR, or DT_RELR*. The encoded bitmaps and addresses could be
placed immediately after the R_*_RELR dynamic reloc, and you'd start
with bitmap words because r_offset for the reloc is the first address
relocated. Obviously .rel{,a}.dyn would no longer be a regular array,
which is the radical part of this idea.

A somewhat less radical approach is to use R_*_RELR and put the
encoded bitmaps in a new SHT_PROGBITS section placed immediately
before .rela.dyn. You wouldn't need SHT_RELR since the interpretation
of that section isn't special (well, R_*_RELR interprets the section).
You would also only need at most one new dynamic tag to tell the
loader the size or address of the bitmaps. Either size or address is
enough given .relr.dyn is adjacent to .rela.dyn. You could also
dispense with a new tag by encoding the size in the ELF_R_SYM part of
the R_*_RELR r_info. (And if 2^24 or 2^32 words isn't large enough,
there is no reason you couldn't use another R_*_RELR reloc.)

A side benefit of driving everything from R_*_RELR is that you don't
need to worry about whether .relr.dyn is processed before .rela.dyn.
The order is implicit in the ordering of R_*_RELR amongst other
dynamic relocs.

--
Alan Modra
Australia Development Lab, IBM

Ali Bahrami

unread,
Dec 22, 2017, 12:27:44 AM12/22/17
to gener...@googlegroups.com
I have some concerns about this proposal.

The basic notions of symbols and relocations in ELF are pretty
simple, and I think that's part of why it's survived for 30 years,
and why it could go for another 30. They are also so core to ELF
that making new ones can get messy fast. I worry that adding overly
specific new relocation sections makes the format harder to learn, and
more brittle to future changes.

The follow on discussion around PT_DYNAMIC_2, and ABI notes, feels
like a further indication of things moving in the wrong direction.
We shouldn't need to fork PT_DYNAMIC, and I don't think core gABI
features should require interpreting notes.

Finally, I want to echo Rod's comments about performance. The file
size savings are nice, but in the larger scheme of things, not that
big a deal. Improved startup performance *would* be a big deal, but
only if it's perceptible. It's not really a win to take a big piece
out of something that's already too small to matter. So I do think that
hard measurements that show a substantial win across the board, for most
programs, ought to come first, and that discussions of standardizing it
in the gABI discussion should follow that.

Assuming that measurement does prove the case, the suggestion Alan
made for R_*_RELR seems preferable:

On 12/21/17 20:54, Alan Modra wrote:
> A somewhat less radical approach is to use R_*_RELR and put the
> encoded bitmaps in a new SHT_PROGBITS section placed immediately
> before .rela.dyn.

I like it as a smaller change that fits into the existing relocation sections,
rather than inventing new ones, and requiring new program headers, etc...
It's nice to add new things at the edges, rather than in the heart.
A single relocation type, and a few dynamic elements, seems more sustainable.

If we go that way, some thoughts:

- I'd like the bitmap sections to have a defined SHT_xxx type,
rather than being progbits. We know they are arrays of Words,
so why not label them, and let libelf xlate them?

- I think the sh_link of the bitmap section should give the index
of the relocation section it belongs to. It would be nice if the
relocation section could also point back, but sh_link/sh_info
are already taken.

- This opens the possibility more more than one such relocation, each
optimized to some particular use case. You asked if the proposed
bitmap design is the best one. Hard to know for sure, but here,
you have the ability to invent more similar relocations.

It's still complexity though. Every avenue to get these wins without extending
ELF ought to be considered first.




> Yes, SHT_RELR is just an optimization for storing the R_*_RELATIVE relocations.
> If this proposal is rejected, the relocations can continue to reside in the
> SHT_REL/SHT_RELA sections like they do today. However, from that point of view,
> SHT_REL is also just an optimization over SHT_RELA, so there is precedent for
> doing this in the generic-abi.

REL is smaller than RELA, but I don't think you can say that SHT_REL is an
optimization over SHT_RELA. If it were an optimization, you'd see both used
in objects for all platforms. That's not the case though. The platform ABI
for each platform (x86, x86_64, sparc, mips, ppc, pa-risc, etc...) declares the
relocation format (REL or RELA), and that single relocation format is used
consistently on that platform.

The only REL platform I know of is 32-bit x86. And the only case I know of
where a single object can have both REL and RELA is 32-bit x86 objects that
have GNU prelink information in them (those sections being GNU ABI features,
not gABI, or platform ABI).

I'm not really sure why REL was added, or why it didn't get much traction,
but if feels more like a mistake than an optimization.

- Ali

Andrew Grieve

unread,
Dec 22, 2017, 11:50:18 AM12/22/17
to gener...@googlegroups.com
On Fri, Dec 22, 2017 at 12:22 AM, Ali Bahrami <Ali.B...@oracle.com> wrote:
   I have some concerns about this proposal.

The basic notions of symbols and relocations in ELF are pretty
simple, and I think that's part of why it's survived for 30 years,
and why it could go for another 30. They are also so core to ELF
that making new ones can get messy fast. I worry that adding overly
specific new relocation sections makes the format harder to learn, and
more brittle to future changes.

The follow on discussion around PT_DYNAMIC_2, and ABI notes, feels
like a further indication of things moving in the wrong direction.
We shouldn't need to fork PT_DYNAMIC, and I don't think core gABI
features should require interpreting notes.

Finally, I want to echo Rod's comments about performance. The file
size savings are nice, but in the larger scheme of things, not that
big a deal. Improved startup performance *would* be a big deal, but
only if it's perceptible. It's not really a win to take a big piece
out of something that's already too small to matter. So I do think that
hard measurements that show a substantial win across the board, for most
programs, ought to come first, and that discussions of standardizing it
in the gABI discussion should follow that.

Thanks for the thoughtful response, and the help pushing this along! 

I am super motivated by the size savings presented here, so wanted to point out that disk space can be a very big deal. For example, there are very large number of low-end Android devices that are basically always out of disk space. These users would gladly take a start-up performance hit for some disk space savings. That said, I'd guess this new format has a greater chance of speeding up startup then slowing it down, due to the reduced IO overhead.


 


--
You received this message because you are subscribed to the Google Groups "Generic System V Application Binary Interface" group.
To unsubscribe from this group and stop receiving emails from it, send an email to generic-abi+unsubscribe@googlegroups.com.
To post to this group, send email to gener...@googlegroups.com.
Visit this group at https://groups.google.com/group/generic-abi.
For more options, visit https://groups.google.com/d/optout.

Ali Bahrami

unread,
Dec 22, 2017, 12:58:35 PM12/22/17
to gener...@googlegroups.com
On 12/22/17 08:42, 'Andrew Grieve' via Generic System V Application Binary Interface wrote:
> I am super motivated by the size savings presented here, so wanted to point out that disk space can be a very big deal. For example, there are very large number of low-end Android devices that are basically always out of disk space. These users would gladly take a start-up performance hit for some disk space savings. That said, I'd guess this new format has a greater chance of speeding up startup then slowing it down, due to the reduced IO overhead.

I understand. Smaller is always better, all things being equal.
And I would expect it to go faster, though how much impact that
really has is the question.

If we adopt this, we'll live with it until ELF dies. And meanwhile,
the specs of low end devices will continue to double every year or
two, so the shoe won't pinch as hard tomorrow as it does today. Recall
that the machines of the 80's, when ELF was created, were far less
capable than the lowest end mobile device of 2017. So while I sympathize
with this example, I don't think that by itself, it's a good reason to
extend the gABI.

Perhaps low end Android devices really don't need to use PIE executables
for every object? Or perhaps some Android specific GNU ABI version of
this, which is easier to phase out later, would suffice?

I'd be happy if low end devices benefit, but the decision should be made
with a long term view about ELF.

Thanks.

- Ali

Andrew Grieve

unread,
Dec 22, 2017, 1:40:32 PM12/22/17
to gener...@googlegroups.com
On Fri, Dec 22, 2017 at 12:53 PM, Ali Bahrami <Ali.B...@oracle.com> wrote:
On 12/22/17 08:42, 'Andrew Grieve' via Generic System V Application Binary Interface wrote:
I am super motivated by the size savings presented here, so wanted to point out that disk space can be a very big deal. For example, there are very large number of low-end Android devices that are basically always out of disk space. These users would gladly take a start-up performance hit for some disk space savings. That said, I'd guess this new format has a greater chance of speeding up startup then slowing it down, due to the reduced IO overhead.

   I understand. Smaller is always better, all things being equal.
And I would expect it to go faster, though how much impact that
really has is the question.

If we adopt this, we'll live with it until ELF dies. And meanwhile,
the specs of low end devices will continue to double every year or
two, so the shoe won't pinch as hard tomorrow as it does today. Recall
that the machines of the 80's, when ELF was created, were far less
capable than the lowest end mobile device of 2017. So while I sympathize
with this example, I don't think that by itself, it's a good reason to
extend the gABI.

Forgive my ignorance, but what's the downside of living with this until ELF dies?

Specs for low end devices haven't followed this same curve. We've seen it be more common that low-end phones get cheaper rather than faster.

ELF in the 80s also didn't use PIE :P. Given that the trend is to have PIE everywhere (from what I understand, the main motivator is security), it seems highly beneficial to bring this 98-99% reduction.
 


Perhaps low end Android devices really don't need to use PIE executables
for every object? Or perhaps some Android specific GNU ABI version of
this, which is easier to phase out later, would suffice?

I'd be happy if low end devices benefit, but the decision should be made
with a long term view about ELF.

Thanks. 


- Ali

Ali Bahrami

unread,
Dec 22, 2017, 2:31:38 PM12/22/17
to gener...@googlegroups.com
On 12/22/17 11:40, 'Andrew Grieve' via Generic System V Application Binary Interface wrote:
> Forgive my ignorance, but what's the downside of living with this until ELF dies?
>
> Specs for low end devices haven't followed this same curve. We've seen it be more common that low-end phones get cheaper rather than faster.
>
> ELF in the 80s also didn't use PIE :P. Given that the trend is to have PIE everywhere (from what I understand, the main motivator is security), it seems highly beneficial to bring this 98-99% reduction.

Without hard measurement, we're speculating. I'll take the invitation
to do that, but I understand that it has yet to be proven one way or the other.

Speculating, the worst case would be that every ELF implementation becomes more
complicated than it needs to be, forever, and without seeing a commensurate benefit.
That's not just the one time cost of implementation, but also an ongoing cost in
debugging and maintenance, and in the time people spend reading the ELF spec
and trying to understand all the subtleties, Ultimately, each such thing makes
ELF harder to alter in other ways, and possibly hastens its ultimate demise.

The best case of course is that all the implementations become faster and
better, in general, and not just in specific extreme cases. Then, the costs
are worth it.

That's a solid point about PIE in the 80's. We all knew PIE was possible,
but it was more of a party trick than a real thing. Shared objects did have
these concerns, though there were far fewer of them, and they had less in
them. But the performance of libc has always been a concern, and concerns
about the overhead of dynamic linking are nothing new. I wanted to say that
ELF was born in limited circumstances, and isn't a particularly bloated format.

I don't mean to say that the low end Android issue isn't important.
And this technique might be a good solution. But if it's not generally
valuable to all ELF platforms, it can be applied in a more specific way,
as part of the GNU ELF ABI, or as part of an android specific platform ABI.

-----

98-99% reduction? Rahul claimed that 99% of the relocations were implicated,
but saw 5% in size reduction, with one case showing 20%

- Ali

Andrew Grieve

unread,
Dec 28, 2017, 4:46:54 PM12/28/17
to gener...@googlegroups.com
Ah, okay. I certainly do not have the expertise to know which level of the stack this is most appropriate for, and am certainly happy to defer to those that do. :)
 

-----

98-99% reduction? Rahul claimed that 99% of the relocations were implicated,
but saw 5% in size reduction, with one case showing 20%

Sorry for the ambiguity. I was referring to the size of the relocations. The number of bytes required to store relocations for PIEs is reduced 98-99%.

Rahul Chaudhry

unread,
Jan 12, 2018, 4:52:26 PM1/12/18
to gener...@googlegroups.com, Sriraman Tallam
This is an interesting idea, especially the part about the implicit order of
processing the R_*_RELATIVE relocations, and the fact that this opens up the
possibility of supporting multiple different R_*_RELRxy types with different
encodings.

It does, however, leads to the confusing situation where ~99% of the
relocations in a binary are outside of any SHT_REL/SHT_RELA section.

Take the case of the "chrome" PIE binary from earlier in the thread. It
contains 605,159 relocations, out of which 594,542 (98.25%) are of type
R_X86_64_RELATIVE.

With this scheme, the final ".rela.dyn" section in the binary will end up with
10,618 relocations. One of these will be a R_X86_64_RELR relocation, with info
about an undocumented section (as far as the ELF spec goes), containing the
remaining 594,542 relocations.

Any ELF relocations processing / dumping tool will need to know about handling
R_*_RELR relocations anyway, or it'll silently skip processing of 99% of the
relocations, resulting in more confusion.

It is certainly cleaner to define SHT_RELR and specify that it encodes relative
relocations, especially since a majority of position independent executables
and shared libraries will have >90% of their relocations in that section.

Rahul Chaudhry

unread,
Jan 12, 2018, 4:59:44 PM1/12/18
to gener...@googlegroups.com
Thank you for your thoughtful feedback. Your concerns are certainly valid, and
I agree with most of them.

On Thu, Dec 21, 2017 at 9:22 PM, Ali Bahrami <Ali.B...@oracle.com> wrote:
> I have some concerns about this proposal.
>
> The basic notions of symbols and relocations in ELF are pretty
> simple, and I think that's part of why it's survived for 30 years,
> and why it could go for another 30. They are also so core to ELF
> that making new ones can get messy fast. I worry that adding overly
> specific new relocation sections makes the format harder to learn, and
> more brittle to future changes.
>
> The follow on discussion around PT_DYNAMIC_2, and ABI notes, feels
> like a further indication of things moving in the wrong direction.
> We shouldn't need to fork PT_DYNAMIC, and I don't think core gABI
> features should require interpreting notes.
>
> Finally, I want to echo Rod's comments about performance. The file
> size savings are nice, but in the larger scheme of things, not that
> big a deal. Improved startup performance *would* be a big deal, but
> only if it's perceptible. It's not really a win to take a big piece
> out of something that's already too small to matter. So I do think that
> hard measurements that show a substantial win across the board, for most
> programs, ought to come first, and that discussions of standardizing it
> in the gABI discussion should follow that.

File size can be a big deal, and not just in scenarios involving low end
devices. Consider the other end of the spectrum: datacenters.

We at Google build several large server binaries. These binaries need to be
deployed across thousands of machines in multiple data centers. It's not a
unique situation, as I'd guess most other companies running their services in
data centers do the same.

One of our large server binaries is 741MB in size and is built as a position
independent executable. The size of ".rela.dyn" section in that binary is 57MB,
and contains 2,474,379 relocations, out of which all but 59 are of type
R_X86_64_RELATIVE. By using the technique and encoding described in this
proposal, we can store the same 2,474,320 relocations in ~400KB. This is a
straight up savings of 57MB, or 7.59% of original executable size.

One other end of the spectrum is popular Linux distributions like Debian and
Ubuntu. They have switched to building position independent executables
*despite* the 5-10% size bloat, for the additional security benefits of PIE
were deemed worth it. However, there is no fundamental reason for having to
choose between the additional security benefits of PIE and avoiding the file
size bloat. This proposal is trying to eliminate that artificial choice: an
executable built as PIE does not need to incur any of the 5-10% size penalty at
all. I'm sure the file size savings would be welcome in those communities, as
they too have to distribute these binaries to millions of computers across the
world. The savings add up.


> REL is smaller than RELA, but I don't think you can say that SHT_REL is an
> optimization over SHT_RELA. If it were an optimization, you'd see both used
> in objects for all platforms. That's not the case though. The platform ABI
> for each platform (x86, x86_64, sparc, mips, ppc, pa-risc, etc...) declares
> the
> relocation format (REL or RELA), and that single relocation format is used
> consistently on that platform.
>
> The only REL platform I know of is 32-bit x86. And the only case I know of
> where a single object can have both REL and RELA is 32-bit x86 objects that
> have GNU prelink information in them (those sections being GNU ABI features,
> not gABI, or platform ABI).
>
> I'm not really sure why REL was added, or why it didn't get much traction,
> but if feels more like a mistake than an optimization.

I've seen SHT_REL used on 32-bit x86 and 32-bit arm, and SHT_RELA used on
64-bit x86 and 64-bit arm, so I've always assumed that SHT_REL is used on
32-bit platforms and SHT_RELA on 64-bit platforms. Maybe my assumption is wrong
and there are counterexamples on each side.

My assumption that SHT_REL is an optimization over SHT_RELA is also purely
based on the fact that it was defined later (SHT_RELA == 4, SHT_REL == 9). I
don't know for sure what the motivation behind SHT_REL was. I can't think of
any possible reason other than the space savings over SHT_RELA.

Ali Bahrami

unread,
Jan 12, 2018, 5:23:38 PM1/12/18
to gener...@googlegroups.com
On 01/12/18 14:59, 'Rahul Chaudhry' via Generic System V Application Binary Interface wrote:
> I've seen SHT_REL used on 32-bit x86 and 32-bit arm, and SHT_RELA used on
> 64-bit x86 and 64-bit arm, so I've always assumed that SHT_REL is used on
> 32-bit platforms and SHT_RELA on 64-bit platforms. Maybe my assumption is wrong
> and there are counterexamples on each side.


It's defined by the platform ABI. For instance, the 32-bit
x86 ABI specifies it. I'm not familiar with the arm ABI, but
assume it's the same.

Sparc uses RELA for both, to name an example I'm familiar with.
I don't think it's a 32/64 thing. Just a choice that the ABI
authors get to make.

> My assumption that SHT_REL is an optimization over SHT_RELA is also purely
> based on the fact that it was defined later (SHT_RELA == 4, SHT_REL == 9). I
> don't know for sure what the motivation behind SHT_REL was. I can't think of
> any possible reason other than the space savings over SHT_RELA.

I don't really know, but had assumed that the addition of REL most likely
corresponded to the introduction of Unix on early x86 machines, which
came along a bit after the original ELF. It's quite possible that they
were trying to make smaller objects.

- Ali

Rahul Chaudhry

unread,
Jan 12, 2018, 5:48:15 PM1/12/18
to gener...@googlegroups.com
On Fri, Dec 22, 2017 at 9:53 AM, Ali Bahrami <Ali.B...@oracle.com> wrote:
>
> If we adopt this, we'll live with it until ELF dies. And meanwhile,
> the specs of low end devices will continue to double every year or
> two, so the shoe won't pinch as hard tomorrow as it does today. Recall
> that the machines of the 80's, when ELF was created, were far less
> capable than the lowest end mobile device of 2017. So while I sympathize
> with this example, I don't think that by itself, it's a good reason to
> extend the gABI.
>
> Perhaps low end Android devices really don't need to use PIE executables
> for every object? Or perhaps some Android specific GNU ABI version of
> this, which is easier to phase out later, would suffice?

A few clarifications:

1. Disabling one of the basic security features specifically on low end Android
devices is obviously not a good option.

2. As outlined in the previous email, the issue is not limited to low end
devices at all. The savings apply to other ends of the spectrum as well,
namely data centers and popular Linux distributions.

3. Even if this is done in the form of a specific GNU ABI extension, it would
be here to stay for the long term (as much as the decision to build PIE
binaries is here to stay). I don't see why once it's there, anyone would want
to phase it out and increase binary sizes across the board by 5-10%.

Rahul Chaudhry

unread,
Jan 12, 2018, 6:48:28 PM1/12/18
to gener...@googlegroups.com
On Fri, Dec 22, 2017 at 11:31 AM, Ali Bahrami <Ali.B...@oracle.com> wrote:
>
> I don't mean to say that the low end Android issue isn't important.
> And this technique might be a good solution. But if it's not generally
> valuable to all ELF platforms, it can be applied in a more specific way,
> as part of the GNU ELF ABI, or as part of an android specific platform ABI.

Indeed, if SHT_RELR is not acceptable for addition to the generic-abi, the next
best thing would be to take it to gnu-abi and see if it would be acceptable
there as an extension.

Here's a not too unlikely scenario if this gets implemented as a GNU specific
extension:

1. It would still make its way to most Linux distributions, Android devices,
and data centers (basically anybody and everybody who is building and
distributing PIE binaries at a large scale).

2. As a result, more than 90% of the relocations in binaries across all these
platforms will be stored in sections not defined in the ELF spec.

In a previous mail, you wrote:

> The basic notions of symbols and relocations in ELF are pretty
> simple, and I think that's part of why it's survived for 30 years,
> and why it could go for another 30. They are also so core to ELF
> that making new ones can get messy fast. I worry that adding overly
> specific new relocation sections makes the format harder to learn, and
> more brittle to future changes.

I would like to ask a simple question (apologies if it appears presumptuous):

What is inherently so good about the simple notion of relocations in ELF if
>90% of the relocations in binaries across multiple ends of the spectrum have
to be stored outside of the ELF specified sections for efficiency reasons?

If we end up in a not too unlikely scenario outlined above, isn't it actually
detrimental to the notion of relocations in ELF, and therefore detrimental to
the survivability to ELF for another 30 years?

Andrew Grieve

unread,
Jan 19, 2018, 3:04:22 PM1/19/18
to gener...@googlegroups.com
Conversation here seems a bit stalled, so wanted to try and bump this along.

Ali - are you swayed at all by Rahul's arguments?

Ali Bahrami

unread,
Jan 21, 2018, 12:47:47 AM1/21/18
to gener...@googlegroups.com
On 01/19/18 13:03, 'Andrew Grieve' via Generic System V Application Binary Interface wrote:
> Conversation here seems a bit stalled, so wanted to try and bump this along.
>
> Ali - are you swayed at all by Rahul's arguments?

I was surprised to not see more responses. I don't know if
that reflects our small numbers, or apathy, or whether everyone
else feels that all the necessary points of view have been presented.
In any event, I think we've heard all that we're going to, so it's
time to decide.

Keeping ELF simpler is more personally important to me than file size
reductions on the order of 10% (give or take). However, it's a reasonable
proposal, and we do understand that these file size reductions are important
in some environments. More importantly, we have an overriding interest in not
seeing ELF fragment more than necessary, and sometimes that means going along
for the ride.

So yes, if this is going to happen, we'd rather have it happen in the gABI.
And if it does, we'll likely add the support for our platform, and reap
the benefit of your work.

I've just reread the whole thread, and before we sign off on this,
I have some questions/requests, which I'll block off into separate
sections.

-----
ET_REL?

Early on, Cary said:

> Since these relocations will normally be found only in ET_EXEC and
> ET_DYN files, the typical name will be .relr.dyn. I think it's worth
> pointing out here that these are for dynamic linking.

Can we clarify in the spec that these sections are not found in ET_REL,
and therefore only created by the link-editor (ld)?

My understanding of this proposal is that the link-editor (ld) creates them,
as an optimization, by coalescing R_*_RELATIVE relocations it would otherwise
emit into the output object. As such, I don't expect to ever see them as
input to a link-edit.

SHT_RELR seems like an unnecessary complication for relocatable objects
(ET_REL), and having it be the sole province of ld is also nice, in that
implementations are free to adopt it on whatever schedule they see fit,
rather than be forced into doing it by compilers such as gcc, llvm,
<your compiler here>.

-----
Mysterious crashes!!

To catch the case where an object containing SHT_RELR is run on a system
that doesn't support them, the proposal suggests creating empty R_*_RELR
relocations that do nothing, while allow the old runtime linkers to
recognize an invalid relocation. That was followed by a suggestion of
having a new PT_DYNAMIC2 program header.

We'd rather not have either of these. On our platform, we provide backward
compatibility going back decades, where we define BC as "old objects
continue working on new systems". We don't support the reverse (what I
like to call time travel compatibility) where objects built on new systems
run on old systems, and indeed, time traveling objects do sometimes run
the risk of mysterious crashes (and other malfunctions). And yet, our
experience is that although we don't go to great effort to catch time traveling
objects, they very rarely cause problems for us. People seem to understand
that they can't do that.

My free advice (worth what you paid for it) is to roll out the support, and
then wait a bit before turning on the use widely, so that the support is in
place before it is needed, and to not complicate things with a way to
catch time travelers. The window of time where this can be a problem
is finite, and once you're past it, you'll be glad to have a simpler system.

If I haven't convinced you, and you are still determined to explicitly catch
the time travelers, then perhaps you could do that part outside of the gABI
(e.g. Florian's suggestion of an ABI note).

-----
Who goes first?

In an executable or shared object that contains a normal REL or RELA section,
plus a RELR section, which relocation section does the runtime linker process
first?

It's unlikely, but nothing prevents multiple relocations from modifying a given
location. In that case, the order in which these sections are processed could
alter outcomes. It might be best to nail down the ordering in the spec.

-----
Updated Spec

Rahul started this off with a nice proposal, helpfully presented as
diffs to the existing gABI. There were some alternative possibilities
listed, and the follow on discussion seems to have altered the original
proposal in some details.

Before we put this to bed, it would be really useful to see a final
proposal with all the changes incorporated.

Thanks...

- Ali

Suprateeka R Hegde

unread,
Jan 22, 2018, 9:08:02 AM1/22/18
to gener...@googlegroups.com, Ali Bahrami
On 21-Jan-2018 11:17 AM, Ali Bahrami wrote:
> My understanding of this proposal is that the link-editor (ld) creates
> them,
> as an optimization, by coalescing R_*_RELATIVE relocations it would
> otherwise
> emit into the output object. As such, I don't expect to ever see them as
> input to a link-edit.

Unless we are doing an incremental link. Unfortunately we (on HP-UX)
could not stop supporting this ugly incremental link as many important
users are still using it.

> Before we put this to bed, it would be really useful to see a final
> proposal with all the changes incorporated.

I wont be making this enhancement on HP-UX. Primarily because the
benefits seen on Itanium is quite less given the characteristics of IA64
coupled with our codegen and runtime. Secondarily because of some
temporary internal factors.

All I would request is to have a one liner saying that it is not
mandatory for a gABI compliant link-editor to provide this optimization.
However, I am OK if just the detection and/or error handling is
mandatory -- I shall add some basic detection, etc., in elfdump and
other related tools.

--
Supra

Ali Bahrami

unread,
Jan 22, 2018, 11:31:05 AM1/22/18
to hegdes...@gmail.com, gener...@googlegroups.com
On 01/22/18 07:08, Suprateeka R Hegde wrote:
> All I would request is to have a one liner saying that it is not
> mandatory for a gABI compliant link-editor to provide this optimization.


I don't think optimizations are ever required. It's inherent in
the nature of being an optimization that it's optional. So I don't
think you need an explicit sentence in the spec to justify your choice,
but I'm also fine with it.

We're not planning to implement this immediately either. We're
going to leave that door open though, and certainly might, in
due course.

- Ali

Andrew Grieve

unread,
Jan 22, 2018, 12:08:29 PM1/22/18
to gener...@googlegroups.com, Ali Bahrami
Thanks for your thoughtful response Ali.

In terms of getting one final proposal together - does anyone have opinions on the encoding itself?

I've spent a fair bit of time playing with it, and I think there's one extension that is worth considering:

* I'd like to consider adding a third type of word: "repeated bitfields":
  * Word starts with 1 -> 63-bit bitfield
  * Word starts with 01 -> 62-bit relative jump (which can handle odd addresses if you inflate it by kOffsetOfBitfield)
  * Word starts with 00 -> 14-bit repeat information + 48-bit bitfield
     * 14 bits are: 6 bits of "how many trailing bits from the previous word to use", 8 bits of "how many times to repeat those bits".

Here are the sizes from before with an added row for this encoding;

 1. Chrome browser (x86_64, built as PIE):
          109256 bytes using 8-bit jump/56-bit bitmap encoding
           98024 bytes using this 63-bit bitmap encoding
           60880 bytes with repeated bitfields


 2. Go net/http test binary (x86_64, 'go test -buildmode=pie -c net/http')
          43744 bytes using 8-bit jump/56-bit bitmap encoding
          43080 bytes using this 63-bit bitmap encoding
          29592 bytes using repeated bitfields


 3. Vim binary in /usr/bin on my workstation (Ubuntu, x86_64)
          1992 bytes using 8-bit jump/56-bit bitmap encoding
          1792 bytes using this 63-bit bitmap encoding
            896 bytes using repeated bitfields

Here's a link to what the bitfields look like for chrome x64:



Here's an example decoder written in go:

Although more complicated, this still passes the "simplicity bar" in my eyes. It's still just using bitfields, and the decoder is < 50 lines.

Thoughts?




Suprateeka R Hegde

unread,
Jan 23, 2018, 12:46:20 AM1/23/18
to gener...@googlegroups.com, Ali Bahrami
On 22-Jan-2018 10:38 PM, 'Andrew Grieve' via Generic System V
Application Binary Interface wrote:
> I've spent a fair bit of time playing with it, and I think there's one
> extension that is worth considering:

>   * Word starts with 00 -> 14-bit repeat information + 48-bit bitfield
>      * 14 bits are: 6 bits of "how many trailing bits from the previous
> word to use", 8 bits of "how many times to repeat those bits".

> Thoughts?

I have just one comment. Though all these reduce the size, computation
involved has increased. The increase may be insignificant for apps like
browsers, editors, etc. But in enterprise world, there are legacy
financial applications that comprise of hundreds of small short lived
executables. These executables perform some short service kind of work
and exit. For such enterprise environments, process startup time (reloc
processing etc) is very important. The startup time should not be more
than the actual program execution time.

(May be as simple as an option to turn off RELR (fno-relr) would help in
such cases)

--
Supra

Rahul Chaudhry

unread,
Feb 6, 2018, 7:28:00 PM2/6/18
to gener...@googlegroups.com
Sorry about the delay in posting this. Please find the updated proposal below.
I think this incorporates all the feedback from this thread. Please let me know
if I missed something. Main changes from the original proposal are:

1. Replaced the encoding in the original proposal with a simpler encoding
described earlier in this thread.

2. Added note that SHT_RELR sections are for dynamic linking, and may only
appear in object files of type ET_EXEC or ET_DYN. The corresponding sh_link
and sh_info values in the section header hold no special information.

3. Added note that relocations in .relr.dyn are processed before other
relocations in any .rel.dyn or .rela.dyn sections.

4. Removed the suggestion for how to handle cases where an object containing
SHT_RELR is run on a system that doesn't support it. gABI does not need to
specify a solution for this. Different platforms can handle this whichever
way they like outside of the gABI.

Thanks for your feedback,
Rahul


This is a proposal for defining a new section type "SHT_RELR" in the gABI. This
new section will be used to compactly encode R_*_RELATIVE relocations in shared
object files and position independent executables (PIE).

## Background

In position independent executables (PIE), R_*_RELATIVE relocations can account
for more than 99% of the entries in SHT_REL/SHT_RELA sections. This results in
more than 5% size bloat compared to non-PIE binaries. See this thread for more
details: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00000.html

Recent releases of Debian, Ubuntu, and several other distributions build
executables as PIE by default. Suprateeka Hegde posted some statistics in the
above thread on the prevalence of relative relocations in executables residing
in /usr/bin: https://sourceware.org/ml/gnu-gabi/2017-q2/msg00013.html

This proposal is based on the 'experimental-relr' prototype from Cary Coutant
that is available at 'users/ccoutant/experimental-relr' branch in the binutils
repository, and was described in this post in the same thread on gnu-gabi:
https://sourceware.org/ml/gnu-gabi/2017-q2/msg00003.html


## Proposal

We propose defining a new section type and dynamic array tags in the
generic-abi for compactly encoding the R_*_RELATIVE relocations in a
special '.relr.dyn' section.

Defining a new section type:
Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
Figure 4-9: Section Types,sh_type

Name Value
-----------------
SHT_RELR 19

Description:

SHT_RELR: The section holds relative relocation entries without explicit
addends or info, such as type Elf32_Relr for the 32-bit class of
object files or type Elf64_Relr for the 64-bit class of object
files. SHT_RELR sections are for dynamic linking, and may only
appear in object files of type ET_EXEC or ET_DYN. An object file
may have multiple relocation sections. See ``Relocation'' below for
details.


Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
Figure 4-16: Special Sections

Name: .relr.dyn
Type: SHT_RELR
Attributes: SHF_ALLOC
This section holds relative relocation information for dynamic linking,
compactly encoded as described in ``Relocation''. The relocations in this
section are processed before other relocations in any SHT_REL or SHT_RELA
section.


Defining new dynamic array tags:
Chapter 5: http://www.sco.com/developers/gabi/latest/ch5.dynamic.html
Figure 5-10: Dynamic Array Tags, d_tag

Name Value d_un Executable Shared Object
---------------------------------------------------
DT_RELRSZ 35 d_val optional optional
DT_RELR 36 d_ptr optional optional
DT_RELRENT 37 d_val optional optional

Note: The use of d_un conforms to the odd-even rule in that section.

Description:

DT_RELR: This element is similar to DT_RELA, except its table has implicit
addends and info, such as Elf32_Relr for the 32-bit file class or
Elf64_Relr for the 64-bit file class. If this element is present,
the dynamic structure must also have DT_RELRSZ and DT_RELRENT
elements. During dynamic linking, a DT_RELR element is processed
before any DT_REL or DT_RELA elements in the same object file.

DT_RELRSZ: This element holds the total size, in bytes, of the DT_RELR
relocation table.

DT_RELRENT: This element holds the size, in bytes, of the DT_RELR relocation
entry.


This table will hold entries of type Elf32_Relr for the 32-bit class of object
files or type Elf64_Relr for the 64-bit class of object files:

Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.reloc.html
Figure 4-23: Relocation Entries

typedef Elf32_Word Elf32_Relr;
typedef Elf64_Xword Elf64_Relr;


Addends are stored implicitly in the location to be modified, similar to
Elf32_Rel and Elf64_Rel. Relocation type is implicit (R_*_RELATIVE). The
entries effectively store a sorted list of offsets only.

### Encoding

The encoded sequence of Elf64_Relr entries in a SHT_RELR section looks like
[ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]

i.e. start with an address, followed by any number of bitmaps. The address
entry encodes 1 relocation. The subsequent bitmap entries encode up to 63
relocations each, at subsequent offsets following the last address entry.

The bitmap entries must have 1 in the least-significant-bit. The assumption
here is that an address cannot have 1 in lsb. Odd addresses are not supported.

This encoding has a couple of interesting properties:

1. Looking at any entry, it is clear whether it's an address or a bitmap: even
means address, odd means bitmap.

2. Just a simple list of addresses is a valid encoding.

This encoding can represent up to 63 relative relocations in a single 64-bit
entry. The code below is the entirety of the implementation for decoding and
processing SHT_RELR sections in glibc dynamic loader.

#ifdef DO_RELR
{
ElfW(Addr) base = 0;
for (; relative < end; ++relative)
{
ElfW(Relr) entry = *relative;
if ((entry&1) == 0)
{
elf_machine_relr_relative (l_addr, (void *) (l_addr + entry));
base = entry + sizeof(ElfW(Addr));
continue;
}
ElfW(Addr) offset = base;
while (entry != 0)
{
entry >>= 1;
if ((entry&1) != 0)
elf_machine_relr_relative (l_addr, (void *) (l_addr + offset));
offset += sizeof(ElfW(Addr));
}
base += (8*sizeof(ElfW(Addr)) - 1) * sizeof(ElfW(Addr));
}
}
#endif

Elf32_Relr can encode up to 31 relative relocations in a single 32-bit entry
using the same encoding.


## Motivating examples

Here are three real world examples that demonstrate the savings:

1. Chrome browser (x86_64, built as PIE):
File size (stripped): 152265064 bytes (145.21MB)
605159 relocation entries (24 bytes each) in '.rela.dyn'
594542 are R_X86_64_RELATIVE relocations (98.25%)
14269008 bytes (13.61MB) in use in '.rela.dyn' section
98024 bytes (0.09MB) if moved to '.relr.dyn' section

Savings: 14170984 bytes (13.51MB), or 9.31% of original file size.


2. Go net/http test binary (x86_64, 'go test -buildmode=pie -c net/http')
File size (stripped): 10238168 bytes (9.76MB)
83810 relocation entries (24 bytes each) in '.rela.dyn'
83804 are R_X86_64_RELATIVE relocations (99.99%)
2011296 bytes (1.92MB) in use in .rela.dyn section
43080 bytes (0.04MB) if moved to .relr.dyn section

Savings: 1968216 bytes (1.88MB), or 19.22% of original file size.


3. Vim binary in /usr/bin on my workstation (Ubuntu, x86_64)
File size (stripped): 3030032 bytes (2.89MB)
6680 relocation entries (24 bytes each) in '.rela.dyn'
6272 are R_X86_64_RELATIVE relocations (93.89%)
150528 bytes (0.14MB) in use in .rela.dyn section
1792 bytes (0.00MB) if moved to .relr.dyn section

Savings: 148736 bytes (0.14MB), or 4.91% of original file size.


Recent releases of Debian, Ubuntu, and several other distributions build
executables as PIE by default. The third example shows that using SHT_RELR
sections to encode relative relocations can bring decent savings to executable
sizes in /usr/bin across many distributions.

Cary Coutant

unread,
Feb 6, 2018, 8:17:10 PM2/6/18
to gener...@googlegroups.com
LGTM.

I have one suggestion on wording, though. Where you describe the
relocation entries:

> Addends are stored implicitly in the location to be modified, similar to
> Elf32_Rel and Elf64_Rel. Relocation type is implicit (R_*_RELATIVE). The
> entries effectively store a sorted list of offsets only.

I'd suggest replacing it with a more explicit description of the
relocation operation, rather than relying on the reader's
understanding of a R_*_RELATIVE relocation that may or may not be
found in a psABI supplement. How about this:

Relative relocations are used to identify
virtual-address-sized storage units within the object whose
contents are independent of any dynamic binding, but must
still be relocated at load time to support position
independence. Before the program can begin execution, these
locations must be relocated by reading their contents and
adding a relocation factor which is computed as the
difference between the object's actual load-time virtual
address and its link-time virtual address. If the object is
loaded at the address for which it was linked, the relocation
factor is 0, and relative relocations may be ignored.

The relocation entries decode to a list of virtual addresses
that refer to storage units within the object. Each of these
storage units is 32 bits wide (in the case of ELFCLASS32
objects) or 64 bits wide (in the case of ELFCLASS64 objects).

-cary
> --
> You received this message because you are subscribed to the Google Groups "Generic System V Application Binary Interface" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to generic-abi...@googlegroups.com.

Ali Bahrami

unread,
Feb 7, 2018, 12:48:41 PM2/7/18
to gener...@googlegroups.com
On 02/06/18 18:17, Cary Coutant wrote:
> LGTM.
>
> I have one suggestion on wording, though. Where you describe the
> relocation entries:


It looks good to me also. I think you've addressed all of the
issues I had. Thank you for the way you've managed this. It looks
like a useful feature.

In addition to Cary's comments about wording, which I agree with,
I have a further suggestion. The discussion of the bitmaps
stops with the discussion of how the least significant bit
distinguishes between address words and bitmap words. Unsaid,
other than in the code example, is a discussion of how the bitmap
is decoded to determine subsequent addresses relative to the
initial address word. Although the code example is very helpful
for our discussion here, the final gABI document needs an English
description instead of code, or perhaps as augmentation to it.
I'm probably wrong, but I'm not recalling the gABI using code to
define other features. There's usually a typedef, some english
text, and if needed, a diagram.

As a starting point, I'll offer up a rough draft:

Excluding the least significant bit in the bitmap, each non-zero bit
in the bitmap represents a relocation to be applied to a corresponding
machine word that follows the base address word. The second least
significant bit represents the machine word immediately following
the initial address, and each bit that follows represents the next
word, in linear order. As such, a single bitmap can encode up to
31 relocations in a 32-bit object, and 63 relocations in a 64-bit object.

I could imagine going crazy with some ascii art to draw the first address
and the N-1 words that follow, and list the bitmap bits next to them. I'm
not sure if that's necessary --- it depends on how opaque the above
description sounds to other readers.

Thanks again, and good luck.

- Ali

Rahul Chaudhry

unread,
Feb 7, 2018, 5:31:05 PM2/7/18
to gener...@googlegroups.com
Thank you Cary and Ali,

Here is the updated proposal with your suggestions incorporated. I've kept the
code for decoding and processing the section for reference purposes, but it is
under a separate 'Decoding' header and is not meant to be included in the gABI.

Thanks,
Relative relocations are used to identify virtual-address-sized storage units
within the object whose contents are independent of any dynamic binding, but
must still be relocated at load time to support position independence. Before
the program can begin execution, these locations must be relocated by reading
their contents and adding a relocation factor which is computed as the
difference between the object's actual load-time virtual address and its
link-time virtual address. If the object is loaded at the address for which it
was linked, the relocation factor is 0, and relative relocations may be
ignored.

The relocation entries decode to a list of virtual addresses that refer to
storage units within the object. Each of these storage units is 32 bits wide
(in the case of ELFCLASS32 objects) or 64 bits wide (in the case of ELFCLASS64
objects).

### Encoding

The encoded sequence of Elf64_Relr entries in a SHT_RELR section looks like
[ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]

i.e. start with an address, followed by any number of bitmaps. The address
entry encodes 1 relocation. The subsequent bitmap entries encode up to 63
relocations each, at subsequent offsets following the last address entry.

The bitmap entries must have 1 in the least significant bit. The assumption
here is that an address cannot have 1 in lsb. Odd addresses are not supported.

Excluding the least significant bit in the bitmap, each non-zero bit in the
bitmap represents a relocation to be applied to a corresponding machine word
that follows the base address word. The second least significant bit represents
the machine word immediately following the initial address, and each bit that
follows represents the next word, in linear order. As such, a single bitmap can
encode up to 31 relocations in a 32-bit object, and 63 relocations in a 64-bit
object.

This encoding has a couple of interesting properties:

1. Looking at any entry, it is clear whether it's an address or a bitmap: even
means address, odd means bitmap.

2. Just a simple list of addresses is a valid encoding.

### Decoding

The code below is the entirety of the implementation for decoding and
processing SHT_RELR sections in glibc dynamic loader.

#ifdef DO_RELR
{
ElfW(Addr) base = 0;
for (; relative < end; ++relative)
{
ElfW(Relr) entry = *relative;
if ((entry&1) == 0)
{
elf_machine_relr_relative (l_addr, (void *) (l_addr + entry));
base = entry + sizeof(ElfW(Addr));
continue;
}
ElfW(Addr) offset = base;
while (entry != 0)
{
entry >>= 1;
if ((entry&1) != 0)
elf_machine_relr_relative (l_addr, (void *) (l_addr + offset));
offset += sizeof(ElfW(Addr));
}
base += (8*sizeof(ElfW(Addr)) - 1) * sizeof(ElfW(Addr));
}
}
#endif


Ali Bahrami

unread,
Feb 12, 2018, 11:10:39 AM2/12/18
to gener...@googlegroups.com
It seems fine to me. I have a couple of wording suggestions,
which follow.

> Description:
>
> SHT_RELR: The section holds relative relocation entries without explicit
> addends or info, such as type Elf32_Relr for the 32-bit class of
> object files or type Elf64_Relr for the 64-bit class of object
> files.

The use of "such as" implies other possibilities, which as far as
I can see, isn't the case. Maybe:

The section holds an array of relocation entries, used to encode
relative relocations that do not require explicit addends or other
information. Array elements are of type Elf32_Relr for ELFCLASS32
objects, and Elf64_Relr for ELFCLASS64 objects


> DT_RELR: This element is similar to DT_RELA, except its table has implicit
> addends and info, such as Elf32_Relr for the 32-bit file class or
> Elf64_Relr for the 64-bit file class. If this element is present,
> the dynamic structure must also have DT_RELRSZ and DT_RELRENT
> elements. During dynamic linking, a DT_RELR element is processed
> before any DT_REL or DT_RELA elements in the same object file.
The first sentence says that it's similar to DT_RELA, which is
true, but it doesn't say anything about DT_REL, raising unnecessary
questions. And you've already covered the stuff about implicit addends
in the description of SHT_RELR. So maybe:

The address of an SHT_RELR relocation table. This element requires the
DT_RELRSZ and DT_RELRENT elements also be present. During dynamic linking,
a DT_RELR element is processed before any DT_REL or DT_RELA elements in
the same object file.

- Ali

Rahul Chaudhry

unread,
Feb 13, 2018, 4:38:55 PM2/13/18
to gener...@googlegroups.com
Thank you Ali for your feedback. I've incorporated both of your wording
suggestions into the proposal. The updated version is pasted below.

What are the next steps from here? How does a proposal go from this mailing
list to the latest System V ABI draft hosted at sco.com?
Description:

SHT_RELR: The section holds an array of relocation entries, used to encode
relative relocations that do not require explicit addends or other
information. Array elements are of type Elf32_Relr for ELFCLASS32
objects, and Elf64_Relr for ELFCLASS64 objects. SHT_RELR sections
are for dynamic linking, and may only appear in object files of
type ET_EXEC or ET_DYN. An object file may have multiple relocation
sections. See ``Relocation'' below for details.


Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
Figure 4-16: Special Sections

Name: .relr.dyn
Type: SHT_RELR
Attributes: SHF_ALLOC
This section holds relative relocation information for dynamic linking,
compactly encoded as described in ``Relocation''. The relocations in this
section are processed before other relocations in any SHT_REL or SHT_RELA
section.


Defining new dynamic array tags:
Chapter 5: http://www.sco.com/developers/gabi/latest/ch5.dynamic.html
Figure 5-10: Dynamic Array Tags, d_tag

Name Value d_un Executable Shared Object
---------------------------------------------------
DT_RELRSZ 35 d_val optional optional
DT_RELR 36 d_ptr optional optional
DT_RELRENT 37 d_val optional optional

Note: The use of d_un conforms to the odd-even rule in that section.

Description:

DT_RELR: The address of an SHT_RELR relocation table. This element requires
the DT_RELRSZ and DT_RELRENT elements also be present. During
dynamic linking, a DT_RELR element is processed before any DT_REL or
DT_RELA elements in the same object file.

Yunlian Jiang

unread,
Sep 4, 2018, 1:15:34 PM9/4/18
to Generic System V Application Binary Interface
Is there any update on this issue?
> SHT_RELR: The section holds relative relocation entries without explicit
> addends or info, such as type Elf32_Relr for the 32-bit class of
> object files or type Elf64_Relr for the 64-bit class of object
> files. An object file may have multiple relocation sections. See
> ``Relocation'' below for details.
>
>
> Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
> Figure 4-14: sh_link and sh_info Interpretation
>
> sh_type: SHT_RELR (same as SHT_REL and SHT_RELA)
> sh_link: The section header index of the associated symbol table.
> sh_info: The section header index of the section to which the relocation
> applies.
>
>
> Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.sheader.html
> Figure 4-16: Special Sections
>
> Name: .relr<name>
> Type: SHT_RELR
> Attributes: (same as .rel<name> and .rela<name>)
> These sections hold relocation information, as described in
> ``Relocation''. If the file has a loadable segment that includes
> relocation, the sections' attributes will include the SHF_ALLOC bit;
> otherwise, that bit will be off. Conventionally, name is supplied by the
> section to which the relocations apply. Thus a relocation section for
> .text normally would have the name .rel.text, .rela.text, or .relr.text.
>
>
> Defining new dynamic array tags:
> Chapter 5: http://www.sco.com/developers/gabi/latest/ch5.dynamic.html
> Figure 5-10: Dynamic Array Tags, d_tag
>
> Name Value d_un Executable Shared Object
> ---------------------------------------------------
> DT_RELRSZ 35 d_val optional optional
> DT_RELR 36 d_ptr optional optional
> DT_RELRENT 37 d_val optional optional
>
> Note: The use of d_un conforms to the odd-even rule in that section.
>
> Description:
>
> DT_RELR: This element is similar to DT_RELA, except its table has implicit
> addends and info, such as Elf32_Relr for the 32-bit file class or
> Elf64_Relr for the 64-bit file class. If this element is present,
> the dynamic structure must also have DT_RELRSZ and DT_RELRENT
> elements.
>
> DT_RELRSZ: This element holds the total size, in bytes, of the DT_RELR
> relocation table.
>
> DT_RELRENT: This element holds the size, in bytes, of the DT_RELR relocation
> entry.
>
>
> This table will hold entries of type Elf32_Relr for the 32-bit class of object
> files or type Elf64_Relr for the 64-bit class of object files:
>
> Chapter 4: http://www.sco.com/developers/gabi/latest/ch4.reloc.html
> Figure 4-23: Relocation Entries
>
> typedef Elf32_Word Elf32_Relr;
> typedef Elf64_Xword Elf64_Relr;
>
>
> Addends are stored implicitly in the location to be modified, similar to
> Elf32_Rel and Elf64_Rel. Relocation type is implicit (R_*_RELATIVE). The
> entries effectively store a sorted list of offsets only.
>
> The encoding used in these entries is a simple combination of delta-encoding
> and a bitmap of offsets. For the 64-bit entries, higher 8-bits contain delta
> since last offset, and lower 56-bits contain a bitmap for which words to apply
> the relocation to. This is best described by showing the code for decoding the
> section entries:
>
> #define ELF64_R_JUMP(val) ((val) >> 56)
> #define ELF64_R_BITS(val) ((val) & 0xffffffffffffff)
>
> #ifdef DO_RELR
> {
> ElfW(Addr) offset = 0;
> for (; relative < end; ++relative)
> {
> ElfW(Addr) jump = ELFW(R_JUMP) (*relative);
> ElfW(Addr) bits = ELFW(R_BITS) (*relative);
> offset += jump * sizeof(ElfW(Addr));
> if (jump == 0)
> {
> ++relative;
> offset = *relative;
> }
> ElfW(Addr) r_offset = offset;
> for (; bits != 0; bits >>= 1)
> {
> if ((bits&1) != 0)
> elf_machine_relr_relative (l_addr, (void *) (l_addr+r_offset));
> r_offset += sizeof(ElfW(Addr));
> }
> }
> }
> #endif
>
> Note that the 8-bit 'jump' encodes the number of ElfW(Addr) sized words since
> last offset. The case where jump would not fit in 8-bits is handled by setting
> jump to 0, and emitting the full offset for the next relocation in the
> subsequent entry.
>
> This encoding can represent up to 56 relative relocations in a single 64-bit
> entry. The above code is the entirety of the implementation for decoding and
> processing SHT_RELR sections in glibc dynamic loader.
>
> For 32-bit targets, we use 32-bit entries: 8-bits for 'jump' and 24-bits for
> the bitmap:
>
> #define ELF32_R_JUMP(val) ((val) >> 24)
> #define ELF32_R_BITS(val) ((val) & 0xffffff)
>
>
> ## Motivating examples
>
> Here are three real world examples that demonstrate the savings:
>
> 1. Chrome browser (x86_64, built as PIE):
> File size (stripped): 152265064 bytes (145.21MB)
> 605159 relocation entries (24 bytes each) in '.rela.dyn'
> 594542 are R_X86_64_RELATIVE relocations (98.25%)
> 14269008 bytes (13.61MB) in use in '.rela.dyn' section
> 109256 bytes (0.10MB) if moved to '.relr.dyn' section
>
> Savings: 14159752 bytes, or 9.29% of original file size.
>
>
> 2. Go net/http test binary (x86_64, 'go test -buildmode=pie -c net/http')
> File size (stripped): 10238168 bytes (9.76MB)
> 83810 relocation entries (24 bytes each) in '.rela.dyn'
> 83804 are R_X86_64_RELATIVE relocations (99.99%)
> 2011296 bytes (1.92MB) in use in .rela.dyn section
> 43744 bytes (0.04MB) if moved to .relr.dyn section
>
> Savings: 1967552 bytes, or 19.21% of original file size.
>
>
> 3. Vim binary in /usr/bin on my workstation (Ubuntu, x86_64)
> File size (stripped): 3030032 bytes (2.89MB)
> 6680 relocation entries (24 bytes each) in '.rela.dyn'
> 6272 are R_X86_64_RELATIVE relocations (93.89%)
> 150528 bytes (0.14MB) in use in .rela.dyn section
> 1992 bytes (0.00MB) if moved to .relr.dyn section
>
> Savings: 148536 bytes, or 4.90% of original file size.
>
>
> Recent releases of Debian, Ubuntu, and several other distributions build
> executables as PIE by default. The third example shows that using SHT_RELR
> sections to encode relative relocations can bring decent savings to executable
> sizes in /usr/bin across many distributions.
>
>
> ## Topics for discussion
>
> * Does this really belong in the generic-abi? The R_*_RELATIVE relocations can
> be specified in the SHT_REL/SHT_RELA sections. This is just an optimization.
>
> Our opinion (for seeding the discussion):
>
> Yes, SHT_RELR is just an optimization for storing the R_*_RELATIVE relocations.
> If this proposal is rejected, the relocations can continue to reside in the
> SHT_REL/SHT_RELA sections like they do today. However, from that point of view,
> SHT_REL is also just an optimization over SHT_RELA, so there is precedent for
> doing this in the generic-abi.
>
> Entries of type Elf32_Rel and Elf64_Rel store an implicit addend in the
> location to be modified, saving 33% memory per entry over Elf32_Rela or
> Elf64_Rela.
>
> SHT_RELR takes this same approach a few steps further: removing the r_info
> field (relocation type is implicit, and there is no need for symbol table
> index), and compactly encoding the remaining data in the entries, which is
> simply a sorted list of offsets.
>
>
> * Assuming there is consensus that adding SHT_RELR to generic-abi is a good
> idea, should the ABI really detail the encoding to be used or just define
> SHT_RELR/DT_RELR* and leave the encoding unspecified?
>
> Based on the specification of Elf32_Rel, Elf32_Rela, Elf64_Rel, and Elf64_Rela
> in the ABI, it probably makes sense to at least define the Elf32_Relr and
> Elf64_Relr types.
>
>
> * Assuming there is consensus that the ABI should specify the encoding used in
> SHT_RELR sections, is the encoding described here the best one possible?
>
> Better encodings may be possible. Discovering a more compact *and* simpler
> encoding would be a very pleasant surprise indeed. Ideas welcome.
>
> We would be happy to provide details of encoding schemes we've experimented
> with and how they compared.
>
> There are two minor optimizations of the above scheme that are worth mentioning
> here: (64-bit version described here, but they're applicable to 32-bit as well)
>
> 1. The jump field in an entry will always be >=56. Values 0-55 are not used,
> since the corresponding offsets would have been covered in the previous
> entry. We can bias jump by 55 (jump=0 is used as a special value) to raise
> the upper bound for deltas that can be encoded without requiring an extra
> word.
>
> 2. The least significant bit of the bitmap is always 1. There is an explicit
> relocation at every new value of offset. The bitmap can be defined to mean
> the *next* 56 words after the new offset, making the relocation at the new
> offset implicit. This enables each entry to encode 57 relocations instead
> of 56.
>
>
> * What happens when an older dynamic loader that does not recognize DT_RELR*
> tags meets a shared library or executable that contains a SHT_RELR section?
>
> Mysterious crashes!!
>
> Unknown DT_* tags are not fatal errors at program loading time. The older
> dynamic loader will simply skip the relocations in SHT_RELR section and start
> the executable. This is likely to result in mysterious crashes.
>
> Here's one proposal to handle the situation:
>
> While unknown DT_* tags are not fatal errors at program loading time, unknown
> relocations are.
>
> We propose adding a new R_*_RELR relocation on each architecture that adds
> support for SHT_RELR sections. Whenever the linker emits a SHT_RELR section in
> a shared object or executable, it must also include one entry of type R_*_RELR
> in the SHT_REL/SHT_RELA section of the binary. When the dynamic loader adds
> support on that architeccure for SHT_RELR sections, it must also add support
> for the corresponding R_*_RELR relocation (by simply ignoring it instead of
> aborting).
>
> This scheme would ensure that older dynamic loaders error out with a useful
> error message when trying to load newer binaries with SHT_RELR sections.

H.J. Lu

unread,
Oct 29, 2021, 9:13:14 AMOct 29
to Generic System V Application Binary Interface
On Tue, Sep 4, 2018 at 10:15 AM 'Yunlian Jiang' via Generic System V
Application Binary Interface <gener...@googlegroups.com> wrote:
>
Can we have an updated proposal without mysterious crashes? One option
is that the linker should bump EI_ABIVERSION when generating DT_RELR.


--
H.J.

Fangrui Song

unread,
Nov 1, 2021, 2:18:19 AMNov 1
to gener...@googlegroups.com
EI_ABIVERSION is dependent on EI_OSABI. Operating systems decide their EI_ABIVERSION.
Picking the value is probably out of the scope of this generic-abi proposal.

I like Ali's January 2018 reply the most:

My free advice (worth what you paid for it) is to roll out the support,
and then wait a bit before turning on the use widely, so that the
support is in place before it is needed, and to not complicate things
with a way to catch time travelers. The window of time where this can be
a problem is finite, and once you're past it, you'll be glad to have a
simpler system.

For ELFOSABI_GNU, if a time travel compatibility mechanism is really
(ever) needed, I'd probably go with _dl_have_relr (
https://sourceware.org/pipermail/binutils/2021-October/118347.html). I
don't mind if GNU ld implements this under -z relr=glibc, but I'll
probably just add the dumb alias but not synthesize the symbol for LLD.

I have a detailed write-up on https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#glibc
Reply all
Reply to author
Forward
0 new messages