Proposed gABI addition: SHF_COMPRESS

Ali Bahrami

Jul 13, 2012, 11:16:01 PM
to gener...@googlegroups.com
The GNU toolchain has recently adopted compression for debug
sections. At this time, such sections can only be reliably identified
via the shared .zdebug prefix in their section names, as they
are tagged with the PROGBITS section type and carry no identifying
section flag.

This works, but it would be more robust, more extensible, and generally
more in the ELF spirit to identify such sections with a section header
flag. The intent of such sections is more obvious, and it opens the
door to the compression of arbitrary non-allocable sections. The addition
of a section flag is compatible with the existing GNU implementation,
requiring gas to set the flag and ld/gold to recognize it, but not forcing
any other aspect of the existing implementation to change. In an ideal
world, the dependence on section name would go away in favor of the flag,
but that's not a requirement.

This could be done with a section flag in the vendor range, and in fact,
SHF_GNU_COMPRESS has been previously discussed. However, we are considering
adding compatible support for compressed sections to the Solaris link-editor
which means that there will soon be 2 independent implementations.
Compression is a basic concept, and so it seems appropriate to add it
to the gABI.

I would therefore like to propose adding SHF_COMPRESS to the gABI,
assigning it the value 0x800.

The relevant section of the gABI can be seen at

http://www.sco.com/developers/gabi/latest/ch4.sheader.html#sh_flags

The SHF_COMPRESS value needs to be added to Figure 4-11. Following is
proposed text for the definition that follows the table. I believe this
captures the existing GNU definition, and leaves the door open for the
later adoption of new compression algorithms.

SHF_COMPRESS

Identifies a section containing compressed data. SHF_COMPRESS
applies only to non-allocable sections, and cannot be used in
conjunction with SHF_ALLOC. The section header for a compressed
section reflects the size of the compressed section. All relocations to
a compressed section specify offsets to the uncompressed section data.
It is therefore necessary to uncompress section data before relocations
can be applied. Each compressed section specifies the algorithm used
independently. It is permissible for different sections in a given ELF
object to employ different compression algorithms.

The first 4 bytes of a compressed section, found at section offsets
0-3, form a magic number, in big endian order. The magic number
specifies the compression algorithm used by the section. The format
and interpretation of the rest of the section is specific to each
algorithm. At this time, only ZLIB compression is recognized.

The magic number for ZLIB compression is 'ZLIB'. The length of the
uncompressed data is encoded as a 64-bit value in big endian order
comprising the 8 bytes found at offsets 4-11. The compressed data
bytes start at offset 12, and continue to the end of the section.
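
As an illustration only (not part of the proposed text), the 12-byte layout described above could be read with something like the following C sketch. The function name `parse_zlib_magic_header` and the `ZHDR_*` constants are made up for this example; only the layout itself (4-byte magic, 8-byte big endian length, data at offset 12) comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout from the proposed text: 4-byte magic "ZLIB", then an 8-byte
 * big endian uncompressed length, then the compressed bytes. */
#define ZHDR_MAGIC_LEN 4
#define ZHDR_TOTAL_LEN 12

/* Hypothetical helper. Returns 0 on success, or -1 if the section is too
 * short or the magic number is not "ZLIB". On success, *usize receives the
 * uncompressed length and *payload the offset of the first compressed byte. */
static int parse_zlib_magic_header(const unsigned char *sec, size_t sec_size,
                                   uint64_t *usize, size_t *payload)
{
    assert(sec != NULL && usize != NULL && payload != NULL);

    if (sec_size < ZHDR_TOTAL_LEN || memcmp(sec, "ZLIB", ZHDR_MAGIC_LEN) != 0)
        return -1;

    /* Offsets 4-11, most significant byte first. */
    uint64_t n = 0;
    for (size_t i = 0; i < 8; i++)
        n = (n << 8) | sec[ZHDR_MAGIC_LEN + i];

    *usize = n;
    *payload = ZHDR_TOTAL_LEN;
    return 0;
}
```

A caller would pass the raw section bytes and sh_size, then hand the bytes from `*payload` to the end of the section to the decompressor.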

Michael Eager

Jul 15, 2012, 2:06:50 PM
to gener...@googlegroups.com
It seems to me that this would cause programs to fail when this flag is
used with loaders or debuggers which do not understand the flag. In some
cases, this might be a silent failure, where garbled data is loaded.

Currently, programs which do not support compressed sections reading an ELF
file with .zdebug sections would ignore them, which might result in no debug
info being found. With the flag, these sections would be interpreted as if
they were not compressed, causing errors.

--
Michael Eager ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306 650-325-8077


Ali Bahrami

Jul 15, 2012, 10:51:53 PM
to gener...@googlegroups.com, Michael Eager
On 07/15/12 12:06 PM, Michael Eager wrote:
>
> It seems to me that this would cause programs to fail when this flag is
> used with loaders or debuggers which do not understand the flag. In some
> cases, this might be a silent failure, where garbled data is loaded.
>
> Currently, programs which do not support compressed sections reading an ELF
> file with .zdebug sections would ignore them, which might result in no debug
> info being found. With the flag, these sections would be interpreted as if
> they were not compressed, causing errors.
>

I think that could happen if you were to immediately stop
naming compressed sections with the .zdebug prefix and go back
to naming them .debug. However, I think that would be a mistake,
for the reasons you've highlighted, and is not what I'm suggesting.

I believe that the way forward is for gas to start setting the
flag, and for ld/gold to honor it, but to also retain the current
.zdebug section naming and the heuristics in the tools for recognizing
compressed sections by their names. I think if you did this, the
addition of the flag does not really alter the way the tools behave,
and that the transition will be unnoticed.

At some point down the road, when you feel that systems with the
old tools no longer matter, you could choose to drop the special
name and pattern matching and just use the flag. However, that clearly
can't happen immediately, and perhaps not for a long time. That's what
I meant by "ideal world" when I said

The addition
of a section flag is compatible with the existing GNU implementation,
requiring gas to set the flag and ld/gold to recognize it, but not forcing
any other aspect of the existing implementation to change. In an ideal
world, the dependence on section name would go away in favor of the flag,
but that's not a requirement.

Thanks...

- Ali

Ali Bahrami

Jul 27, 2012, 3:46:09 PM
to gener...@googlegroups.com
I received one reply to this proposal, which I answered (I hope
satisfactorily). As it's been 2 weeks, I'd like to move to call
this done and passed. Any objections?

This brings me to the question of how one gets new material added to
the gABI document. I've seen similar discussion in recent threads
(e.g. https://groups.google.com/forum/?fromgroups#!topic/generic-abi/tLbPI5mm5iY).

Is the gABI document being maintained? If so, who do I contact?

Thanks...

- Ali

Cary Coutant

Jul 27, 2012, 5:26:23 PM
to gener...@googlegroups.com, Dave Prosser
> This brings me to the question of how one gets new material added to
> the gABI document. I've seen similar discussion in recent threads
> (e.g.
> https://groups.google.com/forum/?fromgroups#!topic/generic-abi/tLbPI5mm5iY).
>
> Is the gABI document being maintained? If so, who do I contact?

Dave Prosser maintains the document. I haven't heard from him in a
couple of years, but, as far as I know, he's still on this list. Dave?

-cary

Joseph S. Myers

Jul 27, 2012, 5:31:31 PM
to gener...@googlegroups.com
I think regi...@sco.com has gone to someone else (maintaining the
e_machine values at least) for a few years now.

--
Joseph S. Myers
jos...@codesourcery.com

Cary Coutant

Jul 27, 2012, 5:34:16 PM
to gener...@googlegroups.com
Well, my cc to Dave bounced, so I guess he's not still at SCO. That
leaves me with no idea of who's maintaining the doc and the registry
of machine codes.

-cary

Ali Bahrami

Sep 19, 2012, 5:08:02 PM
to gener...@googlegroups.com, Cary Coutant
Following up on this thread from last July...

It's unfortunate that the document on the SCO website is adrift.
I hope that the community will fork a copy and continue to maintain
it, as it's really important for ELF that we prevent the generic
ranges from fragmenting.

Although the current situation makes me a little nervous, I'm going to
press ahead, and add this new flag in the generic range to Solaris,
with the value 0x800. I'll add it to the <sys/elf.h> header file,
and will document it in the Solaris Linker and Libraries Guide.

I'm going to make one change, and name it SHF_COMPRESSED, rather than
SHF_COMPRESS. This reflects the fact that we're labeling already
compressed data, rather than making a request that data be compressed
by the link-editor. So:

#define SHF_COMPRESSED 0x800

In the absence of a gABI document, it would be great if one of you
with GNU commit privileges would add this line to the GNU <elf.h>
header. That would stake out the assignment, and prevent someone
from accidentally assigning a different meaning to the bit.

Thanks for the help...

- Ali

Mats Wichmann

Sep 20, 2012, 10:21:58 AM
to gener...@googlegroups.com
On 09/19/2012 03:08 PM, Ali Bahrami wrote:

>>>> Is the gABI document being maintained? If so, who do I contact?

> Following up on this thread from last July...
>
> It's unfortunate that the document on the SCO website is adrift.
> I hope that the community will fork a copy and continue to maintain
> it, as it's really important for ELF that we prevent the generic
> ranges from fragmenting.

We can't fork in the classic OSS sense, since the documents are under a
copyright with no license attached to grant such a right.

But I'm sure we can find someplace to host a "canonical list of
additions to the document", similar to the way LANANA operates (well
used to, as there seems to be less need now for reserved device numbers
for the Linux kernel).


By the way, has anyone explored whether the outcome of the infamous legal
wrangling over UNIX copyrights might have implied this material ought
actually to have been copyright Novell?


Ali Bahrami

Sep 20, 2012, 11:51:51 AM
to gener...@googlegroups.com, Mats Wichmann
I may have been wrong about this document being adrift. Joseph sent
me mail yesterday pointing me at the new maintainer, and I sent a
request message. We'll see what happens, and I'll report back here.

I hadn't considered the issue with copyright and forking. An
"overlay" document would be slightly less convenient, but would
certainly be preferable to having each ELF platform go their own
way with the generic space.

I'll also mention that Part IV of the Solaris Linker and Libraries Guide
(ELF Application Binary Interface) is based on the same original SVR4
document as the gABI, and that we try hard to keep it current and to
clearly separate the generic and Solaris-specific stuff. This clearly
won't do as a replacement for the gABI (for many reasons that we all
understand), but it is another resource for the ELF community.
The best way to find the latest copy is to google the title. Here's
the current one:

http://docs.oracle.com/cd/E23824_01/html/819-0690/glcfv.html#scrolltoc

I've wondered the same thing about the copyright assignments, but have
not heard that anyone has looked at it. That would make sense, but as
with all copyright issues, common sense may, or may not, be what
wins the day.

- Ali

H.J. Lu

Sep 20, 2012, 12:14:00 PM
to gener...@googlegroups.com, Mats Wichmann
If the current gABI doc can't be updated for whatever reason, we
should create another generic ELF ABI document based on the
gABI contents in Solaris psABI if Oracle will contribute it to the
ELF community under appropriate copyright.

Thanks.

--
H.J.

Ali Bahrami

Sep 20, 2012, 3:57:05 PM
to gener...@googlegroups.com, H.J. Lu, Mats Wichmann
On 09/20/12 10:14, H.J. Lu wrote:
> On Thu, Sep 20, 2012 at 8:51 AM, Ali Bahrami<Ali.B...@oracle.com> wrote:
...
>> I'll also mention that Part IV of the Solaris Linker and Libraries Guide
>> (ELF Application Binary Interface) is based on the same original SVR4
>> document as the gABI, and that we try hard to keep it current and to
>> clearly separate the generic and Solaris-specific stuff. This clearly
>> won't do as a replacement for the gABI (for many reasons that we all
>> understand), but it is another resource for the ELF community.
>> The best way to find the latest copy is to google the title. Here's
>> the current one:
>>
>> http://docs.oracle.com/cd/E23824_01/html/819-0690/glcfv.html#scrolltoc
>>
>> I've wondered the same thing about the copyright assignments, but have
>> not heard that anyone has looked at it. That would make sense, but as
>> with all copyright issues, common sense may, or may not, be what
>> wins the day.
>>
>
> If the current gABI doc can't be updated for whatever reason, we
> should create another generic ELF ABI document based on the
> gABI contents in Solaris psABI if Oracle will contribute it to the
> ELF community under appropriate copyright.
>
> Thanks.
>

I appreciate your willingness to consider that. Fortunately, the
rumors of the gABI's demise (which I unfortunately helped fan) are
false. I've had an email reply to my request at regi...@uxsglobal.com.
The lights are on, there's someone home, and the request is being evaluated.

Long Live the gABI! :-)

I'll follow up when there's news.

- Ali

Cary Coutant

Sep 25, 2012, 12:53:01 PM
to gener...@googlegroups.com, regi...@uxsglobal.com
> 1. The compression header specifically states big endian for both items and the length is a 64-bit value. This seems contrary to the last two lines of the first paragraph of "Data Representation"
>
> "Object files therefore represent some control data with a machine-independent format, making it possible to identify object files and interpret their contents in a common way. Remaining data in an object file use the encoding of the target processor, regardless of the machine on which the file was created."
>
> The 1st 16 bytes of the ELF header (e_ident) is the machine-independent portion; it specifies file class (32-bit or 64-bit) and data encoding. All remaining data uses the target processor encoding.
>
> While LP64 is the current trend, we should not exclude ELFCLASS32 objects. A 64-bit length with other than zeroes in the upper 32 bits would be unmanageable for the 32-bit object file. My opinion is that the length should be either Elf32_off or Elf64_off based on the setting in e_ident[EI_CLASS].
>
> That is my initial reaction, but feel free to persuade me otherwise.

What you suggest sounds reasonable to me. I think that the original
was 64-bit big-endian purely for expedience.

Note that the 4-byte "magic string" identifying the compression scheme
isn't really big-endian -- it's just a byte sequence like the one at
the beginning of the ELF header.

> 2. Should this compression information be formalized into a compression header that:
>
> - allows room for additional information should it be needed at a later date
> - padded out to 16 bytes - allowing object tools to have the compressed data
> on a 16-byte alignment for better handling
>
> This, I presume would be incompatible with what is currently being done with the GNU toolchain??

It would be different, but we could use the legacy scheme for the
".zdebug_*" sections, and the standard scheme for SHF_COMPRESSED
sections. This would preclude any overlap where we set the
SHF_COMPRESSED flag for ".zdebug_*" sections, though. (Not that I
think that's a problem.)

I don't believe any additional padding is necessary, though. Other
compression schemes may specify additional information beyond the
ELF-required information, but I don't think we need to provide the
space for that. I'd propose that for ELFCLASS32, we use a 4-byte
compression scheme followed by a 4-byte uncompressed length, and for
ELFCLASS64, the same 4-byte compression scheme, 4 bytes of padding,
and an 8-byte uncompressed length.
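
To make the two layouts concrete, here is a small C sketch. It is only an illustration of the sizes and offsets described above; the struct, field, and function names (`zhdr32_sketch`, `zhdr64_sketch`, `zhdr_size`) are invented for this example and do not come from any header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELFCLASS32: 4-byte compression scheme, 4-byte uncompressed length. */
typedef struct {
    uint32_t scheme;
    uint32_t usize;
} zhdr32_sketch;

/* ELFCLASS64: 4-byte compression scheme, 4 bytes of padding,
 * 8-byte uncompressed length. */
typedef struct {
    uint32_t scheme;
    uint32_t pad;
    uint64_t usize;
} zhdr64_sketch;

/* The explicit padding keeps the 8-byte length naturally aligned. */
static_assert(sizeof(zhdr32_sketch) == 8, "32-bit header is 8 bytes");
static_assert(sizeof(zhdr64_sketch) == 16, "64-bit header is 16 bytes");
static_assert(offsetof(zhdr64_sketch, usize) == 8, "length starts at offset 8");

/* Size of the header that precedes the compressed bytes. */
static size_t zhdr_size(int is_elfclass64)
{
    return is_elfclass64 ? sizeof(zhdr64_sketch) : sizeof(zhdr32_sketch);
}
```

The fields would be stored in the byte order given by the ELF header, so a reader on a host of the opposite endianness would still need to swap them.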

> Has there been any input or reaction from the DWARF Standards Committee about the ELF compression of DWARF 2, 3, or 4?

We haven't discussed this specifically in the workgroup, although it's
been mentioned a couple of times. I think the position of the DWARF
workgroup would be that this is a feature of the container, and would
be outside the scope of the DWARF standard. Having the SHF_COMPRESSED
flag would actually be better in my mind because the section names
could then remain unchanged, and the section names are part of the
DWARF spec. I can bring this up at the next DWARF workgroup meeting
and let you know what the others on the committee have to say.

As an aside, I'm disappointed that compression of debug information is
so effective -- DWARF is supposed to be a compressed representation
already, and ideally a generalized compression algorithm would not
have much effect. The string tables are obviously highly compressible,
and we haven't done anything to address that, but the DIE structure is
more compressible than I think it ought to be. As a member of the
DWARF workgroup, I'd like to put more effort into improving the
natural compression of the DWARF format.

-cary

Ali Bahrami

Sep 25, 2012, 1:16:36 PM
to gener...@googlegroups.com, regi...@uxsglobal.com
Hi John,

   I agree with the points you're making. The problem however is that GNU already
has a compressed section mechanism in place. Doing something different than what
they're already doing will entail a switching cost to them, which they may not be
willing to incur.

At the same time, much of our (Solaris) interest in these sections stems from
our compiler group's desire to support both Solaris and Linux, and so, we have
a motivation to not stray far from what GNU is doing, issues of purity notwithstanding.

When I studied the existing GNU format, three things caught my attention:

    1) The use of pattern matching on section names to identify potentially
        compressed data.

    2) The specification of big endian integers, ignoring the byte order
        specified by the ELF header.

    3) The use of 64-bit integers for the length, ignoring the ELFCLASS
        specified by the ELF header.

None of these are choices I would have made, as an ELF centric
developer, though one has to recognize that the GNU folks support
non-ELF formats, and undoubtedly had reasons for these choices.
In any case, it's done, so the question is, do we formalize the existing
practice, or try to invent something better that may not be widely
adopted?

Looking at my three issues:

    (1) We can do something about this, hence this proposal, which adds
         SHF_COMPRESSED to the otherwise unchanged GNU definition.
 
    (2) Unfortunate, but not the end of the world.

    (3) Dubious, but also not the end of the world. Since the mid-90's,
         32-bit C compilers have supported a "long long" 64-bit int type,
         implemented using 2 32-bit words. 32-bit platforms are not really
         locked out by this, though it's hard to see any benefit to them.

With that as background, let me answer your questions:


> 1. The compression header specifically states big endian for both items and the length is a 64-bit value. This seems contrary to the last two lines of the first paragraph of "Data Representation"
>
> "Object files therefore represent some control data with a machine-independent format, making it possible to identify object files and interpret their contents in a common way. Remaining data in an object file use the encoding of the target processor, regardless of the machine on which the file was created."
>
> The 1st 16 bytes of the ELF header (e_ident) is the machine-independent portion; it specifies file class (32-bit or 64-bit) and data encoding. All remaining data uses the target processor encoding.
>
> While LP64 is the current trend, we should not exclude ELFCLASS32 objects. A 64-bit length with other than zeroes in the upper 32 bits would be unmanageable for the 32-bit object file. My opinion is that the length should be either Elf32_off or Elf64_off based on the setting in e_ident[EI_CLASS].
>
> That is my initial reaction, but feel free to persuade me otherwise.


Covered above --- I agree, but we have a de facto standard already that does it
this way, and creating a different format has real world costs.

 

> 2. Should this compression information be formalized into a compression header that:
>
> - allows room for additional information should it be needed at a later date
> - padded out to 16 bytes - allowing object tools to have the compressed data
> on a 16-byte alignment for better handling


I think the existing design allows for adding new compression forms, which can
add whatever additional information they need. The one format currently specified
(ZLIB) doesn't need anything more than what it has.

I don't think there's a big advantage to aligning compressed bytes on a
16 byte boundary, at least not for ZLIB. Other formats can make their own
rules, as they can with their header.

 

> This, I presume would be incompatible with what is currently being done with the GNU toolchain??

That is the crux of the issue.

 

> Has there been any input or reaction from the DWARF Standards Committee about the ELF compression of DWARF 2, 3, or 4?



Not that I know of. Although this compression is currently applied to DWARF,
it's not specific to DWARF, and could potentially be applied to any section type.
There are no DWARF-specific dependencies.

- Ali

Ali Bahrami

Sep 25, 2012, 1:35:53 PM
to gener...@googlegroups.com, regi...@uxsglobal.com
On Tuesday, September 25, 2012 10:53:01 AM UTC-6, Cary wrote:

> What you suggest sounds reasonable to me. I think that the original
> was 64-bit big-endian purely for expedience.
>
> Note that the 4-byte "magic string" identifying the compression scheme
> isn't really big-endian -- it's just a byte sequence like the one at
> the beginning of the ELF header.


My reply to John was posted before seeing this. I'd be happy to adopt
these changes if that is acceptable to the GNU developers. I was trying
to come up with the least disruptive proposal that would meet my
needs, but I prefer this.

So, any solution along these lines will be great...

> I don't believe any additional padding is necessary, though. Other
> compression schemes may specify additional information beyond the
> ELF-required information, but I don't think we need to provide the
> space for that. I'd propose that for ELFCLASS32, we use a 4-byte
> compression scheme followed by a 4-byte uncompressed length, and for
> ELFCLASS64, the same 4-byte compression scheme, 4 bytes of padding,
> and an 8-byte uncompressed length.

Sounds reasonable. Does this imply a minimum value of 4/8 for the sh_addralign
field of the section header, with larger values reflecting the requirements of
the uncompressed data?

Thanks...

- Ali

John Wolfe

Sep 25, 2012, 2:25:00 PM
to gener...@googlegroups.com, regi...@uxsglobal.com
I am trying to participate in the discussions. I applied for membership in the group from a new gmail account last Saturday, and from my company e-mail account on Monday.

Today, when replying to Cary and Ali on an e-mail thread to registry_at_sco_dot_com, my responses are being bounced from the generic-abi forum group.

Can someone assist?

-- John Wolfe (UnXis, Inc.)

H.J. Lu

Sep 25, 2012, 2:42:06 PM
to gener...@googlegroups.com, regi...@uxsglobal.com
Hi John,

For some reason, I didn't get any email for requesting to join the group.
You should be OK now. Please let me know if you still have problems.
Sorry for that.

--
H.J.

John Wolfe

Sep 25, 2012, 4:00:27 PM
to gener...@googlegroups.com
My original response to Cary and Ali was bounced.   Resending to the generic-abi group for completeness.

-- John Wolfe 

Cary,

What you have proposed for the ELFCLASS64 is essentially a 16 byte header. The 4 bytes of padding to maintain alignment could be labeled as reserved for "future" additions. If the ELFCLASS32 compression header (using the term loosely here) had the same reserved bytes (4 to 7) and 4 bytes padding, that also would be a 16 byte header.

As noted, and I agree, 16-byte alignment is not critical for ZLIB compression, but who is to say what future uses such a compression flag may be put to, and on what architectures?

As far as the legacy .zdebug scheme goes, do you think that it would be acceptable to have:

  • .zdebug (et al.) without an SHF_COMPRESSED flag indicate the legacy compression methodology
  • any ELF section with an SHF_COMPRESSED flag use the gABI specified format/header. This would include .zdebug sections as GNU starts to implement the flag.

That doesn't help Ali out in coming up with a single flag to test for compression. If he proposes to support .zdebug from the GNU Toolchain, absence of SHF_COMPRESSED would mean the legacy GNU header. As the link editor combines .zdebug sections or .debug sections, it would have to read each section and write a new section which could be converted to the ELF gABI spec.

GNU would have to do something similar.  Your aligned ELFCLASS64 suggestion already breaks GNU's current practice.   As GNU implemented use of an SHF_COMPRESSED flag, they too would need to deal with legacy compression that existed in previously built object files.

My take on the compression flag is also that it is an attribute of the container. I just wanted to know that the DWARF workgroup was not averse to any such mechanism.

As Ali noted in another response, the uncompressed length in any header would dictate either a 4 or 8 byte alignment on the section.

-- John


On 9/25/2012 12:53 PM, Cary Coutant wrote:
1. The compression header specifically states big endian for both items and the length is a 64-bit value.   This seems contrary to the last two lines of the first paragraph of "Data Representation"

"Object files therefore represent some control data with a machine-independent format, making it possible to identify object files and interpret their contents in a common way. Remaining data in an object file use the encoding of the target processor, regardless of the machine on which the file was created."

The 1st 16 bytes of the ELF header (e_ident) is the machine-independent portion; it specifies file class (32-bit or 64-bit) and data encoding. All remaining data uses the target processor encoding.

While LP64 is the current trend, we should not exclude ELFCLASS32 objects. A 64-bit length with other than zeroes in the upper 32 bits would be unmanageable for the 32-bit object file. My opinion is that the length should be either Elf32_off or Elf64_off based on the setting in e_ident[EI_CLASS].

That is my initial reaction, but feel free to persuade me otherwise.

Ali Bahrami

Mar 7, 2013, 12:18:13 PM
to gener...@googlegroups.com, John Wolfe, regi...@uxsglobal.com
It took a while, delayed by a hurricane, holidays, and work, but
I've been working with John to put together a revised proposal,
which is now ready for wider consideration. Although it will be helpful
to review the earlier messages in this thread for context, please
read this as a fresh start.

This version retains the basic structure of the existing GNU-style
section compression, in that there is a compression header followed
by compressed data bytes. The differences are in how compressed
sections are identified (section flag rather than name), and in
the layout/contents of the header.

I've implemented a prototype of this scheme, as well as support
for the existing GNU scheme, in the Solaris ld, and I found that
the amount of common code was on the order of 90%. Based on my
experience, I think you'll find it relatively simple to support
both, and that there's no ambiguity between them. It should be
possible to maintain support for both formats indefinitely without
great cost.
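
A rough sketch of how a tool might tell the two schemes apart follows. This is not the Solaris ld code; `sec_format` and `classify_section` are hypothetical names, and the rule that the flag takes precedence over a .zdebug name is an assumption borrowed from John's earlier suggestion in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHF_COMPRESSED 0x800u   /* value proposed in this thread */

typedef enum {
    SEC_PLAIN,        /* not compressed */
    SEC_GNU_LEGACY,   /* .zdebug name, "ZLIB" magic header */
    SEC_GABI          /* SHF_COMPRESSED flag, compression header */
} sec_format;

/* Hypothetical helper: decide which decoder applies to a section.
 * Assumption: when the flag is set it wins, whatever the name is. */
static sec_format classify_section(uint64_t sh_flags, const char *name)
{
    assert(name != NULL);

    if (sh_flags & SHF_COMPRESSED)
        return SEC_GABI;
    if (strncmp(name, ".zdebug", 7) == 0)
        return SEC_GNU_LEGACY;
    return SEC_PLAIN;
}
```

For the legacy case the name only marks a candidate, so a careful reader would presumably still check for the "ZLIB" magic before decompressing.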

The proposal follows. I'll send a second message shortly
with some simple "hello world" illustrations.

Thanks.

- Ali

---------------------------------------------------------------
I would like to propose adding SHF_COMPRESSED to the gABI,
assigning it the value 0x800.

The relevant section of the gABI can be seen at

http://www.sco.com/developers/gabi/latest/ch4.sheader.html#sh_flags

The SHF_COMPRESSED value needs to be added to Figure 4-11. Following is
proposed text for the definition that follows the table. This proposal
is based on the current GNU implementation, but differs in the following
ways:

- There is no dependence on section name matching to identify
compressed sections. As such, SHF_COMPRESSED may be applied to
arbitrary sections.

- The compression type is specified as an enumerated integer
type that obeys ELFDATA and ELFCLASS, rather than as a 4-byte
ELF header style ident array.

- The uncompressed size is represented as an ELFDATA/ELFCLASS compliant
integer, rather than as a 64-bit big endian integer.

- sh_addralign refers to the compressed data. Section alignment for the
uncompressed data is explicitly specified separately.

As with the existing GNU definition, this is an extensible design that
allows for additional compression algorithms to be added in the future.

-----

SHF_COMPRESSED

Identifies a section containing compressed data. SHF_COMPRESSED
applies only to non-allocable sections, and cannot be used in
conjunction with SHF_ALLOC. In addition, SHF_COMPRESSED cannot be
applied to sections of type SHT_NOBITS.
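
(Illustration only, not part of the proposed text.) The two restrictions above could be checked by a consumer along these lines. SHF_ALLOC and SHT_NOBITS carry their existing gABI values, SHF_COMPRESSED uses the value proposed here, and `compressed_flag_is_valid` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

#define SHF_ALLOC       0x2u     /* existing gABI value */
#define SHF_COMPRESSED  0x800u   /* value proposed in this thread */
#define SHT_NOBITS      8u       /* existing gABI value */

static_assert((SHF_COMPRESSED & SHF_ALLOC) == 0, "flags use distinct bits");

/* Hypothetical check: returns 1 if the section header respects the
 * restrictions on SHF_COMPRESSED, 0 otherwise. */
static int compressed_flag_is_valid(uint32_t sh_type, uint64_t sh_flags)
{
    if (!(sh_flags & SHF_COMPRESSED))
        return 1;                 /* flag not set, nothing to check */
    if (sh_flags & SHF_ALLOC)
        return 0;                 /* allocable sections cannot be compressed */
    if (sh_type == SHT_NOBITS)
        return 0;                 /* no file data to compress */
    return 1;
}
```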

All relocations to a compressed section specify offsets to the
uncompressed section data. It is therefore necessary to decompress
section data before relocations can be applied. Each compressed
section specifies the algorithm independently. It is permissible
for different sections in a given ELF object to employ different
compression algorithms.

Compressed sections start with a compression header structure that
identifies the compression algorithm.

    typedef struct {
        Elf32_Word  ch_type;
        Elf32_Word  ch_size;
        Elf32_Word  ch_addralign;
    } Elf32_Chdr;

    typedef struct {
        Elf64_Word  ch_type;
        Elf64_Word  ch_reserved;
        Elf64_Xword ch_size;
        Elf64_Xword ch_addralign;
    } Elf64_Chdr;

ch_type
Specifies the compression algorithm. Supported algorithms
and their descriptions are listed in table XXX-YYY.

ch_size
The size in bytes of the uncompressed data. See sh_size.

ch_addralign
Required alignment for the uncompressed data. See sh_addralign.


The sh_size and sh_addralign fields of the section header for a
compressed section reflect the requirements of the compressed section.
The ch_size and ch_addralign fields of the compression header provide
the corresponding values for the uncompressed data, thereby supplying
the values that sh_size and sh_addralign would have had if the section
had not been compressed.
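
(Illustration only, not part of the proposed text.) Because the header fields obey ELFDATA, a reader cannot simply cast the section bytes on a host of the other byte order. The sketch below decodes the Elf64_Chdr fields byte by byte. The `chdr64` type and the `get_uint` and `read_chdr64` helpers are hypothetical names, and the offsets 0, 4, 8, and 16 are derived from the field widths in the structure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of the proposed Elf64_Chdr fields. */
typedef struct {
    uint32_t ch_type;
    uint32_t ch_reserved;
    uint64_t ch_size;
    uint64_t ch_addralign;
} chdr64;

/* Read an n-byte unsigned integer stored in the file's byte order.
 * lsb is nonzero for ELFDATA2LSB, zero for ELFDATA2MSB. */
static uint64_t get_uint(const unsigned char *p, size_t n, int lsb)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        size_t k = lsb ? n - 1 - i : i;   /* visit most significant byte first */
        v = (v << 8) | p[k];
    }
    return v;
}

/* Returns 0 on success, -1 if sh_size is too small to hold the header. */
static int read_chdr64(const unsigned char *sec, uint64_t sh_size, int lsb,
                       chdr64 *out)
{
    assert(sec != NULL && out != NULL);

    if (sh_size < 24)                     /* 4 + 4 + 8 + 8 bytes */
        return -1;
    out->ch_type      = (uint32_t)get_uint(sec + 0, 4, lsb);
    out->ch_reserved  = (uint32_t)get_uint(sec + 4, 4, lsb);
    out->ch_size      = get_uint(sec + 8, 8, lsb);
    out->ch_addralign = get_uint(sec + 16, 8, lsb);
    return 0;
}
```

After a successful call, `out->ch_size` and `out->ch_addralign` hold what sh_size and sh_addralign would have been for the uncompressed section.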

The layout and interpretation of the data that follows the compression
header is specific to each algorithm, and is defined below for each
value of ch_type. This area may contain algorithm specific parameters
and alignment padding in addition to compressed data bytes.

A compression header's ch_type member specifies the compression algorithm
employed, as shown in the following table.

Table XXX-YYY ELF Compression Types, ch_type
---------------------------------------------------
Name                    Value
---------------------------------------------------
ELFCOMPRESS_ZLIB        1
ELFCOMPRESS_LOOS        0x60000000
ELFCOMPRESS_HIOS        0x6fffffff
ELFCOMPRESS_LOPROC      0x70000000
ELFCOMPRESS_HIPROC      0x7fffffff
---------------------------------------------------
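
(Illustration only, not part of the proposed text.) The values in the table could be consumed with a small classifier like the one below. The constants are taken from the table; the `ch_kind` enum and `classify_ch_type` are hypothetical names, and treating every value outside the listed ones as unknown is an assumption about how a consumer would want to behave.

```c
#include <assert.h>
#include <stdint.h>

#define ELFCOMPRESS_ZLIB    1u
#define ELFCOMPRESS_LOOS    0x60000000u
#define ELFCOMPRESS_HIOS    0x6fffffffu
#define ELFCOMPRESS_LOPROC  0x70000000u
#define ELFCOMPRESS_HIPROC  0x7fffffffu

static_assert(ELFCOMPRESS_HIOS + 1 == ELFCOMPRESS_LOPROC,
              "OS and processor ranges are adjacent");

typedef enum { CH_KIND_ZLIB, CH_KIND_OS, CH_KIND_PROC, CH_KIND_UNKNOWN } ch_kind;

/* Hypothetical helper: map a ch_type value onto the table's categories. */
static ch_kind classify_ch_type(uint32_t ch_type)
{
    if (ch_type == ELFCOMPRESS_ZLIB)
        return CH_KIND_ZLIB;
    if (ch_type >= ELFCOMPRESS_LOOS && ch_type <= ELFCOMPRESS_HIOS)
        return CH_KIND_OS;
    if (ch_type >= ELFCOMPRESS_LOPROC && ch_type <= ELFCOMPRESS_HIPROC)
        return CH_KIND_PROC;
    return CH_KIND_UNKNOWN;   /* assumption: anything else is rejected */
}
```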

ELFCOMPRESS_ZLIB
The section data is compressed with the ZLIB compression algorithm.
The compressed ZLIB data bytes begin with the byte immediately
following the compression header, and extend to the end of the
section. Additional documentation for ZLIB may be found at
http://zlib.net/.

ELFCOMPRESS_LOOS - ELFCOMPRESS_HIOS
Values in this inclusive range are reserved for operating
system-specific semantics.

ELFCOMPRESS_LOPROC - ELFCOMPRESS_HIPROC
Values in this inclusive range are reserved for
processor-specific semantics.
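
(Illustration only, not part of the proposed text.) For the ELFCOMPRESS_ZLIB case described above, locating the compressed bytes is simple arithmetic on sh_size and the header size. In this sketch the header sizes of 12 and 24 bytes are derived from the field widths of Elf32_Chdr and Elf64_Chdr, and the `zpayload` type and `zlib_payload_span` function are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header sizes implied by the proposed Elf32_Chdr and Elf64_Chdr layouts. */
enum { CHDR32_SIZE = 12, CHDR64_SIZE = 24 };

typedef struct {
    uint64_t offset;   /* offset of the first compressed byte in the section */
    uint64_t length;   /* number of compressed bytes */
} zpayload;

/* Hypothetical helper: the ZLIB data starts right after the compression
 * header and runs to the end of the section. Returns -1 if the section
 * is too small to hold the header. */
static int zlib_payload_span(uint64_t sh_size, int is_elfclass64, zpayload *out)
{
    assert(out != NULL);

    uint64_t hdr = is_elfclass64 ? CHDR64_SIZE : CHDR32_SIZE;
    if (sh_size < hdr)
        return -1;
    out->offset = hdr;
    out->length = sh_size - hdr;
    return 0;
}
```

The resulting span, together with a ch_size-byte output buffer, is what one would pass to a zlib routine such as `uncompress()`.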

Ali Bahrami

Mar 7, 2013, 12:25:41 PM
to gener...@googlegroups.com, John Wolfe, regi...@uxsglobal.com
On 03/ 7/13 10:18 AM, Ali Bahrami wrote:
>
> The proposal follows. I'll send a second message shortly
> with some simple "hello world" illustrations.


The following uses "hello world" to show the proposed SHF_COMPRESSED
section compression in action, along with the existing GNU style. I'm
using Solaris tools here, but ELF is ELF, and you can easily translate
this to the equivalent GNU tools.

% cat hello.c
#include <stdio.h>

int
main(int argc, char **argv)
{
	(void) printf("hello\n");
}

Build with the proposed compression, and display the resulting
debug section headers:

% cc -g hello.c -z compress-debug-sections
% elfdump -c a.out
...
Section Header[24]: sh_name: .debug_info
sh_addr: 0 sh_flags: [ SHF_COMPRESSED ]
sh_size: 0x150 sh_type: [ SHT_PROGBITS ]
sh_offset: 0x12d8 sh_entsize: 0
sh_link: 0 sh_info: 0
sh_addralign: 0x4
ch_size: 0x19f ch_type: [ ELFCOMPRESS_ZLIB ]
ch_addralign: 0x1

Section Header[25]: sh_name: .debug_line
sh_addr: 0 sh_flags: 0
sh_size: 0x58 sh_type: [ SHT_PROGBITS ]
sh_offset: 0x1428 sh_entsize: 0
sh_link: 0 sh_info: 0
sh_addralign: 0x1

Section Header[26]: sh_name: .debug_abbrev
sh_addr: 0 sh_flags: [ SHF_COMPRESSED ]
sh_size: 0x79 sh_type: [ SHT_PROGBITS ]
sh_offset: 0x1480 sh_entsize: 0
sh_link: 0 sh_info: 0
sh_addralign: 0x4
ch_size: 0x7c ch_type: [ ELFCOMPRESS_ZLIB ]
ch_addralign: 0x1

Section Header[27]: sh_name: .debug_pubnames
sh_addr: 0 sh_flags: 0
sh_size: 0x1b sh_type: [ SHT_PROGBITS ]
sh_offset: 0x14f9 sh_entsize: 0
sh_link: 0 sh_info: 0
sh_addralign: 0x1
...

Things to note:

- SHF_COMPRESSED is set. Section names are not changed to
start with .zdebug.

- In a compressed section, sh_size and sh_addralign reflect the
requirements of the compressed data, including that of the
compression header. The values for the uncompressed data are
shifted to the ch_size and ch_addralign fields, respectively.

- Section compression is specified on a per-section basis. I've taken
advantage of that fact to only apply compression when the result would
be smaller than the original. In this toy example, only 2 sections
met that standard.

- As these are DWARF sections, their names start with .debug. However,
they could have been named anything, and the decision to compress them
was based on their type (SHT_PROGBITS) and the fact that they are
not allocable (SHF_ALLOC not set).
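
The selection policy described in the notes above might be sketched roughly as follows. This is only an illustration of the idea, not the Solaris link-editor code: is_candidate() and worth_compressing() are hypothetical names, and the compression header size is passed in as a parameter. SHT_PROGBITS and SHF_ALLOC carry their standard gABI values.

```c
#include <stdint.h>

#define SHT_PROGBITS	1u	/* standard gABI section type */
#define SHF_ALLOC	0x2u	/* standard gABI section flag */

/*
 * A section is a candidate for compression based on its type and
 * flags alone; its name plays no part in the decision.
 */
static int
is_candidate(uint32_t sh_type, uint64_t sh_flags)
{
	return (sh_type == SHT_PROGBITS && (sh_flags & SHF_ALLOC) == 0);
}

/*
 * Compress only when the compression header plus the compressed bytes
 * come out smaller than the original data. Written to avoid overflow
 * when adding the two sizes.
 */
static int
worth_compressing(uint64_t orig_size, uint64_t zdata_size,
    uint64_t chdr_size)
{
	return (zdata_size < orig_size &&
	    chdr_size < orig_size - zdata_size);
}
```

Because the test is applied per section, a small section whose compressed form plus header is no smaller than the original is simply left alone.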

Now, build with the GNU-style, and display the resulting headers:

% cc -g hello.c -z compress-debug-sections=zlib-gnu
% elfdump -c a.out
...
Section Header[24]: sh_name: .zdebug_info
sh_addr: 0 sh_flags: 0
sh_size: 0x159 sh_type: [ SHT_PROGBITS ]
sh_offset: 0x12d8 sh_entsize: 0
sh_link: 0 sh_info: 0
sh_addralign: 0x1
ch_size: 0x1a8 ch_type: ZLIB (GNU format)

Section Header[25]: sh_name: .debug_line
sh_addr: 0 sh_flags: 0
sh_size: 0x58 sh_type: [ SHT_PROGBITS ]
sh_offset: 0x1431 sh_entsize: 0
sh_link: 0 sh_info: 0
sh_addralign: 0x1

Section Header[26]: sh_name: .zdebug_abbrev
sh_addr: 0 sh_flags: 0
sh_size: 0x79 sh_type: [ SHT_PROGBITS ]
sh_offset: 0x1489 sh_entsize: 0
sh_link: 0 sh_info: 0
sh_addralign: 0x1
ch_size: 0x7c ch_type: ZLIB (GNU format)

Section Header[27]: sh_name: .debug_pubnames
sh_addr: 0 sh_flags: 0
sh_size: 0x1b sh_type: [ SHT_PROGBITS ]
sh_offset: 0x1502 sh_entsize: 0
sh_link: 0 sh_info: 0
sh_addralign: 0x1
...

When using this form of compression, SHF_COMPRESSED is not set,
and compression is detected via the .zdebug section name combined
with the presence of ['Z', 'L', 'I', 'B'] at the head of the data.
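
For comparison, detection of the GNU-style form might look roughly like the sketch below. The function name gnu_zdebug_size() is hypothetical. The sketch also assumes, based on the existing GNU implementation rather than anything stated in this message, that the 4 magic bytes are followed by an 8-byte big-endian uncompressed size, with the zlib stream starting at byte 12.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed GNU layout: "ZLIB" magic + 8-byte big-endian size. */
#define GNU_ZHDR_SIZE	12

/*
 * Return 1 and store the uncompressed size in *usize if the section
 * looks like a GNU-style compressed debug section, otherwise return 0.
 * Both the .zdebug name prefix and the magic bytes must be present.
 */
static int
gnu_zdebug_size(const char *name, const unsigned char *data, size_t len,
    uint64_t *usize)
{
	uint64_t v = 0;
	size_t i;

	if (strncmp(name, ".zdebug", 7) != 0)
		return (0);
	if (len < GNU_ZHDR_SIZE || memcmp(data, "ZLIB", 4) != 0)
		return (0);

	/* Assemble the big-endian size one byte at a time. */
	for (i = 4; i < GNU_ZHDR_SIZE; i++)
		v = (v << 8) | data[i];
	*usize = v;
	return (1);
}
```

Note that nothing in the section header participates in this check, which is the fragility the flag-based approach is meant to remove.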

- Ali

John Wolfe

Mar 7, 2013, 7:53:01 PM3/7/13
to gener...@googlegroups.com, regi...@uxsglobal.com
I would like to acknowledge Ali's great work in arriving at this proposal and in prototyping an implementation of ELFCOMPRESS_ZLIB in the Solaris tools. I would also like to thank him for his patience in working with me, and in working around the outages inflicted by the hurricane.

There was a lot of back and forth to make reasonably certain that this proposal will:
  • work immediately with a ZLIB compression implementation.
  • conform to existing gABI specifications.
  • provide the structure and flexibility to work with future section compression algorithms, whatever the architecture, the OS, or the alignment requirements of the compression algorithm might be.

-- John
