Hi Mark,
I didn't hear anything back from my earlier comments from
Wednesday, but having looked at this proposal in a bit more detail,
I think I wasn't too far off in my assumptions. I'm now ready to have
a deeper conversation about this.
I find that both of our proposals are similar. I regret to say that I
don't love either proposal very much. I don't regret the effort, as I
think we needed to go through the exercise of defining them in order to
explore the issues. I also think that the problems are all about syntax,
and so, the code we've written will not be wasted.
The problem with both is that they wrestle somewhat unsuccessfully with
how to integrate support for the original GNU format. We believe that the
gABI format will eclipse the older one, and that the old one will fade away.
However, our proposed APIs cement the support for that old stuff into
core libelf routines in a way that will persist. In the best case, we'll
end up with options that no one uses.
It's been months, so let's start with a quick review. I proposed this:
typedef enum {
ELF_ZSCN_T_NONE = 0, /* No compression applied. must be first */
ELF_ZSCN_T_ZLIB_GNU, /* ZLIB (GNU-style, deprecated) */
ELF_ZSCN_T_ZLIB, /* ZLIB (ELF gABI) */
ELF_ZSCN_T_NUM /* must be last */
} Elf_ZScn_Type;
int elf_zscn_identify(Elf_Scn *scn,
const char *scn_name);
int elf_zscn_compressible(Elf_Scn *scn,
Elf_ZScn_Type ztype,
const char *scn_name);
int elf_zscn_compress(Elf_Scn *scn,
Elf_ZScn_Type ztype,
const char *scn_name);
int elf_zscn_decompress(Elf_Scn *scn,
const char *scn_name);
Note that these are ELFCLASS neutral, and don't require 32/64/gelf variants.
You are countering with this:
typedef enum {
ELF_ZSCN_T_NONE, /* No compression. */
ELF_ZSCN_T_GNU, /* Old GNU style (Always uses ZLIB, no real Chdr) */
ELF_ZSCN_T_ELF, /* ELF style (with Chdr giving compression type) */
ELF_ZSCN_T_NUM /* Number of different compression types. */
} Elf_ZScn_Type;
Elf32_Chdr *elf32_getchdr (Elf_Scn *scn, int *type);
Elf64_Chdr *elf64_getchdr (Elf_Scn *scn, int *type);
GElf_Chdr *gelf_getchdr (Elf_Scn *scn, GElf_Chdr *dest,
int *type);
Elf32_Shdr *elf32_zscn_compress (Elf_Scn *scn, int type, int ch_type);
Elf64_Shdr *elf64_zscn_compress (Elf_Scn *scn, int type, int ch_type);
GElf_Shdr * gelf_zscn_compress (Elf_Scn *scn, GElf_Shdr *dest, int type,
int ch_type);
You also mentioned some internal implementation details (ELF_F_DIRTY,
elf_strptr) that we have already discussed and agreed upon. To me, it
looks like we are in conceptual agreement, but are still dancing around
issues of syntax, and again, largely trying to decide what to do with
the old GNU format.
I'd summarize your changes, along with my initial reactions, as follows:
(1) Combine compress and decompress into 1 routine
Assuming this can work out cleanly, I think it's OK.
(2) Return a section header from compress operations
This might be a convenience, but it requires ELFCLASS-specific
routines, which I think isn't worth the other convenience.
(3) Specify compression as 2 values: (1) Style (GNU or gABI) and
algorithm (none, zlib, etc...).
A couple of things:
a) It suggests that additional styles, in addition to
ELF_ZSCN_T_GNU and ELF_ZSCN_T_ELF, might be possible.
I want to discourage that.
b) 2 arguments to specify a type seems like 1 too many.
This seems like the first issue we need to focus on, as it
ripples through a lot of things. See below.
(4) Eliminate the compressible query routine.
You've made the point before that libelf should not be in the
position of policing what gets compressed, and what cannot. I
agree with that principle. However, I am not content to force
individual developers to make these decisions, because I think
that will result in unnecessary confusion and chaos. Most
developers just want to do the standard correct thing, and would
value some guidance. If we don't provide it here, who will?
I think the original GNU format support is really at the root
of this concern as well. We wouldn't need to provide guidance
if things were simpler.
(5) Express the identify operation differently, with routines,
elfXX_getchdr(), that return the compression header, and give
the style (GNU vs gABI).
Returning the Chdr is a convenience, but note that in a gABI
compressed section, it's at the top of the data, and easily
accessible. There are 2 things about this routine that I would
rather avoid:
(a) the need to allocate and manage the memory for
a fake Chdr struct in the GNU format case.
(b) The need for 3 ELFCLASS-specific variants.
At a deeper level, this is really a symptom of the same problem
we face in (3).
It's clear that the original GNU format is the main complication
that is causing otherwise simple APIs to mutate unpleasantly, and the
need for Elf_ZScn_Type lies at the heart of this. In my proposal, we're
mapping the ELFCOMPRESS types, with an extra GNU type, into yet another
"unified" type. That's an unwelcome indirection.
In your proposal, you have avoided the ugliness I accepted, but have
accepted a different one by requiring 2 types to be specified (GNU vs gABI,
and the type of compression algorithm).
Maybe we're trying too hard to unify this stuff. Let's pause, and consider
what an API that supports only the gABI form of compression might be. We
might start with a stripped down version of my original proposal, which
no longer needs section names, nor an invented new Elf_ZScn_Type:
int elf_zscn_identify(Elf_Scn *scn);
int elf_zscn_compressible(Elf_Scn *scn, int type);
int elf_zscn_compress(Elf_Scn *scn, int type);
int elf_zscn_decompress(Elf_Scn *scn);
The type argument in both cases is now simply one of the gABI ELFCOMPRESS_*
constants. At this time, that means ELFCOMPRESS_ZLIB, since that's the only
one we've defined:
/* Legal values for ch_type (compression algorithm). */
#define ELFCOMPRESS_ZLIB 1 /* ZLIB/DEFLATE algorithm. */
Note that while I didn't define an ELFCOMPRESS_NONE code when I proposed
the gABI format, I did reserve the value of 0 for that meaning. We could
amend the gABI to formalize that:
#define ELFCOMPRESS_NONE 0 /* Uncompressed data */
#define ELFCOMPRESS_ZLIB 1 /* ZLIB/DEFLATE algorithm */
The reason I didn't define NONE in the first place is that I didn't want
to see objects creating SHF_COMPRESSED sections with a Chdr, and then
find ELFCOMPRESS_NONE being specified the specified type. Let's assume
however, that no one would be dumb enough to do that. Then, we could take
your idea and have just one routine for compress and decompress.
In this simpler scenario, my concerns about identifying the compression
type go away: It's just looking for SHF_COMPRESSED, followed by looking
at the compression header at the top of the Data. We no longer need an
identify function.
Similarly, my concern about giving developers guidance on what to compress
and what not to become rather less important. The general advice that
"you should compress non-SHF_ALLOC with names starting with .debug" seems
good enough. We can dispense with "compressible".
Put it all together and you end up with this compression API:
int elf_compress(Elf_Scn *scn, int type);
That's genuinely simple. I really like this far more than
any option we've considered so far. What do you think?
-----
What about the old GNU format?
I wish we could pretend that it never happened. On Solaris, we almost
can, as we don't have an installed base of compressed objects in that
format. On GNU, perhaps we could, but it feels a bit more aggressive
than we might like. My decision 2-3 years ago to support that format
on Solaris reflects my judgment that while we don't want to encourage
it, and we hope it dies out, we don't want to force anyone off a cliff.
So, I think we have to handle it. And, I think that each of us should want
to have the code that does that in libelf, for 2 reasons:
- The implementation of ZLIB for the GNU and gABI forms is 98%
the same code.
- Centralized location
Solaris already has that code in libelf, but its all private, and used
only by our ld and elfcompress utilities.
We could agree to each solve this our own way. Standardizing the gABI
is very important to us. Standardizing support for the old GNU format
is considerably less so.
On the other hand, it's clearly good overall to reduce arbitrary
differences that don't matter, so maybe we should try. We could agree
to a small purpose-built API to handle the old format. Perhaps this
would suffice?
int elf_gnu_zsec_identify(Elf_Scn *scn, *name);
Return True (1) if the given section, with its name
given by name, is compressed in the original GNU ZLIB
format, and False (0) otherwise.
int elf_gnu_zsec_compress(Elf_Scn *scn, int inflate);
Set inflate to True (1) to compress, and False (0) to
decompress. The caller is responsible for ensuring that
scn has a name starting with .debug when uncompressed,
and .zdebug when compressed. The caller is also responsible
for changing the name when the compression status is changed.
Supporting the GNU format this way has 2 benefits:
- Allows support for the gABI format to consist of one
function.
- Provides the support for the old GNU format with a couple
of routines that can fade from view as that old format dies,
and which might even be deprecated some day.
Let me know what you think. If this seems promising, I'll try
to implement it, as should you, and we can compare notes. Unlike
my previous proposal, this is just a paper exercise so far, so we're
going to want to try it before committing fully. It should be a
short leap from where we are now to this, so that shouldn't take long.
Thank you for your work on this.
- Ali