libelf gelf_newehdr and gelf_newphdr return types

20 views
Skip to first unread message

Mark Wielaard

unread,
Dec 2, 2016, 4:42:37 AM12/2/16
to gener...@googlegroups.com, Ali Bahrami, elfutil...@lists.fedorahosted.org
Hi,

Someone was porting elfutils libelf to Windows64 and noticed that the
return types of gelf_newehdr and gelf_newphdr (unsigned long int) is not
appropriate on that platform. It uses the LLP64 data model where int and
long are both 32bits, while pointers are 64bits. Instead of the more
common LP64 model where both long and pointer are 64bits. This obviously
breaks that interface.

I couldn't find the history behind this return type. Both elfutils and
solaris libelf share the same return type, but some other libelf
implementations, like the freebsd one, return a void *. Which does make
more sense IMHO. Does someone remember the background?

I have been pondering just changing the return type to void *, which
should be abi compatible on any platform that elfutils currently
supports. But it might cause some compiler warnings about needed or
unneeded casts when existing code relies on the the return type being an
integral type. So an alternative would be to make the functions return
an uintptr_t, which should work in all data models.

Any opinions if this is something to clarify/fix across
implementations/platforms supporting an libelf implementation?

Thanks,

Mark

Ali Bahrami

unread,
Dec 2, 2016, 3:26:06 PM12/2/16
to Mark Wielaard, gener...@googlegroups.com, elfutil...@lists.fedorahosted.org, Rod Evans
Hi Mark,

I'm currently searching code to see what the impacts are,
but I suspect that there are none, and that we should follow
FreeBSD's lead. They've done us a service by already testing
out the viability.

The reason for using unsigned long was undoubtedly so that
pre-K&R C compilers can compile the code. gelf was invented
at a time in history where ansi C was still pretty new, and
not everyone was using it. 'unsigned long' was the pre-ansi
idiom for "generic pointer" that could be used in either mode.

All of the code I've found in Solaris using these functions so
far, and there is very little, treat the return value as a
pointer, and even compare it to NULL, so switching to 'void *'
would be transparent and binary compatible.

I'll follow up to this as soon as I'm done digging. In the meantime,
might I suggest that you make this change in a test environment,
do a 'make world', and see if you hit any issues?

Thanks.

- Ali

Ali Bahrami

unread,
Dec 2, 2016, 3:39:57 PM12/2/16
to Josh Stone, Mark Wielaard, gener...@googlegroups.com, elfutil...@lists.fedorahosted.org, Rod Evans
On 12/ 2/16 12:00 PM, Josh Stone wrote:
> I'd favor uintptr_t to keep better compatibility with current elfutils.
>
> That's not perfect either, since glibc uses "unsigned int" on 32-bit
> platforms, so that could still cause warnings for things that strictly
> expect long, like printf "%lx". But directly printing a newehdr return
> value (without saving it somewhere) is a weird thing to do, so I think
> we can dismiss that case, at least. Off the top of my head, I can't
> think of anything else that's so strict about long vs. int.
>
> Another option is to make the change conditional on __LP64__, so
> existing platforms are totally unaffected. I'd still lean toward
> uintptr_t for the new case, so existing libelf code can port easily.


In a 32-bit environment, int, long, and (void *) are all a 32-bit
integer at the machine level. That's true for any unix I know about
(including Linux) and also for Win32 and Win64 (and I presume that
no one cares about Win16). As a result, printf "%x" and "%lx"
are equivalent. The compiler might warn, but the code it generates
would be correct, and so, the noise is a good thing --- it will
encourage you to go change your printf to use "%p", but if you
ignore it, nothing bad will occur as a result.

I expect very little compiler noise for C programs, because in C,
'void *' is very promiscuous, and adapts without complaint to any
assignment. C++ might be a bigger issue, and I guess the question there
is: How much C++ code is out there using these, and why would it not be
better for that code to just get fixed? Even in C++, existing old C++
binaries would continue to run without issue.

Although uintptr_t would work, it seems unnecessary. Note that freebsd
already uses 'void *', and surely they build most of the same ELF-related
FOSS code that we might be concerned about, so I think that the libelf
code already has ported easily. As such, and given that 'void *' is the
common C idiom for this sort of use, I think it would make sense to
use it.

We should all do the experiment and report back.

- Ali

Ali Bahrami

unread,
Dec 3, 2016, 4:05:49 PM12/3/16
to gener...@googlegroups.com, Mark Wielaard, elfutil...@lists.fedorahosted.org, Rod Evans, Joseph Koshy, mic...@mr511.de
On 12/ 2/16 01:25 PM, Ali Bahrami wrote:
>
> I'm currently searching code to see what the impacts are,
> but I suspect that there are none, and that we should follow
> FreeBSD's lead. They've done us a service by already testing
> out the viability.
>
> The reason for using unsigned long was undoubtedly so that
> pre-K&R C compilers can compile the code. gelf was invented
> at a time in history where ansi C was still pretty new, and
> not everyone was using it. 'unsigned long' was the pre-ansi
> idiom for "generic pointer" that could be used in either mode.
>
> All of the code I've found in Solaris using these functions so
> far, and there is very little, treat the return value as a
> pointer, and even compare it to NULL, so switching to 'void *'
> would be transparent and binary compatible.
>
> I'll follow up to this as soon as I'm done digging. In the meantime,
> might I suggest that you make this change in a test environment,
> do a 'make world', and see if you hit any issues?


Hi Mark,

I'm done with my experiments, and I think that we should
adopt 'void *' as the return type for these functions. That
might take awhile, but I see no reason why the folks doing the
Windows port shouldn't move ahead immediately, using a few
ifdefs, which can be later removed when we all catch up.

I already described the machine-level reasons why I think this
change is safe. 'void *' and 'unsigned long' generate the same
machine code for systems where

sizeof(long) == sizeof(pointer)

which generally holds for unix, and on any systems that are
sill likely to be running. For these systems, these are equivalent
alternative syntaxes that generate the same machine code, and and
here should be no binary level incompatibilities. The FreeBSD experiment
provides real world evidence of this.

That's the theory, here are my experiments...

The first thing to report is that use of these 2 functions
is very rare. In Solaris, they are called once each in a small
set of programs:

- The mcs/strip/elfcompress programs, which are really
one program, hardlinked together, using the argv[0]
trick to figure out what they are at runtime.

- A demo program we ship under /usr/demo/ELF

- A couple of we use to build Solaris, but which are not
shipped.

In all of these cases, our code calls them as such:

if (gelf_newehdr(elf, class) == NULL)
bail();

and then accesses them using gelf get/set routines. We know that
the return value is a valid pointer, but we never access it via
that pointer, probably because that requires the programmer to
care about which ELFCLASS they're dealing with. To do that defeats
the only real reason to use GElf rather than just calling the
"real" class-specific libelf routines. For that reason, it would
have made sense for these functions to return a boolean error/success
indication rather than a pointer. I don't think we should make that
change at this late date --- it's just an observation in passing.

I also googled for these APIs, and found very little of note,
other than manpages.

I made the ['void *' change, to the <gelf.h> header, and to libelf,
and then without modifying any calling code, did a full build of our
core OS workspace. There were no errors. I then installed those
bits on my build machine, and used it to do a second full build,
and that one was also completely clean. I therefore have no concerns
about this regarding any code that we produce.

I also grepped the sources for a majority of the open source that
we deliver (including gcc and binutils), and found that none of it
calls these functions. If they did, I'm confident that they would
"just work", with the possible exception of some easily fixed C++.
However, as nothing calls it, there nothing to be concerned about.

In your environment, you may have some different consumers, so I'll
wait to hear what you find, but 'void *' seems OK from here. In the
final analysis, our main concern is to keep the various libelf's
unified, so we'll go along with the group consensus.

- Ali

Mark Wielaard

unread,
Dec 6, 2016, 8:47:36 AM12/6/16
to Ali Bahrami, Josh Stone, gener...@googlegroups.com, elfutil...@lists.fedorahosted.org, Rod Evans
On Fri, 2016-12-02 at 13:39 -0700, Ali Bahrami wrote:
> Although uintptr_t would work, it seems unnecessary. Note that freebsd
> already uses 'void *', and surely they build most of the same ELF-related
> FOSS code that we might be concerned about, so I think that the libelf
> code already has ported easily. As such, and given that 'void *' is the
> common C idiom for this sort of use, I think it would make sense to
> use it.

I was leaning towards using uintptr_t. But given that other libelf
implementations use void * already for the return type of these two
functions it does seem better to just consolidate on that. It also seems
more in line with the original intent of the gelf functions (return a
pointer to the newly constructed header or NULL of failure).

Cheers,

Mark

Mark Wielaard

unread,
Dec 6, 2016, 9:31:57 AM12/6/16
to Kurt Roeckx, gener...@googlegroups.com, elfutil...@lists.fedorahosted.org, Rod Evans, Joseph Koshy, mic...@mr511.de
On Sat, 2016-12-03 at 23:02 +0100, Kurt Roeckx wrote:
> On Sat, Dec 03, 2016 at 02:05:41PM -0700, Ali Bahrami wrote:
> > I also googled for these APIs, and found very little of note,
> > other than manpages.
>
> You can search the source in Debian here:
> https://codesearch.debian.net/search?q=gelf_newehdr
> https://codesearch.debian.net/search?q=gelf_newphdr
>
> The only things really are:
> - dwz casts it to a char *.
> - prelink has it's own implementation that uses unsigned long
>
> All the rest seems to either compare it to NULL, or not even do
> anything with the return value.

That is a nice code search interface. I wish I had something like that
for Fedora. I haven't done a "make world" but I also didn't find
anything that relied on these functions returning an unsigned long. At
most code compares the result to 0 or NULL.

I did find the following in dwz. As Kurt did in Debian. But the cast
isn't really to (char *) that is only in comments. What the code really
does is:

/* Some gelf_newehdr implementations don't return the resulting
ElfNN_Ehdr, so we have to do it the hard way instead of:
e_ident = (char *) gelf_newehdr (elf, gelf_getclass (dso->elf)); */
switch (gelf_getclass (dso->elf))
{
case ELFCLASS32:
e_ident = (char *) elf32_newehdr (elf);
break;
case ELFCLASS64:
e_ident = (char *) elf64_newehdr (elf);
break;
default:
e_ident = NULL;
break;
}

Which is slightly horrible. But neither the Solaris documentation, nor
the elfutils libelf documentation seem to actually make any promise
about the return value. It is only implied that, like all gelf
functions, zero will be returned on failure.

I wanted to update the elfutils gelf documentation for these functions
to state that the returned pointer is to the appropriate header for the
ELF class. Which I believe is the only sane thing to do. I don't know of
any libelf implementation where the comment from dwz is true.

Cheers,

Mark

Ali Bahrami

unread,
Dec 6, 2016, 11:45:40 AM12/6/16
to gener...@googlegroups.com, Kurt Roeckx, elfutil...@lists.fedorahosted.org, Rod Evans, Joseph Koshy, mic...@mr511.de
On 12/ 6/16 07:31 AM, Mark Wielaard wrote:
> /* Some gelf_newehdr implementations don't return the resulting
> ElfNN_Ehdr, so we have to do it the hard way instead of:
> e_ident = (char *) gelf_newehdr (elf, gelf_getclass (dso->elf)); */
> switch (gelf_getclass (dso->elf))
> {
> case ELFCLASS32:
> e_ident = (char *) elf32_newehdr (elf);
> break;
> case ELFCLASS64:
> e_ident = (char *) elf64_newehdr (elf);
> break;
> default:
> e_ident = NULL;
> break;
> }
>
> Which is slightly horrible. But neither the Solaris documentation, nor
> the elfutils libelf documentation seem to actually make any promise
> about the return value. It is only implied that, like all gelf
> functions, zero will be returned on failure.
>
> I wanted to update the elfutils gelf documentation for these functions
> to state that the returned pointer is to the appropriate header for the
> ELF class. Which I believe is the only sane thing to do. I don't know of
> any libelf implementation where the comment from dwz is true.


I'm not aware of libelf implementations where the return
value isn't the pointer to the underlying class specific
header, but I suppose that there could be one.

I never thought I'd argue against clarifying documentation,
but I'm not sure that making this any more explicit is really
helping anyone. It really doesn't make sense that a library
that exists to provide class independent access to objects
returns a class dependent pointer, and I think the intent
really is that callers only pay attention to 0 vs non-zero.
Perhaps the documentation should just say that, while the
internals return the pointer as they always have, to maintain
compatibility with the past.

- Ali

Mark Wielaard

unread,
Dec 6, 2016, 6:09:10 PM12/6/16
to Ali Bahrami, gener...@googlegroups.com, Kurt Roeckx, elfutil...@lists.fedorahosted.org, Rod Evans, Joseph Koshy, mic...@mr511.de
On Tue, Dec 06, 2016 at 09:45:31AM -0700, Ali Bahrami wrote:
> > Which is slightly horrible. But neither the Solaris documentation, nor
> > the elfutils libelf documentation seem to actually make any promise
> > about the return value. It is only implied that, like all gelf
> > functions, zero will be returned on failure.
> >
> > I wanted to update the elfutils gelf documentation for these functions
> > to state that the returned pointer is to the appropriate header for the
> > ELF class. Which I believe is the only sane thing to do. I don't know of
> > any libelf implementation where the comment from dwz is true.
>
> I'm not aware of libelf implementations where the return
> value isn't the pointer to the underlying class specific
> header, but I suppose that there could be one.
>
> I never thought I'd argue against clarifying documentation,
> but I'm not sure that making this any more explicit is really
> helping anyone. It really doesn't make sense that a library
> that exists to provide class independent access to objects
> returns a class dependent pointer, and I think the intent
> really is that callers only pay attention to 0 vs non-zero.
> Perhaps the documentation should just say that, while the
> internals return the pointer as they always have, to maintain
> compatibility with the past.

Grin. I am not sure that will deter people from assuming
what it really points to. But OK, I'll only document that
NULL means failure.

Cheers,

Mark

Ali Bahrami

unread,
Dec 6, 2016, 6:41:11 PM12/6/16
to Mark Wielaard, gener...@googlegroups.com, Kurt Roeckx, elfutil...@lists.fedorahosted.org, Rod Evans, Joseph Koshy, mic...@mr511.de
On 12/ 6/16 04:09 PM, Mark Wielaard wrote:
> Grin. I am not sure that will deter people from assuming
> what it really points to. But OK, I'll only document that
> NULL means failure.


We can't stop them, but we don't have to help... :-)

It sounds like we're agreed then, so I'll move ahead
with changing these 2 routines to use 'void *' on
Solaris. Thanks for bring up the issue.

- Ali

Reply all
Reply to author
Forward
0 new messages