What if the result of elf_hash is larger than UINT32_MAX?

412 views
Skip to first unread message

Fangrui Song

unread,
Apr 12, 2023, 1:24:39 AM4/12/23
to Generic System V Application Binary Interface
I never paid close attention to the ELF hashing function but I just noticed a patch https://reviews.llvm.org/D147890 , which mentioned that llvm-project's ELF hashing function is incorrect (char vs unsigned char) and inefficient.

According to https://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash, the following code fragment shows the hashing function which is used by many projects.

```
Figure 5-13: Hashing Function

unsigned long
elf_hash(const unsigned char *name)
{
unsigned long h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}
```

When `long` represents a 64-bit integer, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, larger than UINT32_MAX.
It is unclear whether a value larger than UINT32_MAX is intended.

If not, it may be appropriate to replace this function with Nathan Sidwell's version.

```
unsigned long
elf_hash(const unsigned char *name)
{
unsigned long h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
                h ^= (h >> 24) & 0xf0;
}
return h & 0x0fffffff;
}
```

I then investigated the binutils-gdb implementation and found that in a 2003 commit (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=32dfa85d9015feea4a06d423fe58f6eaf841456e),
Andrew Haley appeared to have noticed the issue and made a change to "Mask lower 32 bits of hash."
This change made me more convinced that the generic ABI code fragment had an oversight.

Fangrui Song

unread,
Apr 12, 2023, 1:49:25 AM4/12/23
to Generic System V Application Binary Interface
On Tuesday, April 11, 2023 at 10:24:39 PM UTC-7 Fangrui Song wrote:
I never paid close attention to the ELF hashing function but I just noticed a patch https://reviews.llvm.org/D147890 , which mentioned that llvm-project's ELF hashing function is incorrect (char vs unsigned char) and inefficient.

According to https://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash, the following code fragment shows the hashing function which is used by many projects.

```
Figure 5-13: Hashing Function

unsigned long
elf_hash(const unsigned char *name)
{
unsigned long h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}
```

When `long` represents a 64-bit integer, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, larger than UINT32_MAX.
It is unclear whether a value larger than UINT32_MAX is intended.


It seems obvious that on 32-bit and 64-bit systems, the function should not give different results. Values larger than UINT32_MAX are unintended.

Fangrui Song

unread,
Apr 12, 2023, 4:00:00 AM4/12/23
to Generic System V Application Binary Interface
OK. It looks like musl has provided this elegant version since June 2011. I've recorded my notes in https://maskray.me/blog/2023-04-12-elf-hash-function  

Michael Matz

unread,
Apr 12, 2023, 10:00:36 AM4/12/23
to 'Fangrui Song' via Generic System V Application Binary Interface
Hello,

> According to
> https://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash, the
> following code fragment shows the hashing function which is used by many
> projects.
>
> ```
> Figure 5-13: Hashing Function
>
> unsigned long
> elf_hash(const unsigned char *name)

This was simply written (and copied around to other documents) in a time
when all the ELF world was 32bit. If nbuckets is a power of two the size
of 'long' also makes no difference (as the hash functions result will
always be only used to index into the buckets[] array). But yes, the
proper range of hash values is [0, UINT32_MAX], and the proper return type
meanwhile is uint32_t.


Ciao,
Michael.

Fangrui Song

unread,
Feb 3, 2026, 1:34:05 AM (14 days ago) Feb 3
to Generic System V Application Binary Interface
Now that https://gabi.xinuos.com/elf/08-dynamic.html is available, the "Listing 8.2 Hashing Function" should  be fixed.
As my https://maskray.me/blog/2023-04-12-elf-hash-function shows, glibc/binutils-gdb/FreeBSD rtld-elf/OpenBSD rtld-elf were all misled by this example and had bugs.

Cary Coutant

unread,
Feb 5, 2026, 6:41:18 PM (11 days ago) Feb 5
to gener...@googlegroups.com
I opened Issue #13 for this:


If there are no objections to Michael's solution of changing 'unsigned long' to 'uint32_t', I'll make that change.

-cary


--
You received this message because you are subscribed to the Google Groups "Generic System V Application Binary Interface" group.
To unsubscribe from this group and stop receiving emails from it, send an email to generic-abi...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/generic-abi/22263276-9848-48f9-9d6f-b1676a37091en%40googlegroups.com.

Suprateeka R Hegde

unread,
Feb 5, 2026, 11:16:41 PM (11 days ago) Feb 5
to gener...@googlegroups.com
IMO, we should change everything to stdint (exact-wdith types), and not
just this. Anyway, most headers from GNU and LLVM define Elf32_* Elf64_*
using definitions from stdint.

---
Supra

On 06-Feb-2026 05:11 am, Cary Coutant wrote:
> I opened Issue #13 for this:
>
> https://github.com/xinuos/gabi/issues/13 <https://github.com/xinuos/
> gabi/issues/13>
>
> If there are no objections to Michael's solution of changing 'unsigned
> long' to 'uint32_t', I'll make that change.
>
> -cary
>
>
> On Mon, Feb 2, 2026 at 10:34 PM Fangrui Song <emac...@gmail.com
> <mailto:emac...@gmail.com>> wrote:
>
> On Wednesday, April 12, 2023 at 7:00:36 AM UTC-7 Michael Matz wrote:
>
> Hello,
>
> > According to
> > https://www.sco.com/developers/gabi/latest/
> ch5.dynamic.html#hash <https://www.sco.com/developers/gabi/
> latest/ch5.dynamic.html#hash>, the
> > following code fragment shows the hashing function which is
> used by many
> > projects.
> >
> > ```
> > Figure 5-13: Hashing Function
> >
> > unsigned long
> > elf_hash(const unsigned char *name)
>
> This was simply written (and copied around to other documents)
> in a time
> when all the ELF world was 32bit. If nbuckets is a power of two
> the size
> of 'long' also makes no difference (as the hash functions result
> will
> always be only used to index into the buckets[] array). But yes,
> the
> proper range of hash values is [0, UINT32_MAX], and the proper
> return type
> meanwhile is uint32_t.
>
>
> Ciao,
> Michael.
>
>
> Now that https://gabi.xinuos.com/elf/08-dynamic.html <https://
> gabi.xinuos.com/elf/08-dynamic.html> is available, the "Listing 8.2
> Hashing Function" should  be fixed.
> As my https://maskray.me/blog/2023-04-12-elf-hash-function <https://
> maskray.me/blog/2023-04-12-elf-hash-function> shows, glibc/binutils-
> gdb/FreeBSD rtld-elf/OpenBSD rtld-elf were all misled by this
> example and had bugs.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Generic System V Application Binary Interface" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to generic-abi...@googlegroups.com
> <mailto:generic-abi...@googlegroups.com>.
> To view this discussion visit https://groups.google.com/d/msgid/
> generic-abi/22263276-9848-48f9-9d6f-b1676a37091en%40googlegroups.com
> <https://groups.google.com/d/msgid/generic-
> abi/22263276-9848-48f9-9d6f-b1676a37091en%40googlegroups.com?
> utm_medium=email&utm_source=footer>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Generic System V Application Binary Interface" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to generic-abi...@googlegroups.com <mailto:generic-
> abi+uns...@googlegroups.com>.
> To view this discussion visit https://groups.google.com/d/msgid/generic-
> abi/CAJimCsGNiRCY8bb4oAC4wQynHtvX_-
> Q%3DwyC5iWe%3DETrfeJbVUw%40mail.gmail.com <https://groups.google.com/d/
> msgid/generic-abi/CAJimCsGNiRCY8bb4oAC4wQynHtvX_-
> Q%3DwyC5iWe%3DETrfeJbVUw%40mail.gmail.com?
> utm_medium=email&utm_source=footer>.

Florian Weimer

unread,
Feb 6, 2026, 4:09:49 AM (10 days ago) Feb 6
to Suprateeka R Hegde, gener...@googlegroups.com
* Suprateeka R. Hegde:

> IMO, we should change everything to stdint (exact-wdith types), and
> not just this. Anyway, most headers from GNU and LLVM define Elf32_*
> Elf64_* using definitions from stdint.

It would be nice to do this. It's annoying to have a type with “64” in
its name and then realize it's actually 16 bits. However, for the
platforms that don't do this, this wouldn't necessarily be a
backwards-compatible change for C++ due to mangling changes.

Thanks,
Florian

Suprateeka R Hegde

unread,
Feb 6, 2026, 9:10:56 AM (10 days ago) Feb 6
to Florian Weimer, gener...@googlegroups.com
That is an interesting and valid interpretation of what I wrote. But I
meant something else :-)

What I meant is to update the Data Representation table at
https://gabi.xinuos.com/elf/01-intro.html#data-representation

My thinking: The gABI spec existed way before C99 too. Several
implementations have used old styled types to match the description
under "Purpose" column. I have seen implementations where there are:

typedef unsigned short Elf64_Half;
etc.

With Standard specifications like C99 (ISO/IEC 9899:1999), we have
better and precise ways to describe the "Purpose" column:

Under the Purpose column, instead of describing "small integer", "medium
integer", etc., I suggest to specify "16-Bit Unsigned Integer
(uint16_t)", "32-Bit Unsigned Integer (uint32_t)", etc.

Although I agree with you on "64" in the name of a "16" bit type, I do
like this "Elf64" or "Elf32" convention. In several old code, or even
some modern code, there are separate functions to handle 64-Bit ELF vs
32-Bit ELF. With the convention, I can easily understand what function I
am reading, based on variable declarations that use these conventions,
and without having scroll up/down for the function names or macro
guards, etc.

--
Supra

Ali Bahrami

unread,
Feb 6, 2026, 1:00:38 PM (10 days ago) Feb 6
to gener...@googlegroups.com
I don't have an issue with using a stdint type for
this particular case, though we use 'long' going all
the way back to New Jersey, and probably won't actually
change our implementation now. However, it seems to
me that using ELF types would be more consistent, and less
confusing overall.

Perhaps the hash function should be written in terms of
the ELF types we've already established, Elf32_Word, and
Elf64_word. That keeps the discussion within the ELF world,
and doesn't require other standards.

Yes, Elf32_Word and Elf64_Word are really both the same thing,
and this does cause some initial confusion, but it's something
that anyone working with ELF needs to already understand. We
could simply document the hash with Elf32_Word, and mention
that this applies equally to ELF64.

Or, we could just leave it as 'long'. I suspect that the messiness
of these alternatives is probably why this wasn't fixed ages ago.
I'm being semi-facetious, but it really doesn't seem worse, and
it's a simpler story.

- Ali
Reply all
Reply to author
Forward
0 new messages