What if the result of elf_hash is larger than UINT32_MAX?

335 views
Skip to first unread message

Fangrui Song

unread,
Apr 12, 2023, 1:24:39 AM4/12/23
to Generic System V Application Binary Interface
I never paid close attention to the ELF hashing function but I just noticed a patch https://reviews.llvm.org/D147890 , which mentioned that llvm-project's ELF hashing function is incorrect (char vs unsigned char) and inefficient.

According to https://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash, the following code fragment shows the hashing function which is used by many projects.

```
Figure 5-13: Hashing Function

unsigned long
elf_hash(const unsigned char *name)
{
unsigned long h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}
```

When `long` represents a 64-bit integer, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, larger than UINT32_MAX.
It is unclear whether a value larger than UINT32_MAX is intended.

If not, it may be appropriate to replace this function with Nathan Sidwell's version.

```
unsigned long
elf_hash(const unsigned char *name)
{
unsigned long h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
                h ^= (h >> 24) & 0xf0;
}
return h & 0x0fffffff;
}
```

I then investigated the binutils-gdb implementation and found that in a 2003 commit (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=32dfa85d9015feea4a06d423fe58f6eaf841456e),
Andrew Haley appeared to have noticed the issue and made a change to "Mask lower 32 bits of hash."
This change made me more convinced that the generic ABI code fragment had an oversight.

Fangrui Song

unread,
Apr 12, 2023, 1:49:25 AM4/12/23
to Generic System V Application Binary Interface
On Tuesday, April 11, 2023 at 10:24:39 PM UTC-7 Fangrui Song wrote:
I never paid close attention to the ELF hashing function but I just noticed a patch https://reviews.llvm.org/D147890 , which mentioned that llvm-project's ELF hashing function is incorrect (char vs unsigned char) and inefficient.

According to https://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash, the following code fragment shows the hashing function which is used by many projects.

```
Figure 5-13: Hashing Function

unsigned long
elf_hash(const unsigned char *name)
{
unsigned long h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}
```

When `long` represents a 64-bit integer, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, larger than UINT32_MAX.
It is unclear whether a value larger than UINT32_MAX is intended.


It seems obvious that on 32-bit and 64-bit systems, the function should not give different results. Values larger than UINT32_MAX are unintended.

Fangrui Song

unread,
Apr 12, 2023, 4:00:00 AM4/12/23
to Generic System V Application Binary Interface
OK. It looks like musl has provided this elegant version since June 2011. I've recorded my notes in https://maskray.me/blog/2023-04-12-elf-hash-function  

Michael Matz

unread,
Apr 12, 2023, 10:00:36 AM4/12/23
to 'Fangrui Song' via Generic System V Application Binary Interface
Hello,

> According to
> https://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash, the
> following code fragment shows the hashing function which is used by many
> projects.
>
> ```
> Figure 5-13: Hashing Function
>
> unsigned long
> elf_hash(const unsigned char *name)

This was simply written (and copied around to other documents) in a time
when all the ELF world was 32bit. If nbuckets is a power of two the size
of 'long' also makes no difference (as the hash functions result will
always be only used to index into the buckets[] array). But yes, the
proper range of hash values is [0, UINT32_MAX], and the proper return type
meanwhile is uint32_t.


Ciao,
Michael.
Reply all
Reply to author
Forward
0 new messages