Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Update on the glibc segfault issue on Alpha

2 views
Skip to first unread message

John Paul Adrian Glaubitz

unread,
Jan 2, 2023, 6:50:02 AM1/2/23
to
Hello!

Just a quick update on the glibc issue we're seeing on Alpha!

My previous bisecting result was wrong and the segfault I ran into was the result
of combining ld.so and libc.so files from different builds by not including all
necessary paths in my LD_LIBRARY_PATH. So, my previously reported bug report was
invalid [1].

However, as can be seen from the build logs, there are still clearly issues with
the testsuite [2]. Further analysis showed that the problem are static binaries
which segfault. Anything dynamically linked, works.

This becomes obvious when building the glibc Debian package manually with the testsuite
disabled and then installing these packages inside a test chroot. While the libc-bin
package is being configured, ldconfig is run - a statically linked binary - and crashes
with a segfault.

Adhemerval from glibc upstream is aware of the problem but he has not found yet a solution
to this issue as it needs to be debugged further. I will try to bisect which particular
introduced the regression and file a new upstream bug report.

During our discussion, Adhemerval pointed out that this change [3] might be the culprit
but I have not been able to verify this yet.

Adrian

> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=29899
> [2] https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=alpha&ver=2.36-6&stamp=1669706903&raw=0
> [3] https://sourceware.org/pipermail/glibc-cvs/2020q2/069444.html

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

John Paul Adrian Glaubitz

unread,
Jan 2, 2023, 12:40:03 PM1/2/23
to
Hello!

On 1/2/23 12:44, John Paul Adrian Glaubitz wrote:
> Adhemerval from glibc upstream is aware of the problem but he has not found yet a solution
> to this issue as it needs to be debugged further. I will try to bisect which particular
> introduced the regression and file a new upstream bug report.
>
> During our discussion, Adhemerval pointed out that this change [3] might be the culprit
> but I have not been able to verify this yet.

My latest bisecting has shown this change to be the responsible change, CC'ing the author:

commit 73fc4e28b9464f0e13edc719a5372839970e7ddb (refs/bisect/bad)
Author: Florian Weimer <fwe...@redhat.com>
Date: Mon Feb 28 11:50:41 2022 +0100

Linux: Consolidate auxiliary vector parsing (redo)

And optimize it slightly.

This is commit 8c8510ab2790039e58995ef3a22309582413d3ff revised.

In _dl_aux_init in elf/dl-support.c, use an explicit loop
and -fno-tree-loop-distribute-patterns to avoid memset.

Reviewed-by: Szabolcs Nagy <szabol...@arm.com>

Adrian

Adhemerval Zanella Netto

unread,
Jan 3, 2023, 7:30:03 AM1/3/23
to


On 02/01/23 14:37, John Paul Adrian Glaubitz wrote:
> Hello!
>
> On 1/2/23 12:44, John Paul Adrian Glaubitz wrote:
>> Adhemerval from glibc upstream is aware of the problem but he has not found yet a solution
>> to this issue as it needs to be debugged further. I will try to bisect which particular
>> introduced the regression and file a new upstream bug report.
>>
>> During our discussion, Adhemerval pointed out that this change [3] might be the culprit
>> but I have not been able to verify this yet.
>
> My latest bisecting has shown this change to be the responsible change, CC'ing the author:
>
> commit 73fc4e28b9464f0e13edc719a5372839970e7ddb (refs/bisect/bad)
> Author: Florian Weimer <fwe...@redhat.com>
> Date:   Mon Feb 28 11:50:41 2022 +0100
>
>     Linux: Consolidate auxiliary vector parsing (redo)
>         And optimize it slightly.
>         This is commit 8c8510ab2790039e58995ef3a22309582413d3ff revised.
>         In _dl_aux_init in elf/dl-support.c, use an explicit loop
>     and -fno-tree-loop-distribute-patterns to avoid memset.
>         Reviewed-by: Szabolcs Nagy <szabol...@arm.com>

Thanks, this commits helps narrow down the issue. The 73fc4e28b9464f0e refactor did not
add the GL(dl_phdr) and GL(dl_phnum) for static case, relying on the __ehdr_start symbol
to get the correct values.

The issue is for some archs, alpha for instance, the hidden weak reference is not making
the static linker to define the __ehdr_start address correctly: it is being set to 0 and
thus GL(dl_phdr) and GL(dl_phnum) are set to invalid values.

And I am not sure if the hidden weak __ehdr_start does work on all architectures, so I
think it would be safer to just restore the previous behavior to setup GL(dl_phdr) and
GL(dl_phnum) for static and we can simplify __ehdr_start fallback case to not use
a weak ref (as for PIE). I am checking if the following patch trigger any regression,
at least for alpha it fixes the static failures:

diff --git a/csu/libc-start.c b/csu/libc-start.c
index 543560f36c..63a3eceaea 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -271,18 +271,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
So we can set up _dl_phdr and _dl_phnum even without any
information from auxv. */

- extern const ElfW(Ehdr) __ehdr_start
-# if BUILD_PIE_DEFAULT
- __attribute__ ((visibility ("hidden")));
-# else
- __attribute__ ((weak, visibility ("hidden")));
- if (&__ehdr_start != NULL)
-# endif
- {
- assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
- GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
- GL(dl_phnum) = __ehdr_start.e_phnum;
- }
+ extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
+ assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
+ GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
+ GL(dl_phnum) = __ehdr_start.e_phnum;
}

__tunables_init (__environ);
diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
index bf9374371e..5913c9d6e5 100644
--- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
+++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
@@ -56,6 +56,10 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
if (GLRO(dl_sysinfo_dso) != NULL)
GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
#endif
+#ifndef SHARED
+ GL(dl_phdr) = (void*) auxv_values[AT_PHDR];
+ GL(dl_phnum) = auxv_values[AT_PHENT];
+#endif

DL_PLATFORM_AUXV
}

Adhemerval Zanella Netto

unread,
Jan 3, 2023, 9:10:02 AM1/3/23
to


On 03/01/23 10:29, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
>
>> diff --git a/csu/libc-start.c b/csu/libc-start.c
>> index 543560f36c..63a3eceaea 100644
>> --- a/csu/libc-start.c
>> +++ b/csu/libc-start.c
>> @@ -271,18 +271,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>> So we can set up _dl_phdr and _dl_phnum even without any
>> information from auxv. */
>>
>> - extern const ElfW(Ehdr) __ehdr_start
>> -# if BUILD_PIE_DEFAULT
>> - __attribute__ ((visibility ("hidden")));
>> -# else
>> - __attribute__ ((weak, visibility ("hidden")));
>> - if (&__ehdr_start != NULL)
>> -# endif
>> - {
>> - assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
>> - GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
>> - GL(dl_phnum) = __ehdr_start.e_phnum;
>> - }
>> + extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
>> + assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
>> + GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
>> + GL(dl_phnum) = __ehdr_start.e_phnum;
>
> There's a separate thread that e.ph_off is not actually correct in this
> context because it's a file offset that doesn't necessarily match the
> virtual memory offset:>
> [PATCH 0/1] __libc_start_main() now uses auxv to obtain phdr's address
> [BZ #29864]
> <https://sourceware.org/pipermail/libc-alpha/2022-December/143874.html>


Yes, but this is a fallback case where the kernel does not provide AT_PHDR
and AT_PHNUM. As I added on the thread, I think it is better to use the
kernel provided value since they will always present and it leverages a
kernel fix for a similar issue [1].

>
> I think this needs to be cleaned up so that the static and dynamic cases
> are aligned. That is, we probably want to do the equivalent of
>
> GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
> GL(dl_phnum) = __ehdr_start.e_phnum;
>

I think we should use the __ehdr_start only as fallback.

> in common code. Ideally, we don't use global variables for that because
> in both cases, we only briefly need these variables.

Right, this make sense.

>
> Thanks,
> Florian
>

[1] https://sourceware.org/pipermail/libc-alpha/2022-December/143942.html

Florian Weimer

unread,
Jan 3, 2023, 9:20:03 AM1/3/23
to
* Adhemerval Zanella Netto:

> diff --git a/csu/libc-start.c b/csu/libc-start.c
> index 543560f36c..63a3eceaea 100644
> --- a/csu/libc-start.c
> +++ b/csu/libc-start.c
> @@ -271,18 +271,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> So we can set up _dl_phdr and _dl_phnum even without any
> information from auxv. */
>
> - extern const ElfW(Ehdr) __ehdr_start
> -# if BUILD_PIE_DEFAULT
> - __attribute__ ((visibility ("hidden")));
> -# else
> - __attribute__ ((weak, visibility ("hidden")));
> - if (&__ehdr_start != NULL)
> -# endif
> - {
> - assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
> - GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
> - GL(dl_phnum) = __ehdr_start.e_phnum;
> - }
> + extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
> + assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
> + GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
> + GL(dl_phnum) = __ehdr_start.e_phnum;

There's a separate thread that e.ph_off is not actually correct in this
context because it's a file offset that doesn't necessarily match the
virtual memory offset:

[PATCH 0/1] __libc_start_main() now uses auxv to obtain phdr's address
[BZ #29864]
<https://sourceware.org/pipermail/libc-alpha/2022-December/143874.html>

I think this needs to be cleaned up so that the static and dynamic cases
are aligned. That is, we probably want to do the equivalent of

GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
GL(dl_phnum) = __ehdr_start.e_phnum;

in common code. Ideally, we don't use global variables for that because
in both cases, we only briefly need these variables.

Thanks,
Florian

John Paul Adrian Glaubitz

unread,
Jan 4, 2023, 4:30:02 AM1/4/23
to
Hi Adhemerval!
I can confirm that this patch fixes the problem for me. I have uploaded a manually built glibc
package with the patch applied to Debian »unreleased« so that the buildds can resume building
packages again.

Would it be possible to backport this patch to 2.36 once it has been merged with the current
development tree? The reason is that I don't think that Debian is going to switch to anything
beyond 2.36 anytime soon due to the upcoming feature freeze in January.

Thanks,

Adhemerval Zanella Netto

unread,
Jan 4, 2023, 7:20:02 AM1/4/23
to
I have sent a more detailed patch [1] and I will probably backport to all affected releases
once it is accepted.

John Paul Adrian Glaubitz

unread,
Jan 5, 2023, 6:30:03 AM1/5/23
to
Hi Adhemveral!

On 1/4/23 13:00, Adhemerval Zanella Netto wrote:
>> I can confirm that this patch fixes the problem for me. I have uploaded a manually built glibc
>> package with the patch applied to Debian »unreleased« so that the buildds can resume building
>> packages again.
>>
>> Would it be possible to backport this patch to 2.36 once it has been merged with the current
>> development tree? The reason is that I don't think that Debian is going to switch to anything
>> beyond 2.36 anytime soon due to the upcoming feature freeze in January.
>
> I have sent a more detailed patch [1] and I will probably backport to all affected releases
> once it is accepted.

Great, thanks a lot! This means, the fix will eventually be available in Debian and we won't have
to carry the patch for a long time.
0 new messages