Bug#815974: Segmentation fault in libresolv triggered by php5-fpm

Fabian Niepelt

Feb 26, 2016, 4:10:04 AM

Package: libc6
Version: 2.13-38+deb7u10

Dear maintainer,

since the latest update for glibc we keep observing occasional
segmentation faults in libresolv [1]. They are triggered (for us) by
php5-fpm, which runs an Owncloud instance, when logging in. After the
segfault happens, I can log in again successfully for about 20 minutes, at
which point the segfault happens again. Restarting php5-fpm or
rebooting does not affect its occurrence either.

We were using the 5.5 packages from the dotdeb repository, but the
segfaults persist in the 5.6 packages and the official wheezy 5.4
packages.

Attaching to the php5-fpm worker process with GDB yields [2] at
segfault time. (For debugging purposes I set the number of pool workers
to 1 so I would not attach to the wrong process.)

Ubuntu seems to have a similar problem since the update:
https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/1546459

I'll be gladly providing additional info if you require it.

Thank you for your time.

Greetings

[1]
[57348.111866] php5-fpm[20421]: segfault at 200000001 ip 00007fd339eb74fa sp 00007fff9f055700 error 4 in libresolv-2.13.so[7fd339eaf000+13000]
[62889.617877] php5-fpm[20420]: segfault at 270752f65 ip 00007fd339eb74fa sp 00007fff9f055700 error 4 in libresolv-2.13.so[7fd339eaf000+13000]
[64717.111099] php5-fpm[20753]: segfault at 270752f65 ip 00007ff6819ef4fa sp 00007fff0d576a90 error 4 in libresolv-2.13.so[7ff6819e7000+13000]
[66684.547776] php5-fpm[21385]: segfault at 270752f65 ip 00007fd55be4f4fa sp 00007fffe6a3dcd0 error 4 in libresolv-2.13.so[7fd55be47000+13000]

[2]
[many symbols being loaded messages]
82      ../sysdeps/unix/syscall-template.S: No such file or directory.
Traceback (most recent call last):
  File "/usr/lib/debug/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17-gdb.py", line 62, in <module>
    from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named libstdcxx.v6.printers
(gdb) continue
Continuing.
warning: Could not load shared library symbols for /lib/libnss_dns.so.2.
Do you need "set solib-search-path" or "set sysroot"?

Program received signal SIGSEGV, Segmentation fault.
0x00007f146545e4fa in *__GI___libc_res_nsearch (statp=0x7f14659f7300,
    name=<optimized out>, class=<optimized out>, type=<optimized out>,
    answer=0x7fff6d6c0df0 "2", anslen=<optimized out>, answerp=0x7fff6d6c1660,
    answerp2=0x7fff6d6c1658, nanswerp2=0x7fff6d6c167c, resplen2=0x7fff6d6c1678,
    answerp2_malloced=0x200000032) at res_query.c:393
393     res_query.c: No such file or directory.

Carlos O'Donell

Feb 26, 2016, 5:10:03 AM

On Fri, Feb 26, 2016 at 3:57 AM, Fabian Niepelt <F.Ni...@mittwald.de> wrote:
> I'll be gladly providing additional info if you require it.

> Program received signal SIGSEGV, Segmentation fault.
> 0x00007f146545e4fa in *__GI___libc_res_nsearch (statp=0x7f14659f7300,
> name=<optimized out>, class=<optimized out>, type=<optimized out>,
> answer=0x7fff6d6c0df0 "2", anslen=<optimized out>,
> answerp=0x7fff6d6c1660,
> answerp2=0x7fff6d6c1658, nanswerp2=0x7fff6d6c167c,
> resplen2=0x7fff6d6c1678, answerp2_malloced=0x200000032) at
> res_query.c:393
> 393 res_query.c: No such file or directory.

1) Download the tarball from the official CVE-2015-7547 tests here:
https://sourceware.org/ml/libc-alpha/2016-02/msg00418.html

2) Comment out BUILDDIR (to build against your system libraries)

3) Run 'make' to build the tests, then run them all one by one.

Do any of them fail on your system?

Cheers,
Carlos.

Fabian Niepelt

Feb 26, 2016, 6:00:03 AM

Am Freitag, den 26.02.2016, 05:01 -0500 schrieb Carlos O'Donell:
> On Fri, Feb 26, 2016 at 3:57 AM, Fabian Niepelt <F.Niepelt@mittwald.d

Hello,

indeed most of them fail. I attached a text file with the output of the
failed tests. If a test is not included in the text file it is because
it didn't throw any errors.

I tried them on the system in question, on another updated system and,
out of curiosity, on two openSUSE systems; the results were the same. Is
there another prerequisite to running these tests?

Greetings

glibc_tests_failedtests.log

Fabian Niepelt

Feb 26, 2016, 8:00:04 AM

This is the correct output, the older one contains a test I thought was
in an endless loop but succeeded after a few minutes.

Greetings

glibc_tests_failedtests.log

Aurelien Jarno

Feb 26, 2016, 10:40:04 AM

On 2016-02-26 08:57, Fabian Niepelt wrote:
> Package: libc6
> Version: 2.13-38+deb7u10
>
> Dear maintainer,
>
> since the latest update for glibc we keep observing occasional
> segmentation faults in libresolv [1]. They are triggered (for us) by
> php5-fpm, which runs an Owncloud instance, when logging in. After the
> segfault happens, I can log in again successfully for about 20 minutes, at
> which point the segfault happens again. Restarting php5-fpm or
> rebooting does not affect its occurrence either.
>
> We were using the 5.5 packages from the dotdeb repository, but the
> segfaults persist in the 5.6 packages and the official wheezy 5.4
> packages.
>
> Attaching to the php5-fpm worker process with GDB yields [2] at
> segfault time. (For debugging purposes I set the number of pool workers
> to 1 so I would not attach to the wrong process.)

Would it be possible to get a full backtrace, to get an idea of where
__libc_res_nsearch is called from? You can get it by running the command
"bt full" in GDB.

> Ubuntu seems to have a similar problem since the update:
> https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/1546459

I am not fully sure it's the same bug; it looks more like a mismatch
between the nss libraries and the libc, at least for the
ubuntu-installer issue.

> I'll be gladly providing additional info if you require it.

When you do such a test, do you restart all the processes after upgrading
the libc? I wonder if it could be that the process is started with the
old libc and is later dlopening the new nss libraries.

The GDB output clearly shows that the crash is due to answerp2_malloced
pointing at a random location in the following code:

if (answerp2 && *answerp2_malloced)

Well, not so random if you look at the kernel logs and the GDB entry. We
have 0x200000001, 0x200000032 and, three times, 0x270752f65.

Aurelien

--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aure...@aurel32.net http://www.aurel32.net


Carlos O'Donell

Feb 26, 2016, 1:40:03 PM

On Fri, Feb 26, 2016 at 7:46 AM, Fabian Niepelt <F.Ni...@mittwald.de> wrote:
> This is the correct output, the older one contains a test I thought was
> in an endless loop but succeeded after a few minutes.

The glibc maintainers for debian need to review those failures. They
indicate serious deviation from expected behaviour. At the very least
the bug 18665* tests should not fail. However, the tests are sensitive
to response order.

-address: STREAM/TCP 10.0.3.6 80
-address: STREAM/TCP 2001:db8::4:6 80
+error: Name or service not known

This is a weird failure.

Cheers,
Carlos.

Aurelien Jarno

Feb 26, 2016, 3:50:03 PM

The tests in this testsuite fail because of the patch we carry that
dynamically reloads /etc/resolv.conf when it changes. Just after the
fake servers have been initialized, our libc reloads the configuration
from /etc/resolv.conf, and thus the tests fail. Once the corresponding
patch is removed, the tests pass, at least on my system.

Anyway I don't think it's related to the problem reported here. The
problem lies in the backport of the following patch, which is a
prerequisite for fixing CVE-2015-7547.

commit ab09bf616ad527b249aca5f2a4956fd526f0712f
Author: Andreas Schwab <sch...@suse.de>
Date:   Tue Feb 18 10:57:25 2014 +0100

    Properly fix memory leak in _nss_dns_gethostbyname4_r with big DNS answer

    Instead of trying to guess whether the second buffer needs to be freed
    set a flag at the place it is allocated

This patch changes the ABI of the __libc_res_nsearch function, adding
the answerp2_malloced argument. When this function is called by
_nss_dns_gethostbyname4_r from a libc without the patch (i.e. the one
installed before applying the security fix), the caller does not pass
that argument, so it contains random values, leading to a segfault.

IMHO making sure that programs are restarted after applying the security
update should be enough, but I am not fully sure about my analysis, so a
confirmation would be nice to have.

Fabian Niepelt

Feb 26, 2016, 5:10:03 PM

> IMHO making sure that programs are restarted after applying the security
> update should be enough, but I am not fully sure about my analysis, so a
> confirmation would be nice to have.

The machines in question have been rebooted a few times after upgrading.
I will try to get a full backtrace next week. Sadly, I won't have access to the systems over the weekend.

> I wonder if it could be that the process is started with the
> old libc and is later dlopening the new nss libraries.

I am going to investigate on Monday whether there are old libs lying around somewhere on the system.

Greetings

Aurelien Jarno

Feb 27, 2016, 6:10:03 PM

On 2016-02-26 22:03, Fabian Niepelt wrote:
> > IMHO making sure that programs are restarted after applying the security
> > update should be enough, but I am not fully sure about my analysis, so a
> > confirmation would be nice to have.
>
> The machines in question have been rebooted a few times after upgrading.

Ok then my scenario might be wrong.

> I will try to get a full backtrace next week. Sadly, I won't have access to the systems over the weekend.

Ok, no problem.

> > I wonder if it could be that the process is started with the
> > old libc and is later dlopening the new nss libraries.
>
> I am going to investigate on Monday whether there are old libs lying around somewhere on the system.

I am able to trigger a similar (but slightly different) segmentation fault
by doing name resolution with the new libc (i.e. 2.13-38+deb7u10) but with
the old /lib/x86_64-linux-gnu/libnss_dns.so.2 (i.e. from 2.13-38+deb7u9).
Do you have any nss modules installed which do not come from the libc6
package (either from another package or manually installed)?

Thanks for your help in debugging.

Fabian Niepelt

Feb 29, 2016, 3:10:03 AM

Yep, this was it. Searching for the lib yielded an old version of it
that is not managed by package management...
Thank you for giving me the hint.

> Thanks for your help in debugging.

Thank you all for your time and sorry for the noise!

Greetings

Florian Weimer

Mar 1, 2016, 7:40:03 AM

* Aurelien Jarno:

> On 2016-02-26 13:31, Carlos O'Donell wrote:
>> On Fri, Feb 26, 2016 at 7:46 AM, Fabian Niepelt <F.Ni...@mittwald.de> wrote:
>> > This is the correct output, the older one contains a test I thought was
>> > in an endless loop but succeeded after a few minutes.
>>
>> The glibc maintainers for debian need to review those failures. They
>> indicate serious deviation from expected behaviour. At the very least
>> the bug 18665* tests should not fail. However, the tests are sensitive
>> to response order.
>>
>> -address: STREAM/TCP 10.0.3.6 80
>> -address: STREAM/TCP 2001:db8::4:6 80
>> +error: Name or service not known
>>
>> This is a weird failure.
>
> The tests in this testsuite fail because of the patch we carry that
> dynamically reloads /etc/resolv.conf when it changes. Just after the
> fake servers have been initialized, our libc reloads the configuration
> from /etc/resolv.conf, and thus the tests fail. Once the corresponding
> patch is removed, the tests pass, at least on my system.

Correct, the version Carlos posted does not have the compensation I
added for that, sorry. I added this after the call to res_init in
resolv_redirect:

/* Debian's local-dynamic-resolvconf.diff breaks name server
   overrides by application code.  The following triggers lazy
   initialization of the /etc/resolv.conf mtime value because
   res_mkquery calls __res_maybe_init internally.  Subsequent calls
   to this function will not try to reload /etc/resolv.conf as a
   result.  */
{
  unsigned char query[512];
  if (res_mkquery (QUERY, "query.example", C_IN, T_A,
                   NULL, 0, NULL,
                   query, sizeof (query)) < 0)
    {
      printf ("error: res_mkquery: %m\n");
      abort ();
    }
}

> IMHO making sure that programs are restarted after applying the security
> update should be enough, but I am not fully sure about my analysis, so a
> confirmation would be nice to have.

This report <https://bugzilla.redhat.com/show_bug.cgi?id=1309665>
is about an incomplete chroot update. See comment 4 in particular
(although I wrote it without access to the actual installation).