Hello,
Recent versions of libc6 seem to include a libintl that regularly
crashes when gettext is invoked from different threads simultaneously.
This renders gettext mostly unusable in multi-threaded software.
I had been suspecting a bug in VLC and banging my head against it, but it
appears that this can be reproduced with code as simple as the piece
below. It triggers a segmentation fault on a very timing-dependent basis.
It seems a lot easier to reproduce under valgrind, though I also get
segfaults when running without a debugger:
#include <stdio.h>
#include <pthread.h>
#include <locale.h>
#include <libintl.h>

static void *run (void *dummy)
{
    (void)dummy;
    for (;;)
        printf ("Translation code: %s\n", dgettext("vlc", "C"));
}

int main (void)
{
    unsigned i;

    setlocale (LC_ALL, "");
    bindtextdomain ("vlc", "/usr/share/locale");

    pthread_t threads[300];
    for (i = 0; i < sizeof (threads) / sizeof (threads[0]); i++)
        pthread_create (threads + i, NULL, run, NULL);
    run (NULL);
    return 0;
}
When the problem occurs under valgrind, it complains:
==3535== Thread 3:
==3535== Invalid read of size 4
==3535== at 0x4063F0B: _nl_find_msg (dcigettext.c:862)
==3535== by 0x4064A41: __dcigettext (dcigettext.c:639)
==3535== by 0x4063972: dcgettext (dcgettext.c:53)
==3535== by 0x406399F: dgettext (dgettext.c:54)
==3535== by 0x80484DD: run (in /home/remi/a.out)
==3535== by 0x402D2D2: start_thread (pthread_create.c:296)
==3535== by 0x41124ED: clone (in /usr/lib/debug/libc-2.6.1.so)
==3535== Address 0x418C91C is 0 bytes after a block of size 12 alloc'd
==3535== at 0x4024862: realloc (vg_replace_malloc.c:306)
==3535== by 0x4063FF1: _nl_find_msg (dcigettext.c:876)
==3535== by 0x4064A41: __dcigettext (dcigettext.c:639)
==3535== by 0x4063972: dcgettext (dcgettext.c:53)
==3535== by 0x406399F: dgettext (dgettext.c:54)
==3535== by 0x80484DD: run (in /home/remi/a.out)
==3535== by 0x402D2D2: start_thread (pthread_create.c:296)
==3535== by 0x41124ED: clone (in /usr/lib/debug/libc-2.6.1.so)
There appears to be a similar issue with strerror_r() also.
Regards,
-- System Information:
Debian Release: lenny/sid
APT prefers unstable
APT policy: (100, 'unstable'), (100, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.20.15 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages libc6 depends on:
ii libgcc1 1:4.2.1-5 GCC support library
libc6 recommends no packages.
-- debconf information:
glibc/restart-failed:
glibc/restart-services:
> ==3535== Address 0x418C91C is 0 bytes after a block of size 12 alloc'd
> ==3535== at 0x4024862: realloc (vg_replace_malloc.c:306)
> ==3535== by 0x4063FF1: _nl_find_msg (dcigettext.c:876)
> ==3535== by 0x4064A41: __dcigettext (dcigettext.c:639)
> ==3535== by 0x4063972: dcgettext (dcgettext.c:53)
> ==3535== by 0x406399F: dgettext (dgettext.c:54)
> ==3535== by 0x80484DD: run (in /home/remi/a.out)
> ==3535== by 0x402D2D2: start_thread (pthread_create.c:296)
> ==3535== by 0x41124ED: clone (in /usr/lib/debug/libc-2.6.1.so)
This one though looks fishy.
> There appears to be a similar issue with strerror_r() also.
Have you looked at the code? As far as I can tell it is thread-safe, except
that it calls _("Unknown error") at some point, which would make it the
same bug as yours; the rest is definitely thread-safe.
--
·O· Pierre Habouzit
··O madc...@debian.org
OOO http://www.madism.org
It's the same error! It only means that _nl_find_msg from dcigettext.c:862
tries to read at an address right after the end of a realloc() done at line
876 in the same file, as far as I understand.
After the above errors, I usually get this:
==29015== Invalid read of size 1
==29015== at 0x40255DE: strcmp (mc_replace_strmem.c:341)
==29015== by 0x4063F18: _nl_find_msg (dcigettext.c:862)
==29015== by 0x4064A41: __dcigettext (dcigettext.c:639)
==29015== by 0x4063972: dcgettext (dcgettext.c:53)
==29015== by 0x406399F: dgettext (dgettext.c:54)
==29015== by 0x80484DD: run (in /home/remi/a.out)
==29015== by 0x402D2D2: start_thread (pthread_create.c:296)
==29015== by 0x41124ED: clone (in /usr/lib/debug/libc-2.6.1.so)
==29015== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==29015==
==29015== Process terminating with default action of signal 11 (SIGSEGV)
==29015== Access not within mapped region at address 0x0
==29015== at 0x40255DE: strcmp (mc_replace_strmem.c:341)
==29015== by 0x4063F18: _nl_find_msg (dcigettext.c:862)
==29015== by 0x4064A41: __dcigettext (dcigettext.c:639)
==29015== by 0x4063972: dcgettext (dcgettext.c:53)
==29015== by 0x406399F: dgettext (dgettext.c:54)
==29015== by 0x80484DD: run (in /home/remi/a.out)
==29015== by 0x402D2D2: start_thread (pthread_create.c:296)
==29015== by 0x41124ED: clone (in /usr/lib/debug/libc-2.6.1.so)
Looks like strcmp tries to compare with NULL.
--
Rémi Denis-Courmont
http://www.remlab.net/
Yes, that read is speculative. Since libc knows that memory can be read
past the end of an allocated block as long as you don't go past the end of
a page, it sometimes reads outside the buffer to compute string lengths or
string comparisons, so I wouldn't worry much about it. The
second block _is_ probably the issue, because the realloc could be
performed in many threads at once, hence corrupting the struct. I already
reported the bug upstream; we'll see what Uli says.
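
To illustrate the kind of race suspected above: a minimal sketch, assuming a shared table that grows with realloc() while other threads look entries up. This is NOT the code from dcigettext.c; the names `struct cache`, `demo_cache`, `cache_find_locked` and `cache_find_or_add` are hypothetical. The point is only that the lookup and the realloc() have to happen under the same lock, otherwise a reader can walk past the old block or see a slot that has not been filled yet.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared table; NOT the actual structure in dcigettext.c. */
struct cache {
    pthread_mutex_t lock;
    const char **entries;
    size_t count;
};

static struct cache demo_cache = { PTHREAD_MUTEX_INITIALIZER, NULL, 0 };

/* Linear lookup. The caller must already hold c->lock.
   Returns 1 if key is present, 0 otherwise. */
static int cache_find_locked(struct cache *c, const char *key)
{
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->entries[i], key) == 0)
            return 1;
    return 0;
}

/* Find key, or append it if missing. The key pointer itself is stored,
   so it must outlive the table. Returns 0 on success, -1 if realloc fails.
   Lookup and growth share one critical section, so no thread can observe
   a half-updated table. */
static int cache_find_or_add(struct cache *c, const char *key)
{
    int rc = 0;

    pthread_mutex_lock(&c->lock);
    if (!cache_find_locked(c, key)) {
        const char **grown = realloc(c->entries,
                                     (c->count + 1) * sizeof *grown);
        if (grown == NULL) {
            rc = -1;                 /* old block is still valid */
        } else {
            grown[c->count] = key;   /* fill the slot before counting it */
            c->entries = grown;
            c->count++;
        }
    }
    pthread_mutex_unlock(&c->lock);
    return rc;
}
```

Without the mutex, two threads could both call realloc() on the same pointer, or one thread could bump the count before the new slot is written, which would match both the read just past the 12-byte block and the later strcmp() on NULL.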
I still have the exact same pseudo-reproducible crashes inside strerror_r with
glibc 2.7-1 in unstable.
Looks like the VLC Linux port is not going to have localization anymore :(
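
Short of dropping localization, one possible stopgap on the application side would be to serialize every gettext call behind a single lock. This is only a sketch under that assumption: `gettext_lock` and `locked_dgettext` are hypothetical names, not part of glibc or libintl, and it has not been checked against the crash reported here. It also cannot cover calls that libc makes internally, such as the _("Unknown error") lookup inside strerror_r().

```c
#include <libintl.h>
#include <pthread.h>

/* Hypothetical application-level lock; not provided by glibc. */
static pthread_mutex_t gettext_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: only one thread at a time enters dgettext().
   The returned string is owned by libintl and must not be modified. */
static const char *locked_dgettext(const char *domain, const char *msgid)
{
    const char *msg;

    pthread_mutex_lock(&gettext_lock);
    msg = dgettext(domain, msgid);
    pthread_mutex_unlock(&gettext_lock);
    return msg;
}
```

In the test program above, run() would then call locked_dgettext("vlc", "C") instead of dgettext("vlc", "C"). The cost is that all translation lookups are serialized across threads.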