--- osname= linux osvers= 2.4.21-14.elsmp arch= i386-linux-thread-multi cc= gcc 3.2.3 20030502 (Red Hat Linux 3.2.3-37) --- Flags: category=core severity=high ack=no --- I have evidence that DOD runs can miss noticing local variable pointers to live objects on x86 Linux. This is happening while running ponie, but the problem is during a single call to string_make. The gdb traces are from a copy of the parrot source code which has had a few debugging printf()s and static variables added, but the problem shows up with vanilla parrot source code. Specifically this variable at the top of string_make:
STRING *s = NULL;
(which experimentally is "volatile" in my copy, but the failure is the same without) gets assigned a header by new_string_header, called at line 789 (or so) of string.c:
Note that 0x85ba81c, the first thing to be added, is the string header I mentioned earlier, pointed to by s 2 frames up.
145 if (pool->compact) { (gdb) up #1 0x0821c772 in Parrot_allocate_string (interpreter=0x844f388, str=0x85ba81c, size=9) at src/resources.c:656 656 PObj_bufstart(str) = mem_allocate(interpreter, &req_size, pool, (gdb) up #2 0x08173067 in string_make (interpreter=0x844f388, buffer=0x83a50b0, len=9, encoding_name=0x83b2038 "iso-8859-1", flags=0) at src/string.c:829 829 Parrot_allocate_string(interpreter, s, len); (gdb) p s $9 = (volatile STRING *) 0x85ba81c (gdb) where #0 mem_allocate (interpreter=0x844f388, req_size=0xbfff8ca0, pool=0x844fb90, align_1=15) at src/resources.c:145 #1 0x0821c772 in Parrot_allocate_string (interpreter=0x844f388, str=0x85ba81c, size=9) at src/resources.c:656 #2 0x08173067 in string_make (interpreter=0x844f388, buffer=0x83a50b0, len=9, encoding_name=0x83b2038 "iso-8859-1", flags=0) at src/string.c:829 #3 0x08172dfa in string_from_cstring (interpreter=0x844f388, buffer=0x83a50b0, len=0) at src/string.c:658 #4 0x081b9d97 in Parrot_PMC_typenum (interp=0x844f388, class=0x83a50b0 "Perl5NULL") at src/extend.c:600 #5 0x080d20dd in S_new_SV () at sv.c:164
So why is the x86 stack walking code missing the pointer to 0x85ba81c held in local variable s 2 stack frames higher?
It's a bit of a bugger as it causes a SEGV. :-(
Looks like the configuration information needs a bit more:
gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-42) Linux switch.work.fotango.com 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 i686 i386 GNU/Linux libc-2.3.2.so
Nicholas Clark
PS It's occurred to me that this constant calling of Parrot_PMC_typenum without caching (or precomputing) the result is woefully inefficient, but I'd like this problem resolved before I change anything. And I'm stuck. --- Summary of my parrot 0.1.1 configuration: configdate='Tue Oct 19 11:14:39 2004' Platform: osname=linux, archname=i686-linux jitcapable=1, jitarchname=i386-linux, jitosname=LINUX, jitcpuarch=i386 execcapable=1 perl=/usr/local/perl-5.8.5/bin/perl Compiler: cc='cc', ccflags=' -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm', Linker and Libraries: ld='cc', ldflags=' -L/usr/local/lib', cc_ldflags='', libs='-lnsl -ldl -lm -lcrypt -lutil -lpthread -lrt' Dynamic Linking: share_ext='.so', ld_share_flags='-shared -L/usr/local/lib -fPIC', load_ext='.so', ld_load_flags='-shared -L/usr/local/lib -fPIC' Types: iv=long, intvalsize=4, intsize=4, opcode_t=long, opcode_t_size=4, ptrsize=4, ptr_alignment=1 byteorder=1234, nv=double, numvalsize=8, doublesize=8
--- Environment: HOME LANG LANGUAGE LD_LIBRARY_PATH LOGDIR PATH SHELL
Nicholas Clark <parrotbug-follo...@parrotcode.org> wrote: > I have evidence that DOD runs can miss noticing local variable pointers to > live objects on x86 Linux. This is happening while running ponie, but > the problem is during a single call to string_make. The gdb traces are from > a copy of the parrot source code which has had a few debugging printf()s and > static variables added, but the problem shows up with vanilla parrot source > code.
Please set a breakpoint at dod.c:trace_mem_block, which should be called by the stack walking code. The variables lo_var_ptr and hi_var_ptr should be set and contain the auto variable on stack.
Do you have a test for Parrot showing this behavior?
> PS It's occurred to me that this constant calling of Parrot_PMC_typenum without > caching (or precomputing) the result is woefully inefficient,
Yep. It's a hash lookup. Just run it during ponie init, remember the types you are interested in and use Parrot_PMC_new().
Nicholas Clark <parrotbug-follo...@parrotcode.org> wrote: > I have evidence that DOD runs can miss noticing local variable pointers to > live objects on x86 Linux. This is happening while running ponie, but > the problem is during a single call to string_make. The gdb traces are from > a copy of the parrot source code which has had a few debugging printf()s and > static variables added, but the problem shows up with vanilla parrot source > code.
Please set a breakpoint at dod.c:trace_mem_block, which should be called by the stack walking code. The variables lo_var_ptr and hi_var_ptr should be set and contain the auto variable on stack.
Do you have a test for Parrot showing this behavior?
> PS It's occurred to me that this constant calling of Parrot_PMC_typenum without > caching (or precomputing) the result is woefully inefficient,
Yep. It's a hash lookup. Just run it during ponie init, remember the types you are interested in and use Parrot_PMC_new().
On Tue, Oct 26, 2004 at 10:44:35AM +0200, Leopold Toetsch wrote: > Nicholas Clark <parrotbug-follo...@parrotcode.org> wrote:
> > I have evidence that DOD runs can miss noticing local variable pointers to > > live objects on x86 Linux. This is happening while running ponie, but > > the problem is during a single call to string_make. The gdb traces are from > > a copy of the parrot source code which has had a few debugging printf()s and > > static variables added, but the problem shows up with vanilla parrot source > > code.
> Please set a breakpoint at dod.c:trace_mem_block, which should be called > by the stack walking code. The variables lo_var_ptr and hi_var_ptr > should be set and contain the auto variable on stack.
#9 0x08173067 in string_make (interpreter=0x844f388, buffer=0x83a50b0, len=9, encoding_name=0x83b2038 "iso-8859-1", flags=0) at src/string.c:829 829 Parrot_allocate_string(interpreter, s, len); (gdb) p &s $22 = (volatile STRING **) 0xbfff7d60 (gdb) down 9 #0 trace_mem_block (interpreter=0x844f388, lo_var_ptr=3221192128, hi_var_ptr=3221191540) at src/dod.c:960 960 size_t buffer_min = get_min_buffer_address(interpreter); (gdb) p lo_var_ptr - 0xbfff7d60 $23 = 96 (gdb) p hi_var_ptr - 0xbfff7d60 $24 = 4294966804 (gdb) p 0xbfff7d60 - hi_var_ptr $25 = 492
They contain it.
The for loop inside trace_mem_block steps right over it. This if fails:
/* Do a quick approximate range check by bit-masking */ if ((ptr & mask) == prefix || !prefix) {
This is understandable, given the value of mask:
(gdb) printf "%x\n", mask ffffffff
Is that value of mask what you expect from these inputs?
(gdb) p buffer_min $67 = 138870296 (gdb) p buffer_max $68 = 140530992 (gdb) p pmc_min $69 = 138940536 (gdb) p pmc_max $70 = 3065406416
> Do you have a test for Parrot showing this behavior?
No, I can't even re-created it on a different x86 Linux machine with the same ponie source.
On Tue, Oct 26, 2004 at 10:44:35AM +0200, Leopold Toetsch wrote: > Nicholas Clark <parrotbug-follo...@parrotcode.org> wrote:
> > I have evidence that DOD runs can miss noticing local variable pointers to > > live objects on x86 Linux. This is happening while running ponie, but > > the problem is during a single call to string_make. The gdb traces are from > > a copy of the parrot source code which has had a few debugging printf()s and > > static variables added, but the problem shows up with vanilla parrot source > > code.
> Please set a breakpoint at dod.c:trace_mem_block, which should be called > by the stack walking code. The variables lo_var_ptr and hi_var_ptr > should be set and contain the auto variable on stack.
#9 0x08173067 in string_make (interpreter=0x844f388, buffer=0x83a50b0, len=9, encoding_name=0x83b2038 "iso-8859-1", flags=0) at src/string.c:829 829 Parrot_allocate_string(interpreter, s, len); (gdb) p &s $22 = (volatile STRING **) 0xbfff7d60 (gdb) down 9 #0 trace_mem_block (interpreter=0x844f388, lo_var_ptr=3221192128, hi_var_ptr=3221191540) at src/dod.c:960 960 size_t buffer_min = get_min_buffer_address(interpreter); (gdb) p lo_var_ptr - 0xbfff7d60 $23 = 96 (gdb) p hi_var_ptr - 0xbfff7d60 $24 = 4294966804 (gdb) p 0xbfff7d60 - hi_var_ptr $25 = 492
They contain it.
The for loop inside trace_mem_block steps right over it. This if fails:
/* Do a quick approximate range check by bit-masking */ if ((ptr & mask) == prefix || !prefix) {
This is understandable, given the value of mask:
(gdb) printf "%x\n", mask ffffffff
Is that value of mask what you expect from these inputs?
(gdb) p buffer_min $67 = 138870296 (gdb) p buffer_max $68 = 140530992 (gdb) p pmc_min $69 = 138940536 (gdb) p pmc_max $70 = 3065406416
> Do you have a test for Parrot showing this behavior?
No, I can't even re-created it on a different x86 Linux machine with the same ponie source.
Nicholas Clark wrote: > The for loop inside trace_mem_block steps right over it. This if fails:
> /* Do a quick approximate range check by bit-masking */ > if ((ptr & mask) == prefix || !prefix) {
Argh, yes. I have pointed out quite a time ago that this mask check isn't ok. Small and big memory chunks are allocated from different malloc pools so that the check doesn't really work.
I'd drop the mask test - or if possible you can fix it ;)
Nicholas Clark wrote: > The for loop inside trace_mem_block steps right over it. This if fails:
> /* Do a quick approximate range check by bit-masking */ > if ((ptr & mask) == prefix || !prefix) {
Argh, yes. I have pointed out quite a time ago that this mask check isn't ok. Small and big memory chunks are allocated from different malloc pools so that the check doesn't really work.
I'd drop the mask test - or if possible you can fix it ;)
On Tue, Oct 26, 2004 at 03:21:18PM +0200, Leopold Toetsch wrote: > Nicholas Clark wrote:
> >The for loop inside trace_mem_block steps right over it. This if fails:
> > /* Do a quick approximate range check by bit-masking */ > > if ((ptr & mask) == prefix || !prefix) {
> Argh, yes. I have pointed out quite a time ago that this mask check > isn't ok. Small and big memory chunks are allocated from different > malloc pools so that the check doesn't really work.
> I'd drop the mask test - or if possible you can fix it ;)
It also doesn't help that a << 8 * sizeof(a) is undefined behaviour in C so gcc is not wrong returning 0xFFFFFFFF for ~(size_t)0 << i; when i=32 [Thanks to Abhijit on IRC for confirming my suspicion on this one]
so I checked in a change to find_common_mask to remove that undefined behaviour and correctly return 0 for that case. This is enough to fix the DOD problem. I've not changed any of the code further, but I assume that the problem with small and big chunks is that their pointers differ so much that prefix will almost always be zero, and many many random pointers found during the stack walk will have to be checked to eliminated. Is it worth having separate entries for large and small buffer min and max in the interpreter struct?
> Thanks for your thorough analysis,
No problem. Thanks for your support when I got stuck.
On Tue, Oct 26, 2004 at 03:21:18PM +0200, Leopold Toetsch wrote: > Nicholas Clark wrote:
> >The for loop inside trace_mem_block steps right over it. This if fails:
> > /* Do a quick approximate range check by bit-masking */ > > if ((ptr & mask) == prefix || !prefix) {
> Argh, yes. I have pointed out quite a time ago that this mask check > isn't ok. Small and big memory chunks are allocated from different > malloc pools so that the check doesn't really work.
> I'd drop the mask test - or if possible you can fix it ;)
It also doesn't help that a << 8 * sizeof(a) is undefined behaviour in C so gcc is not wrong returning 0xFFFFFFFF for ~(size_t)0 << i; when i=32 [Thanks to Abhijit on IRC for confirming my suspicion on this one]
so I checked in a change to find_common_mask to remove that undefined behaviour and correctly return 0 for that case. This is enough to fix the DOD problem. I've not changed any of the code further, but I assume that the problem with small and big chunks is that their pointers differ so much that prefix will almost always be zero, and many many random pointers found during the stack walk will have to be checked to eliminated. Is it worth having separate entries for large and small buffer min and max in the interpreter struct?
> Thanks for your thorough analysis,
No problem. Thanks for your support when I got stuck.
- /* Shifting a value by its size (in bits) or larger is undefined behaviour. - so need an explict check to return 0 if there is no prefix, rather than - attempting to rely on (say) 0xFFFFFFFF << 32 being 0 */ - for (i = 0; i < bound; i++) { - if (val1 == val2) { - return ~(size_t)0 << i; - } - val1 >>= 1; - val2 >>= 1; - } - if (val1 == val2) { - assert(i == bound); - return 0; - } + while (diff & mask) + mask <<= 1;