---
osname= linux
osvers= 2.4.21-14.elsmp
arch= i386-linux-thread-multi
cc= gcc 3.2.3 20030502 (Red Hat Linux 3.2.3-37)
---
Flags:
category=core
severity=high
ack=no
---
I have evidence that DOD runs can miss noticing local variable pointers to
live objects on x86 Linux. This is happening while running ponie, but
the problem is during a single call to string_make. The gdb traces are from
a copy of the parrot source code which has had a few debugging printf()s and
static variables added, but the problem shows up with vanilla parrot source
code. Specifically this variable at the top of string_make:
STRING *s = NULL;
(which experimentally is "volatile" in my copy, but the failure is the same
without) gets assigned a header by new_string_header, called at line 789
(or so) of string.c:
(gdb)
new_string_header (interpreter=0x844f388, flags=0) at src/headers.c:355
355 PObj_get_FLAGS(string) |= flags | PObj_is_string_FLAG;
(gdb)
357 return string;
(gdb) p string
$4 = (STRING *) 0x85ba81c
string_make then gets to the call to Parrot_allocate_string which calls
into mem_allocate, which triggers a DOD run:
656 PObj_bufstart(str) = mem_allocate(interpreter, &req_size, pool,
(gdb)
mem_allocate (interpreter=0x844f388, req_size=0xbfff8ca0, pool=0x844fb90,
align_1=15) at src/resources.c:128
128 size_t size = aligned_size(*req_size, align_1);
(gdb)
aligned_size (size=9, align=15) at src/resources.c:111
111 if (size < sizeof(void *))
(gdb)
116 size += sizeof(struct Buffer_Tail);
(gdb)
119 size = (size + align) & ~align;
(gdb)
120 return size;
(gdb)
121 }
(gdb)
mem_allocate (interpreter=0x844f388, req_size=0xbfff8ca0, pool=0x844fb90,
align_1=15) at src/resources.c:131
131 if (pool->top_block == NULL) {
(gdb)
135 if (pool->top_block->free < size) {
(gdb) p pool->top_block
$7 = (struct Memory_Block *) 0xb5483008
(gdb) p *pool->top_block
$8 = {free = 0, size = 655360, prev = 0x0, next = 0x0,
start = 0xb5483020 "(?^\b????\001", top = 0xb5523020 ""}
(gdb) s
142 Parrot_do_dod_run(interpreter, DOD_trace_stack_FLAG);
At which point I let the DOD run go in one. My copy of gc_mc_add_free_object
has some printf()s in it, with a counter hard coded to trigger just here:
static void
gc_ms_add_free_object(Interp *interpreter,
struct Small_Object_Pool *pool, void *to_add)
{
static int counter;
if (pool->object_size==36 && count > 3 && counter < 5) {
printf ("Add '%s' %p %d %p\n", pool->name, to_add,
pool->num_free_objects, pool->free_list);
counter++;
}
*(void **)to_add = pool->free_list;
pool->free_list = to_add;
if (problem_seen && to_add == problem) {
printf ("Adding problem %p to free %d\n", problem, problem_seen);
}
}
(gdb) n
Add 'string_header' 0x85ba81c 2962 0x85ba7f8
Add 'string_header' 0x85ba840 2962 0x85ba81c
Add 'string_header' 0x85ba864 2962 0x85ba840
Add 'string_header' 0x85ba888 2962 0x85ba864
Add 'string_header' 0x85ba8ac 2962 0x85ba888
Note that 0x85ba81c, the first thing to be added, is the string header
I mentioned earlier, pointed to by s 2 frames up.
145 if (pool->compact) {
(gdb) up
#1 0x0821c772 in Parrot_allocate_string (interpreter=0x844f388,
str=0x85ba81c, size=9) at src/resources.c:656
656 PObj_bufstart(str) = mem_allocate(interpreter, &req_size, pool,
(gdb) up
#2 0x08173067 in string_make (interpreter=0x844f388, buffer=0x83a50b0, len=9,
encoding_name=0x83b2038 "iso-8859-1", flags=0) at src/string.c:829
829 Parrot_allocate_string(interpreter, s, len);
(gdb) p s
$9 = (volatile STRING *) 0x85ba81c
(gdb) where
#0 mem_allocate (interpreter=0x844f388, req_size=0xbfff8ca0, pool=0x844fb90,
align_1=15) at src/resources.c:145
#1 0x0821c772 in Parrot_allocate_string (interpreter=0x844f388,
str=0x85ba81c, size=9) at src/resources.c:656
#2 0x08173067 in string_make (interpreter=0x844f388, buffer=0x83a50b0, len=9,
encoding_name=0x83b2038 "iso-8859-1", flags=0) at src/string.c:829
#3 0x08172dfa in string_from_cstring (interpreter=0x844f388,
buffer=0x83a50b0, len=0) at src/string.c:658
#4 0x081b9d97 in Parrot_PMC_typenum (interp=0x844f388,
class=0x83a50b0 "Perl5NULL") at src/extend.c:600
#5 0x080d20dd in S_new_SV () at sv.c:164
So why is the x86 stack walking code missing the pointer to 0x85ba81c
held in local variable s 2 stack frames higher?
It's a bit of a bugger as it causes a SEGV. :-(
Looks like the configuration information needs a bit more:
gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-42)
Linux switch.work.fotango.com 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 i686 i386 GNU/Linux
libc-2.3.2.so
Nicholas Clark
PS It's occurred to me that this constant calling of Parrot_PMC_typenum without
caching (or precomputing) the result is woefully inefficient, but I'd
like this problem resolved before I change anything. And I'm stuck.
---
Summary of my parrot 0.1.1 configuration:
configdate='Tue Oct 19 11:14:39 2004'
Platform:
osname=linux, archname=i686-linux
jitcapable=1, jitarchname=i386-linux,
jitosname=LINUX, jitcpuarch=i386
execcapable=1
perl=/usr/local/perl-5.8.5/bin/perl
Compiler:
cc='cc', ccflags=' -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
Linker and Libraries:
ld='cc', ldflags=' -L/usr/local/lib',
cc_ldflags='',
libs='-lnsl -ldl -lm -lcrypt -lutil -lpthread -lrt'
Dynamic Linking:
share_ext='.so', ld_share_flags='-shared -L/usr/local/lib -fPIC',
load_ext='.so', ld_load_flags='-shared -L/usr/local/lib -fPIC'
Types:
iv=long, intvalsize=4, intsize=4, opcode_t=long, opcode_t_size=4,
ptrsize=4, ptr_alignment=1 byteorder=1234,
nv=double, numvalsize=8, doublesize=8
---
Environment:
HOME LANG LANGUAGE LD_LIBRARY_PATH LOGDIR PATH SHELL
> I have evidence that DOD runs can miss noticing local variable pointers to
> live objects on x86 Linux. This is happening while running ponie, but
> the problem is during a single call to string_make. The gdb traces are from
> a copy of the parrot source code which has had a few debugging printf()s and
> static variables added, but the problem shows up with vanilla parrot source
> code.
Please set a breakpoint at dod.c:trace_mem_block, which should be called
by the stack walking code. The variables lo_var_ptr and hi_var_ptr
should be set and contain the auto variable on stack.
Do you have a test for Parrot showing this behavior?
> PS It's occurred to me that this constant calling of Parrot_PMC_typenum without
> caching (or precomputing) the result is woefully inefficient,
Yep. It's a hash lookup. Just run it during ponie init, remember the
types you are interested in and use Parrot_PMC_new().
leo
#9 0x08173067 in string_make (interpreter=0x844f388, buffer=0x83a50b0, len=9,
encoding_name=0x83b2038 "iso-8859-1", flags=0) at src/string.c:829
829 Parrot_allocate_string(interpreter, s, len);
(gdb) p &s
$22 = (volatile STRING **) 0xbfff7d60
(gdb) down 9
#0 trace_mem_block (interpreter=0x844f388, lo_var_ptr=3221192128,
hi_var_ptr=3221191540) at src/dod.c:960
960 size_t buffer_min = get_min_buffer_address(interpreter);
(gdb) p lo_var_ptr - 0xbfff7d60
$23 = 96
(gdb) p hi_var_ptr - 0xbfff7d60
$24 = 4294966804
(gdb) p 0xbfff7d60 - hi_var_ptr
$25 = 492
They contain it.
The for loop inside trace_mem_block steps right over it. This if fails:
/* Do a quick approximate range check by bit-masking */
if ((ptr & mask) == prefix || !prefix) {
This is understandable, given the value of mask:
(gdb) printf "%x\n", mask
ffffffff
Is that value of mask what you expect from these inputs?
(gdb) p buffer_min
$67 = 138870296
(gdb) p buffer_max
$68 = 140530992
(gdb) p pmc_min
$69 = 138940536
(gdb) p pmc_max
$70 = 3065406416
> Do you have a test for Parrot showing this behavior?
No, I can't even re-created it on a different x86 Linux machine with the same
ponie source.
Nicholas Clark
> The for loop inside trace_mem_block steps right over it. This if fails:
>
> /* Do a quick approximate range check by bit-masking */
> if ((ptr & mask) == prefix || !prefix) {
Argh, yes. I have pointed out quite a time ago that this mask check
isn't ok. Small and big memory chunks are allocated from different
malloc pools so that the check doesn't really work.
I'd drop the mask test - or if possible you can fix it ;)
> Nicholas Clark
Thanks for your thorough analysis,
leo
> The for loop inside trace_mem_block steps right over it. This if fails:
>
> /* Do a quick approximate range check by bit-masking */
> if ((ptr & mask) == prefix || !prefix) {
Argh, yes. I have pointed out quite a time ago that this mask check
It also doesn't help that a << 8 * sizeof(a) is undefined behaviour in C
so gcc is not wrong returning 0xFFFFFFFF for ~(size_t)0 << i; when i=32
[Thanks to Abhijit on IRC for confirming my suspicion on this one]
so I checked in a change to find_common_mask to remove that undefined
behaviour and correctly return 0 for that case. This is enough to fix the
DOD problem. I've not changed any of the code further, but I assume that
the problem with small and big chunks is that their pointers differ so much
that prefix will almost always be zero, and many many random pointers found
during the stack walk will have to be checked to eliminated. Is it worth
having separate entries for large and small buffer min and max in the
interpreter struct?
> Thanks for your thorough analysis,
No problem. Thanks for your support when I got stuck.
Nicholas Clark
static size_t
find_common_mask(size_t val1, size_t val2)
{
size_t mask = ~0;
size_t diff = val1 ^ val2;
while (diff & mask)
mask <<= 1;
return mask;
}
Bit twiddling is such fun. And error prone. So I won't commit this; I'll
just attach the patch.