[perl #37308] Parrot gobbles up all the memory

Andy Dougherty

Sep 29, 2005, 1:28:36 PM
to bugs-bi...@rt.perl.org
# New Ticket Created by Andy Dougherty
# Please include the string: [perl #37308]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=37308 >


With a fresh checkout (r9274) I get a number of errors where parrot eventually
gobbles up all the memory on the system. Here's the first such one:

t/op/gc........................
# Failed test (t/op/gc.t at line 279)
# got: 'Null PMC access in clone()
# current instr.: '(null)' pc 199 (/home/doughera/src/parrot/parrot-andy/t/op/gc_13.pir:123)
# called from Sub '(null)' pc 199 (/home/doughera/src/parrot/parrot-andy/t/op/gc_13.pir:123)
# Parrot VM: PANIC: Out of mem!
# C file src/memory.c, line 92
# Parrot file (not available), line (not available)
#
# We highly suggest you notify the Parrot team if you have not been working on
# Parrot. Use parrotbug (located in parrot's root directory) or send an
# e-mail to perl6-i...@perl.org.
# Include the entire text of this error message and the text of the script that
# generated the error. If you've made any modifications to Parrot, please
# describe them as well.
#
# Version : 0.2.3-devel
# Configured : Thu Sep 29 09:59:37 2005
# Architecture: sun4-solaris
# JIT Capable : Yes
# Interp Flags: (no interpreter)
# Exceptions : (missing from core)
#
# Dumping Core...
# Quit - core dumped
# '
# expected: '3 * 5 == 15!
# '
# './parrot --gc-debug "/home/doughera/src/parrot/parrot-andy/t/op/gc_13.pir"' failed with exit code 131
# Looks like you failed 1 test of 22.

--
Andy Dougherty doug...@lafayette.edu

Jerry Gay

Sep 29, 2005, 4:57:42 PM
to perl6-i...@perl.org
On 9/29/05, via RT Andy Dougherty <parrotbug...@parrotcode.org> wrote:
> # '
> # expected: '3 * 5 == 15!
> # '
> # './parrot --gc-debug "/home/doughera/src/parrot/parrot-andy/t/op/gc_13.pir"' failed with exit code 131
> # Looks like you failed 1 test of 22.

this same test fails on win32; however, the error message is less enlightening:

> perl t/harness t/op/gc.t
t/op/gc....ok 12/22
t/op/gc....NOK 13# Failed test (t/op/gc.t at line 279)
# got: '3 * 5 == 15!
# Can't spawn ".\parrot.exe "D:\usr\local\parrot-bug\trunk\t\op\gc_13.pir"": Bad file descriptor at lib/Parrot/Test.pm line 238.

# '
# expected: '3 * 5 == 15!
# '

# '.\parrot.exe "D:\usr\local\parrot-bug\trunk\t\op\gc_13.pir"' failed with exit code 255
t/op/gc....ok 22/22# Looks like you failed 1 test of 22.
t/op/gc....dubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 13
Failed 1/22 tests, 95.45% okay
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/op/gc.t      1   256    22    1   4.55%  13
Failed 1/1 test scripts, 0.00% okay. 1/22 subtests failed, 95.45% okay.


perhaps a fix will affect both platforms, and the method used to fix
this problem may be applied to the few other windows bugs with the
same cryptic message.

~jerry

Leopold Toetsch

Sep 30, 2005, 5:56:17 AM
to perl6-i...@perl.org, bugs-bi...@netlabs.develooper.com
Andy Dougherty (via RT) wrote:

> With a fresh checkout (r9274) I get a number of errors where parrot eventually
> gobbles up all the memory on the system. Here's the first such one:
>
> t/op/gc........................
> # Failed test (t/op/gc.t at line 279)

> # './parrot --gc-debug "/home/doughera/src/parrot/parrot-andy/t/op/gc_13.pir"' failed with exit code 131


> # Looks like you failed 1 test of 22.

Strange. The test succeeds on linux/x86 and OS X 10.3/darwin. Running it
through valgrind on the linux box doesn't show any indication of an error.

t/op/gc_13 uses continuations for backtracking and a few closures.
Maybe you can compare the features used by the other failing tests, so that
the cause of the error can be narrowed down a bit.
A debug session could also reveal the cause.

leo

Andy Dougherty

Sep 30, 2005, 3:09:18 PM
to Leopold Toetsch via RT

After resetting my ulimit so that the tests can run without adversely
impacting other uses of the system, I end up with 267 test failures. I
haven't had time to look for common themes. (This is all on Solaris
8/SPARC).

Failed 26/162 test scripts, 83.95% okay. 267/2734 subtests failed, 90.23% okay.

Failed Test             Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/dynclass/gdbmhash.t     13  3328    13   13 100.00%  1-13
t/examples/japh.t          1   256    15    1   6.67%  12
t/library/dumper.t        27  6912    27   27 100.00%  1-27
t/library/getopt_long.t    1   256     1    1 100.00%  1
t/library/md5.t            6  1536     6    6 100.00%  1-6
t/library/parrotlib.t      5  1280     6    5  83.33%  1-4 6
t/library/pcre.t           1   256     1    1 100.00%  1
t/library/pge.t            4  1024     6    4  66.67%  2 4-6
t/library/streams.t       18  4608    20   18  90.00%  1-17 19
t/op/calling.t             1   256    37    1   2.70%  35
t/op/gc.t                  1   256    22    1   4.55%  13
t/op/string_cclass.t       2   512     6    2  33.33%  5-6
t/op/trans.t               1   256    19    1   5.26%  13
t/p6rules/anchors.t       26  6656    26   26 100.00%  1-26
t/p6rules/backtrack.t     15  3840    15   15 100.00%  1-15
t/p6rules/builtins.t      41 10496    41   41 100.00%  1-41
t/p6rules/capture.t       38  9728    38   38 100.00%  1-38
t/p6rules/cclass.t        18  4608    18   18 100.00%  1-18
t/p6rules/escape.t        19  4864    19   19 100.00%  1-19
t/p6rules/subrules.t       5  1280     5    5 100.00%  1-5
t/p6rules/ws.t            19  4864    21   19  90.48%  1-15 18-21
t/pmc/delegate.t           1   256     9    1  11.11%  9
t/pmc/fixedpmcarray.t      1   256    13    1   7.69%  10
t/pmc/mmd.t                1   256    30    1   3.33%  27
t/pmc/namespace.t          1   256    15    1   6.67%  12
t/src/hash.t               1   256    10    1  10.00%  6
5 tests and 100 subtests skipped.

So far, I've identified 14 tests that panic with 'Out of mem!'. These all
get a null access internal exception, and then try to exit. During
Parrot_exit, the exit handlers get called. The very first one apparently
tries to do a backtrace, and that backtrace ends up gobbling up all the
memory.

Here are some examples:

# got: 'Null PMC access in clone()
# current instr.: '(null)' pc 199 (/home/doughera/src/parrot/parrot-andy/t/op/gc_13.pir:123)
# called from Sub '(null)' pc 199 (/home/doughera/src/parrot/parrot-andy/t/op/gc_13.pir:123)
# Parrot VM: PANIC: Out of mem!

# Null PMC access in get_string()
# current instr.: 'delegate :: __get_string' pc 50 (/home/doughera/src/parrot/parrot-andy/t/pmc/delegate_9.pir:27)
# called from Sub 'delegate :: __get_string' pc 50 (/home/doughera/src/parrot/parrot-andy/t/pmc/delegate_9.pir:27)

# Parrot VM: PANIC: Out of mem!

# Null PMC access in get_iter()
# current instr.: 'cmp_fun' pc 80 (/home/doughera/src/parrot/parrot-andy/t/pmc/fixedpmcarray_10.pir:27)
# called from Sub 'cmp_fun' pc 80 (/home/doughera/src/parrot/parrot-andy/t/pmc/fixedpmcarray_10.pir:27)

# Parrot VM: PANIC: Out of mem!

# got: 'Null PMC access in set_integer_keyed_int()
# current instr.: 'Digest :: _md5_init' pc 72 (runtime/parrot/library/Digest/MD5.pir:81)
# called from Sub 'Digest :: _md5_init' pc 72 (runtime/parrot/library/Digest/MD5.pir:81)

# Parrot VM: PANIC: Out of mem!

(all the t/library/md5_*.pir tests fail in the same way).

Here's a backtrace from t/op/gc_13.pir

t@1 (l@1) terminated by signal QUIT (Quit)
(dbx) where
current thread: t@1
=>[1] __sigprocmask(0x0, 0xffbecff0, 0x0, 0x0, 0x0, 0x0), at 0xff0d91f0
[2] _resetsig(0xff0db7f4, 0x0, 0x0, 0x225c98, 0xff0ec000, 0x0), at 0xff0ce56c
[3] _sigon(0x225c98, 0xff0f38a8, 0x3, 0xffbed0c4, 0x225c98, 0x5), at 0xff0cdd0c
[4] _thrp_kill(0x0, 0x1, 0x3, 0xff0ec000, 0x1, 0x1dbd80), at 0xff0d0d4c
[5] raise(0x3, 0x224800, 0x1dbc00, 0x1dbc00, 0x40d320, 0x21f), at 0xff14bce0
[6] mem__internal_allocate_zeroed(0x0, 0x1ccf30, 0x3b, 0x43fc40, 0x226b78, 0x0), at 0x4cdf0
[7] compact_pool(0x225f10, 0x226b78, 0x0, 0xffffffff, 0x226af0, 0xa3670), at 0xa3714
[8] mem_allocate(0x225f10, 0xffbed2ec, 0x226b78, 0x90, 0x100, 0x0), at 0xa35f8
[9] Parrot_allocate_string(0x225f10, 0x3cdb48, 0x80, 0xfffffff8, 0x2100, 0x247838), at 0xa3da4
[10] string_make_empty(0x3cdb48, 0x1, 0x80, 0x247798, 0x225c00, 0x1), at 0x53624
[11] Parrot_sprintf_format(0x225f10, 0x3cdb70, 0xffbef4b0, 0x7c200, 0x0, 0xffbef52c), at 0x7ad94
[12] Parrot_sprintf_c(0x225f10, 0x20ad68, 0x1dc580, 0x0, 0xc7, 0x247ba8), at 0x7a8a8
[13] Parrot_Context_infostr(0x0, 0x4d8c28, 0x263598, 0x4d8c20, 0x40d320, 0x4d5e40), at 0xd658c
[14] PDB_backtrace(0x225f10, 0x40d2f0, 0x4fa5a, 0xff191c14, 0x40, 0x0), at 0x76e28
[15] Parrot_exit(0x2b, 0xff1c3a54, 0xff1bfca8, 0xa, 0x1e12a0, 0x21e400), at 0x7a58c
[16] internal_exception(0x2b, 0x1e12a0, 0x0, 0xb1eb0, 0x52e7c, 0x52e94), at 0xd0f2c
[17] Parrot_Null_clone(0x225f10, 0x263598, 0x38, 0xe, 0x4d5e40, 0x17a920), at 0x17a93c
[18] Parrot_clone_p_p(0x4d5cb4, 0xf, 0x4d5e40, 0x92, 0x4d5e40, 0x84ee0), at 0x84f0c
[19] runops_slow_core(0x4d5cb4, 0x1dc400, 0x0, 0xd6400, 0x0, 0xd2800), at 0xd6674
[20] runops_int(0x225f10, 0x4d5998, 0x226058, 0x1, 0x0, 0x1), at 0xd2688
[21] runops(0x225f10, 0x0, 0x1bc1eb, 0x226058, 0x21e400, 0x247bf0), at 0xd5830
[22] Parrot_runcode(0x225f10, 0x443748, 0x70, 0x8, 0x488200, 0x0), at 0x71d54
[23] Parrot_runcode(0x225f10, 0x1, 0xffbefaf4, 0x0, 0x42ae48, 0x226af0), at 0x71aa4
[24] main(0x21e400, 0x225f10, 0xffbefafc, 0x2248ac, 0x0, 0x0), at 0x4aa84
(dbx) quit

Though it doesn't run out of memory, op/calling_35.pir also dumps core.
Here's its backtrace:

(dbx) run t/op/calling_35.pir
Running: parrot t/op/calling_35.pir
(process id 5567)
Foo ok 1
t@1 (l@1) signal SEGV (no mapping at the fault address) in Parrot_free_context at 0x4be10
0x0004be10: Parrot_free_context+0x0030: st %g1, [%o0 - 0x64]
(dbx) where
current thread: t@1
=>[1] Parrot_free_context(0x0, 0x248400, 0x1, 0x42863c, 0x4f, 0x138), at 0x4be10
[2] Parrot_RetContinuation_invoke(0x225f10, 0x2483f0, 0x563ee8, 0x3e99c0, 0x247e90, 0x253c40), at 0x1841e0
[3] Parrot_returncc(0x563ee4, 0x225f10, 0x5198a0, 0x8, 0x5198a0, 0x826b0), at 0x826c8
[4] runops_slow_core(0x563ee4, 0x1dc400, 0x0, 0xd6400, 0x0, 0xd2800), at 0xd6674
[5] runops_int(0x225f10, 0x563df0, 0x226058, 0x1, 0x0, 0x1), at 0xd2688
[6] runops(0x225f10, 0x0, 0x1bc1eb, 0x226058, 0x21e400, 0x4b6d38), at 0xd5830
[7] Parrot_runcode(0x225f10, 0x443748, 0x70, 0x3e99c0, 0x8, 0x253508), at 0x71d54
[8] Parrot_runcode(0x225f10, 0x1, 0xffbefa78, 0x0, 0x42ae48, 0x226af0), at 0x71aa4
[9] main(0x21e400, 0x225f10, 0xffbefa80, 0x2248ac, 0x0, 0x0), at 0x4aa84
(dbx) quit

One common theme I see is that the interpreter arguments to Parrot_Context_infostr()
and Parrot_free_context() are both null. I don't know why.

I don't know what to make of it all yet, but anyone running tests on
shared systems should probably consider setting a conservative ulimit
value.

--
Andy Dougherty doug...@lafayette.edu

Andrew Dougherty

Oct 4, 2005, 1:06:16 PM
to Andy Dougherty, Leopold Toetsch via RT
On Fri, 30 Sep 2005, Andy Dougherty wrote:

> On Fri, 30 Sep 2005, Leopold Toetsch via RT wrote:
>
> > Andy Dougherty (via RT) wrote:
> >
> > > With a fresh checkout (r9274) I get a number of errors where parrot eventually
> > > gobbles up all the memory on the system. Here's the first such one:
> > >
> > > t/op/gc........................
> > > # Failed test (t/op/gc.t at line 279)
> >
> > > # './parrot --gc-debug "/home/doughera/src/parrot/parrot-andy/t/op/gc_13.pir"' failed with exit code 131
> > > # Looks like you failed 1 test of 22.
> >
> > Strange. The test succeeds on linux/86 and OS/X 10.3 darwin. Running it
> > through valgrind on the linux box doesn't show any indication of an error.

Ok, I've finally found the cause of this one, but I don't have a portable
patch at hand.

Buried in amongst the 6827 warnings emitted by gcc is one that actually
correctly identifies the problem:

src/inter_create.c:400: warning: dereferencing type-punned pointer will
break strict-aliasing rules

And indeed that appears to be the problem. You can even reproduce the
problem under Linux/x86 with gcc-3.4 or newer. Simply compile with
optimization level of -O3.

The *temporary workaround* for *gcc only* is to supply gcc with the
-fno-strict-aliasing flag.

For those not familiar with aliasing, I found this article

http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html

to be useful. Another relevant page (specific to gcc) is at

http://gcc.gnu.org/bugs.html#nonbugs_c

Hope this helps,

--
Andy Dougherty doug...@lafayette.edu

Leopold Toetsch

Oct 4, 2005, 2:02:59 PM
to Andrew Dougherty, Leopold Toetsch via RT

On Oct 4, 2005, at 19:06, Andrew Dougherty wrote:

> Ok, I've finally found the cause of this one, but I don't have a portable
> patch at hand.
>
> Buried in amongst the 6827 warnings emitted by gcc is one that actually
> correctly identifies the problem:

There must be some really heavily used macros that cause that huge
number of warnings, e.g. in the ops files. Getting rid of these would
really help, I presume.

> src/inter_create.c:400: warning: dereferencing type-punned pointer will
> break strict-aliasing rules

The line reads:

LVALUE_CAST(char *, p) += ALIGNED_CTX_SIZE;

The intent, of course, is to bump the context pointer by the needed
size. The problem is that the needed size does not correspond to the
size of the struct.

> And indeed that appears to be the problem. You can even reproduce the
> problem under Linux/x86 with gcc-3.4 or newer. Simply compile with
> optimization level of -O3.
>
> The *temporary workaround* for *gcc only* is to supply gcc with the
> -fno-strict-aliasing flag.
>
> For those not familiar with aliasing, I found this article
>
> http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html

If I read it correctly, the article also states:

There exist an important exception to the rule above: char* may alias
all types (too much code would break if ISO had prevented this...)

Anyway, does:

p = (struct Parrot_Context *) ( (char *) p + ALIGNED_CTX_SIZE );

help, or better is it "more correct"?
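For readers following the aliasing question, here is a minimal C sketch of the two forms. It assumes LVALUE_CAST is defined along the lines of (*((type *)&(val))) (Parrot's actual macro may differ) and uses a hypothetical stand-in for struct Parrot_Context; bump_bytes() is an illustrative helper name. Note that the article's exception covers lvalues of character type (char, unsigned char), not lvalues of type char *: the original line stores into the pointer object p through a char * lvalue, which is what gcc flags. The rewrite only converts pointer values, so no object is accessed through a mismatched type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real struct Parrot_Context has other members. */
struct Parrot_Context {
    void *caller_ctx;
    long  flags;
};

/* Assumed shape of LVALUE_CAST: reinterpret the storage of `val`
 * as an object of type `type`. */
#define LVALUE_CAST(type, val) (*((type *)&(val)))

/*
 * The line from src/inter_create.c, with p of type struct Parrot_Context *:
 *
 *     LVALUE_CAST(char *, p) += ALIGNED_CTX_SIZE;
 *
 * expands to roughly (*((char **)&(p))) += ALIGNED_CTX_SIZE, i.e. the
 * pointer object p is modified through an lvalue of type char *. Under
 * strict aliasing the optimizer may assume that store leaves p unchanged.
 * It is kept in this comment rather than compiled.
 */

/* Leo's alternative: convert the pointer *value* to char *, add the byte
 * offset, and convert back. The caller must pass an offset that keeps the
 * result aligned for struct Parrot_Context. */
static struct Parrot_Context *bump_bytes(struct Parrot_Context *p, size_t nbytes)
{
    assert(p != NULL);
    return (struct Parrot_Context *)((char *)p + nbytes);
}
```

For example, given struct Parrot_Context ctxs[2], bump_bytes(&ctxs[0], sizeof ctxs[0]) yields &ctxs[1]. On strict-alignment targets gcc's -Wcast-align may still flag the conversion back; that warning is only a reminder that the byte offset has to preserve alignment.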

We have a lot of similar code, e.g. inside the GC, where pointers to PMCs
or to PObjs are incremented by the actual size of the object and not by
the size of some structure.
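The GC-style pattern mentioned above, stepping through objects by their actual size rather than by the size of a declared struct, can keep all arithmetic on char * values. A minimal sketch with a hypothetical Obj header and Arena pool; these are illustrative types, not Parrot's PObj or pool structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object header, standing in for a PObj. */
typedef struct Obj {
    unsigned flags;
} Obj;

/* Hypothetical pool: `count` objects of `object_size` bytes each, packed
 * back to back. object_size may exceed sizeof(Obj) (e.g. an object with
 * extra fields) but must keep every object suitably aligned. */
typedef struct Arena {
    char  *start;
    size_t object_size;
    size_t count;
} Arena;

static int arena_init(Arena *a, size_t object_size, size_t count)
{
    assert(object_size >= sizeof(Obj));
    a->start = malloc(object_size * count);  /* malloc memory suits any type */
    a->object_size = object_size;
    a->count = count;
    return a->start != NULL;
}

/* Address of the i-th object: byte arithmetic on a char * value, then a
 * single conversion to Obj *. No pointer variable is reinterpreted. */
static Obj *arena_get(const Arena *a, size_t i)
{
    assert(i < a->count);
    return (Obj *)(a->start + i * a->object_size);
}

/* Walk the pool by object_size and count objects whose flags contain mask. */
static size_t arena_count_flagged(const Arena *a, unsigned mask)
{
    size_t n = 0;
    for (size_t i = 0; i < a->count; i++)
        if (arena_get(a, i)->flags & mask)
            n++;
    return n;
}

static void arena_free(Arena *a)
{
    free(a->start);
    a->start = NULL;
    a->count = 0;
}
```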

leo

Andy Dougherty

Oct 5, 2005, 12:20:04 PM
to Leopold Toetsch via RT
On Tue, 4 Oct 2005, Leopold Toetsch via RT wrote:

> On Oct 4, 2005, at 19:06, Andrew Dougherty wrote:

> > src/inter_create.c:400: warning: dereferencing type-punned pointer
> > will
> > break strict-aliasing rules
>
> The line reads:
>
> LVALUE_CAST(char *, p) += ALIGNED_CTX_SIZE;
>
> The intent is of course, to bump the context pointer by the needed
> size. The problem is that the needed size does not correlate with the
> size of the struct.
>

> Anyway, does:
>
> p = (struct Parrot_Context *) ( (char *) p + ALIGNED_CTX_SIZE );
>
> help, or better is it "more correct"?

While this does indeed replace the warning by a different warning ("cast
increases required alignment of target type"), it doesn't fix the problem
-- parrot still panics. (And since we're not accessing the memory through
a (char *), I'm not sure it should make any difference. I'm not a
language lawyer, and I haven't read the standard closely.)

Anyway, from what I've been able to gather, gcc's warnings on aliasing are
neither complete nor always 100% on-the-mark. In this case, for example,
I took the apparently offending function, moved it to a different file
(inter_create2.c) and recompiled that function with and without
-fno-strict-aliasing. It made no difference. However, recompiling the
remaining functions in inter_create.c with -fno-strict-aliasing *did* make
the problem go away.

So the compiler's doing some optimization somewhere else in the file that
it's not warning about, and that optimization only happens with
-fstrict-aliasing. I suspect by continuing my divide and conquer strategy
on inter_create.c, one could probably isolate it to a single function, and
then, perhaps, understand whether it's a compiler optimizer error or
whether it's a programming error.

Since this can be reproduced with gcc-3.4 on Intel, I'd appreciate it if
someone with a faster machine and/or a deeper understanding of what the
code is actually trying to do could hunt it down.

> We have a lot of similar code e.g. inside GC, where pointers to PMCs or
> to PObjs are incremented by the actual size of the object and not by
> the size of some structure.

Yes, I know, but I'm not fluent enough with parrot's internals to follow
it well, so I get lost every time I try to dig in deeply.

--
Andy Dougherty doug...@lafayette.edu

Leopold Toetsch

Oct 6, 2005, 7:50:18 AM
to Andy Dougherty, Leopold Toetsch via RT
Andy Dougherty wrote:
> On Tue, 4 Oct 2005, Leopold Toetsch via RT wrote:

>>Anyway, does:
>>
>> p = (struct Parrot_Context *) ( (char *) p + ALIGNED_CTX_SIZE );
>>
>>help, or better is it "more correct"?
>
>
> While this does indeed replace the warning by a different warning ("cast
> increases required alignment of target type"), it doesn't fix the problem
> -- parrot still panics. (And since we're not accessing the memory through
> a (char *), I'm not sure it should make any difference. I'm not a
> language lawyer, and I haven't read the standard closely.)

Sh...
Another idea: the context (struct Parrot_Context) is accessed almost
exclusively through the CONTEXT() macro. If this pointer (ctx.rctx) is
instead declared as 'void *', it should be compatible with any other pointer
to a structure.
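The idea can be sketched as follows, assuming the context pointer is stored as void * and every access goes through a CONTEXT()-style macro that converts it back at the point of use. The struct layouts, the Interp type, and the context_new(), context_of(), and context_advance() helpers are hypothetical simplifications. Storing the pointer as void * removes the need for LVALUE_CAST-style tricks when the pointer itself is updated; it does not by itself make accesses through a mismatched struct type legal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for Parrot's structures. */
struct Parrot_Context {
    int   current_pc;
    void *caller_ctx;
};

typedef struct Interp {
    struct {
        void *rctx;   /* context pointer kept as void * */
    } ctx;
} Interp;

/* Hypothetical CONTEXT(): convert the stored void * back to the
 * context type wherever it is used. */
#define CONTEXT(c) ((struct Parrot_Context *)(c).rctx)

/* Allocate a zeroed context; an object pointer converts to void *
 * implicitly, so no cast is needed when storing it. */
static int context_new(Interp *interp)
{
    interp->ctx.rctx = calloc(1, sizeof(struct Parrot_Context));
    return interp->ctx.rctx != NULL;
}

static struct Parrot_Context *context_of(const Interp *interp)
{
    assert(interp->ctx.rctx != NULL);
    return CONTEXT(interp->ctx);
}

/* Bumping the pointer: char * arithmetic on the value, stored back as
 * void *, without reinterpreting the pointer object. */
static void context_advance(Interp *interp, size_t nbytes)
{
    interp->ctx.rctx = (char *)interp->ctx.rctx + nbytes;
}
```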

Anyway, the current code is only an intermediate step towards variable-sized
register frames. I have to reactivate the allocation code that is also present
in inter_create.c, which is used when CHUNKED_CTX_MEM is defined as true.

This will very likely also require splitting the ctx union into two distinct
pointers: ctx.bp (register base pointer) and ctx.rctx (pointer to
Parrot_Context). The allocation will still be as sketched in
inter_create.c:79, i.e. one block; only the addressing of the two items
will be split.
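The split might look roughly like this: one allocation, and two separate pointers into it rather than a union. This is a sketch under assumptions: FLOATVAL is taken to be double, the context struct is a hypothetical stand-in, and struct ctx_ptrs, ctx_alloc(), and ctx_free() are illustrative names, not Parrot's. Here the context sits at the start of the block, so rctx is also the pointer to hand back to free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef double FLOATVAL;   /* assumption: FLOATVAL is double */

struct Parrot_Context {    /* hypothetical stand-in */
    int   current_pc;
    void *caller_ctx;
};

/* Two separate pointers into one allocation, instead of a union:
 * rctx addresses the context at the start of the block,
 * bp addresses the register area that follows it. */
struct ctx_ptrs {
    struct Parrot_Context *rctx;
    void                  *bp;
};

static int ctx_alloc(struct ctx_ptrs *c, size_t reg_bytes)
{
    /* Round the context size up so the registers start on a
     * FLOATVAL-sized boundary (a size is a multiple of the alignment). */
    size_t ctx_size = (sizeof(struct Parrot_Context) + sizeof(FLOATVAL) - 1)
                      / sizeof(FLOATVAL) * sizeof(FLOATVAL);
    char *block;

    assert(c != NULL);
    block = malloc(ctx_size + reg_bytes);   /* aligned for any type */
    if (block == NULL)
        return 0;
    c->rctx = (struct Parrot_Context *)block;
    c->bp   = block + ctx_size;
    return 1;
}

static void ctx_free(struct ctx_ptrs *c)
{
    free(c->rctx);   /* rctx is the start of the block */
    c->rctx = NULL;
    c->bp   = NULL;
}
```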

It would be great if some folks with stronger C-fu than I have
could have a look at it.

> ... I'd appreciate it if
> someone with a faster machine and/or a deeper understanding of what the
> code is actually trying to do could hunt it down.

Yep. My Athlon 800 also needs ages to compile it.

leo

Leopold Toetsch

Oct 6, 2005, 10:32:26 AM
to Andy Dougherty, Leopold Toetsch via RT
Leopold Toetsch wrote:

> ... When now this pointer (ctx.rctx) is
> declared being 'void *' it should be compatible with any other pointer
> to a structure.

I've now rewritten the questionable code to use a (void*) allocation
pointer. I hope it's better now.

Could you please try r9367, and report success ;-) - thanks.

leo

Andy Dougherty

Oct 7, 2005, 2:52:26 PM
to Leopold Toetsch via RT

It cleared up all the warnings for src/inter_create.c, but since I had
already determined that the problem was elsewhere in that file, I wasn't
too optimistic it would make a difference.

It now fails differently. Here are the results from three different
tries:

Solaris 8/SPARC, gcc-3.4, built with:
perl Configure.pl --optimize=-O3 --debugging=0 --cc=gcc --ld=gcc --link=gcc

Failed 7/167 test scripts, 95.81% okay. 27/2746 subtests failed, 99.02% okay.

Failed Test           Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
imcc/t/syn/labels.t      1   256     7    1  14.29%  3
t/dynclass/gdbmhash.t   13  3328    13   13 100.00%  1-13
t/examples/japh.t        1   256    15    1   6.67%  12
t/op/jit.t               9  2304    60    9  15.00%  5 8 13 16 21 24 28 33 36
t/op/trans.t             1   256    19    1   5.26%  13
t/pmc/mmd.t              1   256    30    1   3.33%  27
t/src/hash.t             1   256    10    1  10.00%  6
5 tests and 103 subtests skipped.

Solaris 8/SPARC, Sun's cc:
perl Configure.pl --optimize --debugging=0

The test hangs in an (apparently) infinite loop in t/op/jitn_8.pasm.
After killing that manually, I get

Failed 6/167 test scripts, 96.41% okay. 26/2746 subtests failed, 99.05% okay.

Failed Test           Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/dynclass/gdbmhash.t   13  3328    13   13 100.00%  1-13
t/examples/japh.t        1   256    15    1   6.67%  12
t/op/jit.t               9  2304    60    9  15.00%  5 8 13 16 21 24 28 33 36
t/op/jitn.t              1   256    13    1   7.69%  8
t/pmc/mmd.t              1   256    30    1   3.33%  27
t/src/hash.t             1   256    10    1  10.00%  6

5 tests and 103 subtests skipped.


Intel x86/gcc-3.3.5, built with
perl Configure.pl --optimize=-O0 --debugging=0

Failed 2/167 test scripts, 98.80% okay. 3/2749 subtests failed, 99.89% okay.

Failed Test         Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
imcc/t/syn/labels.t    1   256     7    1  14.29%  3
t/op/jit.t             2   512    60    2   3.33%  52-53
(1 subtest UNEXPECTEDLY SUCCEEDED), 4 tests and 98 subtests skipped.
make: *** [test] Error 255

I'm afraid I can't offer any specific advice. I've tried to follow the
code some, but I confess I haven't really been able to follow it very far.
It may well be because I haven't had time to print out all the relevant
header structures and stay focused on it long enough to get my brain
wrapped all the way around all the different structures and how they
access memory. Nor have I found the relevant documentation yet.

One thing I really don't understand is why the CONTEXT macro has to play
the "-1" trick to access memory to the "left". Similarly, I don't
understand why the ALIGNED_CTX_SIZE macro has a NUMVAL_SIZE buried in it,
and how that fits in with attempting to do things like p[-1]. I guess I
just don't understand what padding assumptions are built in to the code
and why we can't let the compiler compute all the relevant addresses and
offsets for us.

I don't mean to be critical -- there may well be quite sound and
simple reasons -- I just haven't grasped them yet, and as my time is
limited, I'm not optimistic about doing so any time soon.

--
Andy Dougherty doug...@lafayette.edu

Leopold Toetsch

Oct 7, 2005, 4:39:27 PM
to Andy Dougherty, Leopold Toetsch via RT

On Oct 7, 2005, at 20:52, Andy Dougherty wrote:

> perl Configure.pl --optimize=-O3 --debugging=0 --cc=gcc --ld=gcc --link=gcc

...
Andy, slowly please. No --optimize tests yet. Let's first look at the plain
default build.

> Intel x86/gcc-3.3.5, built with
> perl Configure.pl --optimize=-O0 --debugging=0

This seems to be w/o optimizations.


>
> Failed 2/167 test scripts, 98.80% okay. 3/2749 subtests failed, 99.89% okay.
> Failed Test         Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> imcc/t/syn/labels.t    1   256     7    1  14.29%  3
> t/op/jit.t             2   512    60    2   3.33%  52-53
> (1 subtest UNEXPECTEDLY SUCCEEDED), 4 tests and 98 subtests skipped.
> make: *** [test] Error 255

The two jit tests don't have an 'end' opcode and rely on nullified I
registers, so quite clearly they can fail. The labels test has the same
problem. I'll fix these RSN.

> One thing I really don't understand is why the CONTEXT macro has to play
> the "-1" trick to access memory to the "left".

There is currently just one base pointer (praise x86 jit). Parrot
registers are to the right of it, context is at the left side
(src/inter_create.c has a picture describing this).
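A minimal C sketch of the one-base-pointer layout, using a hypothetical context struct and a hypothetical CONTEXT_OF() macro rather than Parrot's real CONTEXT(). Only bp is kept; registers sit at non-negative offsets from it, and the context is element -1 when bp is viewed as a struct Parrot_Context pointer. To keep it short, the sketch assumes sizeof(struct Parrot_Context) is already a multiple of sizeof(FLOATVAL) (FLOATVAL taken to be double), so no padding gap is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef double FLOATVAL;  /* assumption */

struct Parrot_Context {   /* hypothetical stand-in */
    int   current_pc;
    void *caller_ctx;
};

/*
 * One allocation, only bp is kept:
 *
 *   [ struct Parrot_Context ][ registers ... ]
 *                            ^ bp
 *
 * Registers live at non-negative offsets from bp; the context is the
 * element just to the left, i.e. index -1.
 */
#define CONTEXT_OF(bp) (&((struct Parrot_Context *)(bp))[-1])

static void *frame_new(size_t reg_bytes)
{
    char *block;

    /* Sketch assumption: no padding needed before the registers. */
    assert(sizeof(struct Parrot_Context) % sizeof(FLOATVAL) == 0);
    block = malloc(sizeof(struct Parrot_Context) + reg_bytes);
    if (block == NULL)
        return NULL;
    return block + sizeof(struct Parrot_Context);   /* hand out bp */
}

static void frame_free(void *bp)
{
    if (bp != NULL)
        free(CONTEXT_OF(bp));   /* the context starts the block */
}
```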

> Similarly, I don't
> understand why the ALIGNED_CTX_SIZE macro has a NUMVAL_SIZE buried in it,
> and how that fits in with attempting to do things like p[-1].

Context + registers are allocated as one chunk. The registers, especially
the FLOATVAL ones, have to satisfy FLOATVAL alignment requirements.
Therefore there can be a gap between the context and the registers.
The above macro takes care of this by increasing the allocation
size.
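The size adjustment can be sketched as a round-up to the next multiple of the FLOATVAL size. The helper below is an assumption about what ALIGNED_CTX_SIZE and NUMVAL_SIZE compute, not a copy of Parrot's macros; FLOATVAL is taken to be double and the context struct is a hypothetical stand-in.

```c
#include <assert.h>
#include <stddef.h>

typedef double FLOATVAL;               /* assumption */
#define NUMVAL_SIZE sizeof(FLOATVAL)   /* assumed meaning of NUMVAL_SIZE */

struct Parrot_Context {                /* hypothetical stand-in */
    int   current_pc;
    void *caller_ctx;
};

/* Round ctx_size up to the next multiple of numval_size. Rounding to the
 * size rather than the alignment is conservative, because a type's size
 * is always a multiple of its alignment. */
static size_t aligned_ctx_size(size_t ctx_size, size_t numval_size)
{
    assert(numval_size > 0);
    return (ctx_size + numval_size - 1) / numval_size * numval_size;
}

/* Hypothetical equivalent of ALIGNED_CTX_SIZE. */
#define ALIGNED_CTX_SIZE aligned_ctx_size(sizeof(struct Parrot_Context), NUMVAL_SIZE)
```

For instance, a 20-byte context with 8-byte FLOATVALs rounds up to 24 bytes, leaving a 4-byte gap. If the base pointer is placed at block + ALIGNED_CTX_SIZE and the context is reached as element -1, those 4 bytes end up at the start of the block, in front of the context.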

> I guess I
> just don't understand what padding assumptions are built in to the code
> and why we can't let the compiler compute all the relevant addresses and
> offsets for us.

I don't think the current failures are related to this at all; see the
explanation for the errors above. It's of course true that an optimized
build will cause more trouble, but we'll have a look at that later.

leo

Andy Dougherty

Oct 10, 2005, 2:24:47 PM
to Leopold Toetsch via RT
On Fri, 7 Oct 2005, Leopold Toetsch wrote:

>
> On Oct 7, 2005, at 20:52, Andy Dougherty wrote:
>
> > perl Configure.pl --optimize=-O3 --debugging=0 --cc=gcc --ld=gcc
> > --link=gcc
>
> ...
> Andy slowly please. No --optimize tests yet. Let's first look at plain default
> build.

Why? --optimize does at least two different things: First, obviously, it
allows the compiler to optimize. This is often a good strategy for
exposing faulty assumptions in code. Second, it enables the
DISABLE_GC_DEBUG define, which changes the sizes of several
structures, including PMCs and Stack_Chunks. Changing the sizes of
those structures can expose alignment and size assumptions.

> > One thing I really don't understand is why the CONTEXT macro has to play
> > the "-1" trick to access memory to the "left".
>
> There is currently just one base pointer (praise x86 jit). Parrot registers
> are to the right of it, context is at the left side (src/inter_create.c has a
> picture describing this).

I know about the picture. I don't know why you chose to use a pointer
pointing to the middle of the structure instead of the beginning. I'm
afraid "praise x86 jit" doesn't mean anything to me.

> > Similarly, I don't
> > understand why the ALIGNED_CTX_SIZE macro has a NUMVAL_SIZE buried in it,
> > and how that fits in with attempting to do things like p[-1].
>
> Context + registers are allocated as one chunk. Registers especially the
> FLOATVAL ones have to be aligned at FLOATVAL alignment needs. Therefore there
> can be a gap between the context and the registers. Above macro takes care
> about this fact by increasing the allocation size.

But you don't include the gap in your picture, and the code actually
assumes that the gap is at the beginning of the chunk, not in the middle
as you indicate here. Placing the gap at the beginning allows p[-1] to
work, but then runs into the problem that the context structures might, in
principle, be incorrectly aligned. I also don't know if there are
any garbage collection issues, since we don't actually carry around a
pointer to the beginning of the allocated chunk -- but that may
simply be my ignorance of garbage collection.

(This isn't an issue now because, at least for the default configuration
on common architectures, ALIGNED_CTX_SIZE == sizeof(struct
Parrot_Context), so there actually isn't any padding at all. Further,
since the Parrot_Context structure consists of integers and pointers, its
alignment constraints are easily satisfied.)

If, instead, you allocated a single structure containing both the Context
and the registers, then the compiler would ensure all the correct
padding and compute all the offsets for you.
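Andy's suggestion can be sketched as a single struct holding the context followed by fixed-size register arrays; the compiler inserts whatever padding is needed, and offsetof reports each offset. The register count N_REGS, the member names, and the choice of long and double for INTVAL and FLOATVAL are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef long   INTVAL;    /* assumption */
typedef double FLOATVAL;  /* assumption */

struct Parrot_Context {   /* hypothetical stand-in */
    int   current_pc;
    void *caller_ctx;
};

#define N_REGS 32         /* hypothetical fixed count per register kind */

/* Context and registers in one type: padding and offsets are the
 * compiler's job, and a pointer to the frame is a pointer to its start. */
struct Frame {
    struct Parrot_Context ctx;
    INTVAL   int_reg[N_REGS];
    FLOATVAL num_reg[N_REGS];
    void    *pmc_reg[N_REGS];
};

static struct Frame *frame_new(void)
{
    /* A single allocation; malloc returns memory aligned for any member. */
    return malloc(sizeof(struct Frame));
}

/* Register access needs no manual offset arithmetic. */
static FLOATVAL *num_reg(struct Frame *f, size_t i)
{
    assert(i < N_REGS);
    return &f->num_reg[i];
}
```

The catch is that N_REGS has to be a compile-time constant.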

> I don't think that current failures are related to this at all - see
> explanation for above errors. It's of course true that optimized build will
> cause more troubles, but we'll have a look at these later.

You may well be right, since there is actually 0 padding at present.
However, the tests in question worked in the trunk before the merge, and
fail after it, so this was a natural place to look. Also, alignment
issues have been at the root of a number of problems in the past, gcc
warned about aliasing problems with the code in this area, and it was just
plain puzzling to me.

None of which is terribly urgent to me, however, as I don't expect to have
time to follow up on this for quite a while.

--
Andy Dougherty doug...@lafayette.edu

Leopold Toetsch

Oct 10, 2005, 4:47:46 PM
to Andy Dougherty, Leopold Toetsch via RT

On Oct 10, 2005, at 20:24, Andy Dougherty wrote:

> Why? --optimize does at least two different things: First, obviously, it
> allows the compiler to optimize. This is often a good strategy for
> exposing faulty assumptions in code. Second, it enables the
> DISABLE_GC_DEBUG define, which changes the sizes of several
> structures, including PMCs and Stack_Chunks. Changing the sizes of
> those structures can expose alignment and size assumptions.

Sure, all ACK. And actually I'm compiling --optimized quite often to
check performance. But I currently don't care about tests failing due to
optimizations, all the more since the involved structures are only
intermediate steps towards variable-sized register frames.

> I know about the picture. I don't know why you chose to use a pointer
> pointing to the middle of the structure instead of the beginning. I'm
> afraid "praise x86 jit" doesn't mean anything to me.

This structure will also change soon again.

> If, instead, you allocated a single structure containing both the Context
> plus the registers, then the compiler would correctly ensure all the
> correct padding and compute all the offsets for you.

This will not work for variable-sized register frames, as there is no
real register structure. We will have to fake it well enough that the
compiler thinks there is a structure, but actually there is none.
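One standard C99 way to get part of the way there is a flexible array member: the fixed part is a real struct, and the trailing register array gets its length at allocation time while the compiler still handles its offset and alignment. This is a sketch of that general technique, not Parrot's design; FLOATVAL as double, the struct layout, and the varframe_new() and varframe_num() helpers are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef double FLOATVAL;  /* assumption */

struct Parrot_Context {   /* hypothetical stand-in */
    int   current_pc;
    void *caller_ctx;
};

/* Fixed header plus a C99 flexible array member whose length is
 * chosen per frame. The compiler places and aligns num_reg. */
struct VarFrame {
    struct Parrot_Context ctx;
    size_t   n_num;       /* number of FLOATVAL registers in this frame */
    FLOATVAL num_reg[];
};

static struct VarFrame *varframe_new(size_t n_num)
{
    /* sizeof covers the header (and any trailing padding); the
     * flexible array contributes n_num elements on top of that. */
    struct VarFrame *f = malloc(sizeof *f + n_num * sizeof(FLOATVAL));
    if (f != NULL) {
        f->ctx.current_pc = 0;
        f->ctx.caller_ctx = NULL;
        f->n_num = n_num;
    }
    return f;
}

static FLOATVAL *varframe_num(struct VarFrame *f, size_t i)
{
    assert(i < f->n_num);
    return &f->num_reg[i];
}
```

Only one member can be flexible, so a frame with several variable-length register kinds would still need manually computed offsets for all but one of them.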

I'm currently working on an outline of the final thingy. Given that,
I'd be very glad for hints WRT the sanest way to convince compilers to
DTRT.

> None of which is terribly urgent to me, however, as I don't expect to have
> time to follow up on this for quite a while.

I need some time too, to convert to the 'final' design. I'd very much
appreciate comments, tests, ...

leo
