This patch moves args_signature, params_signature, returns_signature,
current_args, current_params, and current_returns from Parrot_Interp
to Parrot_Context. This makes the interpreter more reentrant, which
is always a good thing.
Since these variables are no longer global, we need to keep track of
who belongs to whom. In a sub call, current_args and current_results
are set for the caller, while current_params and current_returns are
set for the callee. The same goes for *_signature, mutatis mutandis
(an excuse to use Latin! yes!).
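To make the caller/callee split concrete, here is a hypothetical, much-simplified sketch of what the context struct looks like after the move. The type names (`opcode_t`, `Parrot_Context`) echo Parrot's, but the layout and field types here are illustrative stand-ins, not the actual definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for Parrot's opcode pointer type. */
typedef long opcode_t;

/* Hypothetical sketch: the per-call state moves from Parrot_Interp
 * into the per-sub-call context. */
typedef struct Parrot_Context {
    /* set for the caller at a sub call */
    opcode_t   *current_args;
    opcode_t   *current_results;
    /* set for the callee */
    opcode_t   *current_params;
    opcode_t   *current_returns;
    /* the matching *_signature strings */
    const char *args_signature;
    const char *params_signature;
    const char *returns_signature;
} Parrot_Context;
```

With this layout, each sub call gets its own copy of the call state, so nothing about an in-progress call lives in interpreter-global storage.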
This patch seems to break some tests in t/dynoplibs, and I have no
idea why. It also *looks* like it breaks t/examples/shootout.t, but
if the examples are run manually, they work fine. I am completely
baffled. If anyone can solve these, they get a hug.
Thanks,
Alek Storm
Nope. The current_ ptrs and friends are deliberately not in the Context
structure.
Rationale:
1) This isn't needed; these pointers are only valid and needed up to the next
sub call/return.
2) The Context struct should be as small as possible because each sub call
needs one.
3) Reentrancy isn't a problem as new threads get new interpreter structs
anyway.
leo
The current_results member already lives in Parrot_Context; this patch
just moves the rest of them there as well. Variables shouldn't be global
simply because of when they happen to be valid. That's a hack.
> 2) The Context struct should be as small as possible because each sub call
> needs one.
Compactness does not supersede good design.
> 3) Reentrancy isn't a problem as new threads get new interpreter structs
> anyway.
Reentrancy isn't only a threading issue. From src/inter_run.c, in
Parrot_runops_fromc_args_event (line 318):
/* running code from event handlers isn't fully reentrant due to
* these interpreter variables - mainly related to calls */
The code then has to manually save these variables, then restore them
after the PIR sub is called. Thus, reentrancy *is* a problem. With
this patch, the number of variables we have to do this with is reduced
to one.
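The save/restore dance the quoted comment describes can be sketched roughly as follows. This is an assumption-laden illustration of the pattern, not Parrot's actual code: the `Interp` fields, `run_pir_sub`, and `runops_from_event_handler` are invented names standing in for the real ones in src/inter_run.c:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down interpreter struct holding the
 * call-related pointers that make the code non-reentrant. */
typedef struct Interp {
    void       *current_args;
    void       *current_params;
    const char *args_signature;
} Interp;

/* Placeholder for running a PIR sub from C; the nested call
 * clobbers the call-related interpreter variables. */
static void run_pir_sub(Interp *interp)
{
    interp->current_args   = NULL;
    interp->current_params = NULL;
    interp->args_signature = NULL;
}

/* The pattern from the event-handler path: manually save the
 * interpreter variables, run the sub, then restore them so the
 * interrupted call sequence can resume. */
static void runops_from_event_handler(Interp *interp)
{
    void       *saved_args   = interp->current_args;
    void       *saved_params = interp->current_params;
    const char *saved_sig    = interp->args_signature;

    run_pir_sub(interp);

    interp->current_args   = saved_args;
    interp->current_params = saved_params;
    interp->args_signature = saved_sig;
}
```

If these pointers lived in the per-call context instead, each nested call would get its own copies and the manual save/restore would be unnecessary.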
Thanks,
Alek
current_results is valid until the sub returns. Therefore it's in Context.
That isn't a reason to put similar variables, which are only valid for a range
of three or so opcodes, into the same place.
> > 2) The Context struct should be as small as possible because each sub
> > call needs one.
>
> Compactness does not supersede good design.
Basically & in theory yes, but don't always forget performance. Any
interpreter built on good & clean design principles based on e.g. the SKI
combinator calculus will suck speed-wise. More towards reality ;) we also
have to make compromises in the implementation, always. (And please see
below)
> > 3) Reentrancy isn't a problem as new threads get new interpreter structs
> > anyway.
>
> Reentrancy isn't only a threading issue. From src/inter_run.c, in
> Parrot_runops_fromc_args_event (line 318):
> /* running code from event handlers isn't fully reentrant due to
> * these interpreter variables - mainly related to calls */
> The code then has to manually save these variables, then restore them
> after the PIR sub is called. Thus, reentrancy *is* a problem. With
> this patch, the number of variables we have to do this with is reduced
> to one.
First [0]. But indeed this is a bug in the event code (which isn't really
specced and just there for experimenting with it).
The Plan(tm) is to switch to the event handler code only at "safe points".
These are for sure not inside C code, but also not inside a function call
sequence, where these pointers are currently used. Such a call/return
sequence shall be considered one "basic operation", during which no code
switch may occur.
The current implementation is buggy WRT that. Depending on the actual runloop
we have possible code switches to event handlers at each opcode down to
practically no [1] event handling at all inside JIT-only code.
I mentioned long ago that we should mark opcodes [2] that might switch to a
different (event handler) runloop with some metasyntax in the .ops files.
There's already a macro that does the necessary flag checking and event loop
handling if needed.
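The safe-point idea can be sketched as a flag-checking macro expanded only at marked opcodes. This is a guess at the shape of the mechanism leo mentions; `CHECK_EVENTS`, `event_pending`, and `run_event_handlers` are invented names, not the real macro in Parrot:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event state; in a real interpreter this would live in
 * the interpreter struct and be set asynchronously. */
static bool event_pending  = false;
static int  events_handled = 0;

static void run_event_handlers(void)
{
    events_handled++;
    event_pending = false;
}

/* Expanded only at marked opcodes (e.g. backward branches), so a
 * call/return sequence is never interrupted mid-way. */
#define CHECK_EVENTS() \
    do { if (event_pending) run_event_handlers(); } while (0)
```

The point is that event dispatch happens only where the macro is expanded, so opcodes between safe points form the uninterruptible "basic operations" described above.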
The "good design" here (IMHO) is to fix the event code and not to try to
adjust other parts of the interpreter because of the mentioned reasons.
> Thanks,
> Alek
leo
[0] Well, you would have to prove that there are typically a lot of event code
switches and that the related save/restore sequence of current_ ptrs hurts
more than the joe average sub call performance penalty your patch imposes, but
let's forget that for a while ;)
[1] except the sleep opcode.
[2] backward loop ops, whatever longer-taking opcode ... (to be decided)
> > Compactness does not supersede good design.
> Basically & in theorie yes, but don't always forget performance.
Until we get completeness and correctness and cleanliness, I really think we
should forget performance. It's awfully difficult to profile code that
doesn't do everything it needs to do, and it's comparatively easy to optimize
simple code.
-- c
I think I've clearly outlined where the real problems are. Moving a pointer
_now_ here or there isn't the issue. And re optimization again:
[0] Well, you would have to prove that there are typically a lot of event code
switches and that the related save/restore sequence of current_ ptrs hurts
more than the joe average sub call performance penalty your patch imposes, but
let's forget that for a while ;)
> -- c
leo
Since leo was concerned about memory usage, by running valgrind:
$ valgrind --tool=massif parrot bench.pir 100000
I found that, over several runs, the spacetime (time * bytes) usage is
slightly better with the patch than without. Can someone confirm this
also?
Thanks,
Alek Storm
The results look statistically noisy to me, as of r17872, on x86 Linux.
-- c
$ time parrot bench.pir 100000 (before)
0.126924
real 0m0.141s
user 0m0.136s
sys 0m0.000s
$ time parrot bench.pir 100000 (after)
0.136295
real 0m0.151s
user 0m0.144s
sys 0m0.008s
$ time parrot bench.pir 1000000 (before)
1.268441
real 0m1.283s
user 0m1.260s
sys 0m0.012s
$ time parrot bench.pir 1000000 (after)
1.265152
real 0m1.279s
user 0m1.272s
sys 0m0.000s
$ time parrot bench.pir 4000000 (before)
4.998641
real 0m5.013s
user 0m4.992s
sys 0m0.012s
$ time parrot bench.pir 4000000 (after)
5.167657
real 0m5.182s
user 0m5.004s
sys 0m0.012s
$ valgrind --tool=massif parrot bench.pir 100000 (before)
==23475== Massif, a space profiler.
==23475==
==23475== Total spacetime: 5,538,772,556 ms.B
==23475== heap: 98.3%
==23475== heap admin: 1.2%
==23475== stack(s): 0.4%
$ valgrind --tool=massif parrot bench.pir 100000 (after)
==24621== Massif, a space profiler.
==24621==
==24621== Total spacetime: 5,644,447,560 ms.B
==24621== heap: 98.3%
==24621== heap admin: 1.1%
==24621== stack(s): 0.4%
$ valgrind --tool=massif parrot bench.pir 4000000 (before)
==23480== Massif, a space profiler.
==23480==
==23480== Total spacetime: 163,932,225,655 ms.B
==23480== heap: 98.2%
==23480== heap admin: 1.3%
==23480== stack(s): 0.4%
$ valgrind --tool=massif parrot bench.pir 4000000 (after)
==24626== Massif, a space profiler.
==24626==
==24626== Total spacetime: 163,975,392,847 ms.B
==24626== heap: 98.2%
==24626== heap admin: 1.3%
==24626== stack(s): 0.4%
> On Friday 30 March 2007 13:59, Alek Storm wrote:
>
>> I used a simple benchmark to compare the relative speeds of Parrot
>> with and without the patch, and I was surprised to find that the test
>> script runs (very roughly) 10% faster *with* the patch. Can someone
>> confirm this? Running revision 17860; benchmark script attached; run
>> as:
>> $ parrot bench.pir 100000
>>
>> Since leo was concerned about memory usage, by running valgrind:
>> $ valgrind --tool=massif parrot bench.pir 100000
>> I found that, over several runs, the spacetime (time * bytes) usage is
>> slightly better with the patch than without. Can someone confirm this
>> also?
>
> The results look statistically noisy to me, as of r17872, on x86 Linux.
>
> -- c
I'm on ppc, and after making a change to get it to compile (not a
portable patch, but that's another issue), I'm not noticing any
statistical difference. I didn't run your minibenchmark, but I used
other benchmarks, such as those in examples/shootout. In the cases
where I think I see a result, a few more runs of both eliminate any
difference (which could be due to caching). Also, judging by
chromatic's results, any benchmark that finishes in under a fifth of a
second isn't a good benchmark. So, needless to say, your patch doesn't
seem to do any good for performance (more often it seemed worse, but I
didn't get enough results to really check). For subtle things like
this, try reading
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/testing.html
and see if it helps you with how to benchmark changes.
Now, with regard to the other issues, final decisions about how best to
handle reentrancy depend upon the makeup of whatever requires
reentrancy. Your patch may or may not be the ideal solution to the
problem.
P.S. Thanks for the link! That'll come in handy.
Alek Storm
My guess is that the patch was never resubmitted. If so, I will close
the ticket after this week's release. Speak up now, ladies and gentlemen!
Thank you very much.
kid51