
a premature optimization


Will Coleda

Aug 6, 2006, 1:03:21 PM
to parrot-...@perl.org
Ok. I've recently converted much of partcl's test suite over to "pure
tcl". Instead of using a Perl-based test script, the tests are now
written in tcl, using a very small Test::More-like script. Now we can
more easily run the tests from tcl's own test suite, and easily use
tclsh to run our own tests.

This also turned out to be a speed win for the smaller test scripts:
only one parrot had to be invoked.

However, for larger test files (like t/cmd_expr.t), it's a definite
slowdown.

If we run:

../../parrot tcl.pbc --pir t/cmd_expr.t

we get a 9K line PIR sub. Running this through parrot dies after
about 2.5 minutes on a runtime error in one of the tests.

Here's a snippet from near the end of the code:

start_1559:
dynamic_1559:
  .local pmc command
  $P1566 = new .String
  $P1566 = 'foreach'
  push_eh invalid_1559
  command = get_hll_global '&foreach'
  clear_eh
  if_null command, invalid_1559
  $P1565 = command($P1560, $P1561, $P1563)
  goto end_1559
invalid_1559:
  .local pmc interactive
  interactive = get_root_global ['tcl'], '$tcl_interactive'
  unless interactive goto err_command1559
  .local pmc unk
  unk=find_global '&unknown'
  unk($P1566, $P1560, $P1561, $P1563)
  goto end_1559
err_command1559:
  $S0 = $P1566
  $S0 = concat "invalid command name \"", $S0
  $S0 .= "\""
  .throw($S0)
end_1559:

.return ($P1565)

Note the .local declarations there (command, interactive, unk):
they're duplicated many times throughout the code. In an effort to
make this better (both more correct and faster), I made a local
modification to the compiler to avoid re-using named PMCs by giving
each one a unique, numbered name. So, with this change, we get:

start_1559:
dynamic_1559:
  .local pmc command_1559
  $P1566 = new .String
  $P1566 = 'foreach'
  push_eh invalid_1559
  command_1559 = get_hll_global '&foreach'
  clear_eh
  if_null command_1559, invalid_1559
  $P1565 = command_1559($P1560, $P1561, $P1563)
  goto end_1559
invalid_1559:
  .local pmc interactive_1559
  interactive_1559 = get_hll_global '$tcl_interactive'
  unless interactive_1559 goto err_command1559
  .local pmc unk_1559
  unk_1559=get_hll_global '&unknown'
  unk_1559($P1566, $P1560, $P1561, $P1563)
  goto end_1559
err_command1559:
  $S0 = $P1566
  $S0 = concat "invalid command name \"", $S0
  $S0 .= "\""
  .throw($S0)
end_1559:

So, this adds (rough guess) about 4600 unique variables to the sub.
This makes the run time go from about 2.5 minutes to nearly 7 minutes
(until the same eventual runtime failure, which isn't an issue for the
purposes of this post. =-)

So... not sure if this is more a question for the partcl end (how can
we generate PIR more suitable to parrot), or for the parrot end (how
can we more efficiently handle this kind of PIR).

I know the usual caveats about speed (get it working first), but this
is one of those times where the current speed makes getting it working
somewhat difficult.

One potential change to parrot was suggested by Matt Diephouse on
IRC: have a way to declare "this variable is no longer used, I
promise" to limit the scope of its consideration when doing register
allocation.
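
To make the suggestion concrete, here is a rough Python model of why such a hint could help. It is not Parrot's register allocator. It assumes a worst case in which the allocator cannot prove a named local is dead before the end of the sub, and the block count and block length are made-up numbers chosen to land near the 4600 variables estimated above.

```python
# Illustrative model only: this is NOT how Parrot allocates registers.
# Each variable is a (first_use, last_use) interval over instruction indices.

def registers_needed(intervals):
    """Return the peak number of simultaneously live intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))       # variable becomes live
        events.append((end + 1, -1))    # variable is dead after its last use
    live = peak = 0
    # Sorting places (index, -1) before (index, +1), so a register freed
    # at an index can be reused by a variable starting at that index.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak

BLOCKS = 1500                      # hypothetical number of dispatch blocks
BLOCK_LEN = 6                      # hypothetical instructions per block
SUB_END = BLOCKS * BLOCK_LEN - 1   # index of the last instruction in the sub

# No hint (assumed worst case): each uniquely named local stays live
# from its declaration to the end of the sub.
unscoped = [(b * BLOCK_LEN + i, SUB_END)
            for b in range(BLOCKS) for i in range(3)]

# With a "no longer used" hint at the end of each block.
scoped = [(b * BLOCK_LEN + i, (b + 1) * BLOCK_LEN - 1)
          for b in range(BLOCKS) for i in range(3)]

print(registers_needed(unscoped))
print(registers_needed(scoped))
```

Under these assumptions the unscoped case has to keep every local in play at once, while the scoped case never needs more than the three locals of a single block.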

Thoughts?
--
Will "Coke" Coleda
wi...@coleda.com


Leopold Toetsch

Aug 6, 2006, 4:08:16 PM
to perl6-i...@perl.org
On Sunday, 6 August 2006 at 19:03, Will Coleda wrote:
> we get a 9K line PIR sub. Running this through parrot dies after
> about 2.5m on a runtime error on one of the tests.

Great progress. While Parrot shouldn't die or - in the long run - take really
long to compile that, you could work a bit on the compiler in the meantime.

> Here's a snippet from near the end of the code:
>
> start_1559:
> dynamic_1559:
> .local pmc command
> $P1566 = new .String
> $P1566 = 'foreach'

# why create a new String here - the line below clearly shows that the
compiler wants to call '&foreach'.

> push_eh invalid_1559
> command = get_hll_global '&foreach'

# why '&foreach'? The tcl name is 'foreach' - just install that sym in the
runtime namespace and use it. Why the 'get_hll_global'? You are setting up an
exception frame anyway. Why not just compile to:

push_eh invalid_xyz
$P1565 = 'for_each'($P1560, $P1561, $P1563)
clear_eh
...

> invalid_1559:
> .local pmc interactive
> interactive = get_root_global ['tcl'], '$tcl_interactive'

# you might cache that once - it's probably used in a zillion places

> unless interactive goto err_command1559
> .local pmc unk
> unk=find_global '&unknown'

# same here and just do ...

'unknown'(...)

... call it instead of ...

> unk($P1566, $P1560, $P1561, $P1563)

my 2 c
leo

Leopold Toetsch

Aug 6, 2006, 4:28:19 PM
to perl6-i...@perl.org
On Sunday, 6 August 2006 at 22:08, Leopold Toetsch wrote:
>       $P1565 = 'for_each'($P1560, $P1561, $P1563)

'foreach'

of course. Sorry,
leo

Will Coleda

Aug 6, 2006, 5:43:32 PM
to Leopold Toetsch, perl6-i...@perl.org

On Aug 6, 2006, at 4:08 PM, Leopold Toetsch wrote:

> On Sunday, 6 August 2006 at 19:03, Will Coleda wrote:
>> we get a 9K line PIR sub. Running this through parrot dies after
>> about 2.5m on a runtime error on one of the tests.
>
> Great progress. While Parrot shouldn't die or - in the long run -
> take really
> long to compile that, you could work a bit on the compiler in the
> meantime.
>

I just wanted to make sure I mentioned the error here because I knew
someone would try to run this and get the error: I think there's a
ticket opened for this particular error already...

>> Here's a snippet from near the end of the code:
>>
>> start_1559:
>> dynamic_1559:
>> .local pmc command
>> $P1566 = new .String
>> $P1566 = 'foreach'
>
> # why create a new String here - the line below clearly shows that
> the
> compiler wants to call '&foreach'.

This should be moved down to the unknown section below - if the .sub
isn't found at runtime, we need to dispatch the name to the unknown
handler. So this string needs to be here, but it's in the wrong spot.
I'll fix that.

>
>> push_eh invalid_1559
>> command = get_hll_global '&foreach'
>
> # why '&foreach'? The tcl name is 'foreach' - just install that sym
> in the
> runtime namespace and use it. Why the 'get_hll_global'? You are
> setting up an
> exception frame anyway. Why not just compile to:
>

Because tcl allows both [foo] (a command) and $foo (a variable) with the
same name, we (internally) adopted the sigils to keep these two things
segregated. We could have two separate namespaces for them, but the
thought was that this is a better choice for interoperability.

> push_eh invalid_xyz
> $P1565 = 'for_each'($P1560, $P1561, $P1563)
> clear_eh
> ...
>

I suppose we could have the exception handler check to see if the
exception was due to a missing sub, and call unknown() if so, or rethrow
if not. That might work, though I'd be curious whether parrot does this
lookup at each invocation or somehow caches it.

>> invalid_1559:
>> .local pmc interactive
>> interactive = get_root_global ['tcl'], '$tcl_interactive'
>
> # you might cache that once - it's probably used in a zillion places
>

This is a user visible variable that could theoretically change
anywhere, so any caching scheme has to take that into account.

>> unless interactive goto err_command1559
>> .local pmc unk
>> unk=find_global '&unknown'
>
> # same here and just do ...
>
> 'unknown'(...)
>
> ... call it instead of ...
>
>> unk($P1566, $P1560, $P1561, $P1563)
>
> my 2 c
> leo
>

I'll investigate just using the '&foreach'() syntax in both cases,
and there's probably some wins there.

A lot of the compilation at this point is naive - we don't bother to
do optimizations with constant strings, for example: constants,
command interpolation, variable interpolation - all these end up
naively in a PMC register that we can then manipulate uniformly. We
can for sure be more aggressive about this sort of thing, though.

Another thing that I'm considering is a custom op library for this
sort of dispatch. Let it handle things at the C level - which might
not be too much faster/slower at execution time, but would be a plus
during compilation.

We could probably replace the entire section above with:

tcl_dispatch $P1566, $P1560, $P1561, $P1563

and have it do all the heavy lifting.
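
As a rough picture of the "heavy lifting" such an op would take over, here is the same control flow as the generated PIR, modelled in Python rather than as a real Parrot op. The dictionaries `commands` and `variables` are hypothetical stand-ins for the HLL namespace, `TclError` is an invented exception type, and returning the unknown handler's result is an assumption (the PIR above does not capture it).

```python
# Sketch only: models the generated PIR's control flow, not a real Parrot op.

class TclError(Exception):
    """Hypothetical error type standing in for the thrown PIR exception."""

def tcl_dispatch(commands, variables, name, *args):
    # Commands are stored under a '&' sigil, as in the generated PIR.
    command = commands.get('&' + name)
    if command is not None:
        return command(*args)
    # Command not found: in interactive mode, hand the name and the
    # arguments to the unknown handler.
    if variables.get('$tcl_interactive'):
        return commands['&unknown'](name, *args)
    raise TclError('invalid command name "%s"' % name)

# Usage with made-up commands:
commands = {
    '&foreach': lambda *args: 'looped over %d args' % len(args),
    '&unknown': lambda name, *args: 'unknown handled %s' % name,
}
print(tcl_dispatch(commands, {'$tcl_interactive': 0}, 'foreach', 'a', 'b', 'c'))
print(tcl_dispatch(commands, {'$tcl_interactive': 1}, 'nosuch', 'a'))
```

With one call per command invocation, the compiler would emit a single line instead of roughly 25, which is where the compile-time saving would come from.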

Thanks for the feedback, I'll incorporate what I can!

Matt Diephouse

Aug 6, 2006, 9:49:50 PM
to Will Coleda, Leopold Toetsch, perl6-i...@perl.org
Will Coleda <wi...@coleda.com> wrote:
>
> On Aug 6, 2006, at 4:08 PM, Leopold Toetsch wrote:
>
> >> invalid_1559:
> >> .local pmc interactive
> >> interactive = get_root_global ['tcl'], '$tcl_interactive'
> >
> > # you might cache that once - it's probably used in a zillion places
> >
>
> This is a user visible variable that could theoretically change
> anywhere, so any caching scheme has to take that into account.
It should work to just find this once. Since we (have to) use assign
for variables, any change is made in place to the actual PMC, so a
cached reference still sees the current value. So caching this is fine.
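
The distinction can be modelled in a few lines of Python. This is only an analogy: `Box` is a hypothetical stand-in for a PMC, and the dictionary stands in for the namespace. An in-place assign is visible through a cached reference, while rebinding the name to a new object would leave the cache stale.

```python
# Analogy only: Box is a hypothetical stand-in for a PMC.

class Box:
    """A container whose value can be replaced in place."""
    def __init__(self, value):
        self.value = value

    def assign(self, value):
        self.value = value

namespace = {'$tcl_interactive': Box(0)}

# Look the variable up once and keep the reference.
cached = namespace['$tcl_interactive']

# In-place assign: the cached reference sees the new value.
namespace['$tcl_interactive'].assign(1)
seen_after_assign = cached.value

# Rebinding the name to a new object: the cached reference is now stale.
namespace['$tcl_interactive'] = Box(0)
seen_after_rebind = cached.value

print(seen_after_assign, seen_after_rebind, namespace['$tcl_interactive'].value)
```

The cache is only safe as long as every write goes through the in-place path, which is the property relied on here.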

Also, we can cache the 'tcl' and '_tcl' namespaces to prevent that key
lookup when we're looking for globals. I'll look at doing that.

> >> unless interactive goto err_command1559
> >> .local pmc unk
> >> unk=find_global '&unknown'
> >
> > # same here and just do ...
> >
> > 'unknown'(...)
> >
> > ... call it instead of ...
> >
> >> unk($P1566, $P1560, $P1561, $P1563)
> >
> > my 2 c
> > leo
> >
>
> I'll investigate just using the '&foreach'() syntax in both cases,
> and there's probably some wins there.

Except this only works when the user is in the root HLL namespace or
when the user is executing code in the current namespace. That doesn't
mean it's not a valid option; just that it's not always valid.

--
Matt Diephouse
http://matt.diephouse.com
