Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

pants ponie performace

14 views
Skip to first unread message

Nicholas Clark

unread,
May 7, 2004, 12:16:00 PM5/7/04
to perl6-i...@perl.org, poni...@perl.org
With all the perl scalars now indirecting through PMCs, Ponie's performance
has dropped. Not surprising, in the general case, but the performance of
lib/locale.t is vile, and I'm wondering if that regression test hits
something pathological. It slows to a crawl after subtest 99 (line 240 in
http://public.activestate.com/cgi-bin/perlbrowse?file=lib%2Flocale.t&rev=
search for the text Find locales
)

I built ponie with profiling, and I find that it's spending an inordinate
amount of time in trace_children (half a second per call):

Flat profile:

Each sample counts as 0.000999001 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
29.55 379.89 379.89 2381 159.55 503.64 trace_children
26.06 714.95 335.06 1337922049 0.00 0.00 pobject_lives
12.95 881.48 166.52 1337414896 0.00 0.00 mark_special
9.90 1008.71 127.23 16667 7.63 49.16 mark_hash
5.69 1081.83 73.12 1939938960 0.00 0.00 lookupBucket
4.81 1143.73 61.90 1939938960 0.00 0.00 lookupBucketIndex
4.52 1201.88 58.15 3433969095 0.00 0.00 getBucket
4.41 1258.55 56.68 21429 2.64 2.65 free_unused_pobjects
0.27 1262.02 3.47 18878877 0.00 0.00 find_bucket
0.12 1263.60 1.57 14149041 0.00 0.00 hash_get_bucket
0.10 1264.92 1.33 166406567 0.00 0.00 Parrot_PMC_get_pointer
0.07 1265.87 0.95 19048 0.05 0.05 used_cow
0.07 1266.73 0.86 9459863 0.00 0.13 get_free_object_df
0.06 1267.56 0.83 19048 0.04 0.04 clear_cow
0.05 1268.18 0.62 124063316 0.00 0.00 int_compare
0.05 1268.76 0.58 816271 0.00 0.00 Perl_pregcomp
0.04 1269.22 0.47 4729750 0.00 0.00 pmc_type
0.03 1269.65 0.43 4736686 0.00 0.27 Perl_sv_upgrade
0.03 1270.07 0.42 1816485 0.00 0.00 Perl_regexec_flags
0.03 1270.48 0.41 48661537 0.00 0.00 Perl_macro_SvPVX
0.03 1270.87 0.38 1632298 0.00 0.00 Perl_pp_regcomp
0.03 1271.23 0.37 1808077 0.00 0.00 Perl_pp_match


Any suggestions on what stats to gather to try to work out where things are
going wrong?

Nicholas Clark

Leopold Toetsch

unread,
May 7, 2004, 12:57:24 PM5/7/04
to Nicholas Clark, perl6-i...@perl.org
Nicholas Clark <ni...@ccl4.org> wrote:
> With all the perl scalars now indirecting through PMCs, Ponie's performance
> has dropped. Not surprising, in the general case, but the performance of
> lib/locale.t is vile, and I'm wondering if that regression test hits
> something pathological. It slows to a crawl after subtest 99 (line 240 in
> http://public.activestate.com/cgi-bin/perlbrowse?file=lib%2Flocale.t&rev=
> search for the text Find locales
> )

> I built ponie with profiling, and I find that it's spending an inordinate
> amount of time in trace_children (half a second per call):

That's very likely caused by GC_DEBUG. Hash lookup calls string_equal
which triggers a DOD run on *each* string compare. Please recompile
parrot with DISABLE_GC_DEBUG=1 in parrot.h. That could help. But see
below.

> Flat profile:

> Each sample counts as 0.000999001 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 29.55 379.89 379.89 2381 159.55 503.64 trace_children
> 26.06 714.95 335.06 1337922049 0.00 0.00 pobject_lives

^^^^^^^^^^

Boy that's a lot.

> 0.07 1266.73 0.86 9459863 0.00 0.13 get_free_object_df

^^^^^^^

How big is the live data set during that test? Seems to be rather big.

I don't see any reference to add_free_object, so I presume the test is
shuffling 9.5 M PMCs around. How much RAM has your machine?

> Any suggestions on what stats to gather to try to work out where things are
> going wrong?

First make sure, that GC_DEBUG(interpreter) := 0, or set
DISABLE_GC_DEBUG. Then - if it's still slow - we have to look further,
e.g. what actually triggers the DOD runs.

> Nicholas Clark

leo

Dan Sugalski

unread,
May 7, 2004, 2:14:29 PM5/7/04
to Nicholas Clark, perl6-i...@perl.org, poni...@perl.org
At 5:16 PM +0100 5/7/04, Nicholas Clark wrote:
>I built ponie with profiling, and I find that it's spending an inordinate
>amount of time in trace_children (half a second per call):

Wow, that's... bad. It seems like you've an insane number of PMCs,
though, since there were over 1.3 billion calls to pobject_lives and
3.4 billion calls to getBucket. Either something's gone wrong in the
profiling or you've gone really degenerate somehow.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Leopold Toetsch

unread,
May 7, 2004, 2:42:42 PM5/7/04
to Nicholas Clark, P6I
Nicholas Clark wrote:

> I built ponie with profiling, and I find that it's spending an inordinate
> amount of time in trace_children (half a second per call):

One more question: Looking at the test, there seems to be a lot of stuff
collected in @Locale: How do these PMCs look like? And how many entries
has @Locale?


> Nicholas Clark

leo


Nicholas Clark

unread,
May 7, 2004, 3:12:35 PM5/7/04
to Leopold Toetsch, perl6-i...@perl.org
On Fri, May 07, 2004 at 06:57:24PM +0200, Leopold Toetsch wrote:
> Nicholas Clark <ni...@ccl4.org> wrote:
> > With all the perl scalars now indirecting through PMCs, Ponie's performance
> > has dropped. Not surprising, in the general case, but the performance of
> > lib/locale.t is vile, and I'm wondering if that regression test hits
> > something pathological. It slows to a crawl after subtest 99 (line 240 in
> > http://public.activestate.com/cgi-bin/perlbrowse?file=lib%2Flocale.t&rev=
> > search for the text Find locales
> > )
>
> > I built ponie with profiling, and I find that it's spending an inordinate
> > amount of time in trace_children (half a second per call):
>
> That's very likely caused by GC_DEBUG. Hash lookup calls string_equal
> which triggers a DOD run on *each* string compare. Please recompile
> parrot with DISABLE_GC_DEBUG=1 in parrot.h. That could help. But see
> below.

I tried building with -DDISABLE_GC_DEBUG=1 and the build fails:

./parrot parrot-config.imc VERSION DEVEL
parrot: src/platform.c:665: Parrot_memcpy_aligned_sse_debug: Assertion `((unsigned long) d & 7) == 0' failed.
make: *** [library/config.fpmc] Aborted
make: *** Deleting file `library/config.fpmc'

Like the profiles, this is on:

$ uname -a
Linux switch.work.fotango.com 2.6.4 #2 SMP Thu Mar 11 18:01:19 GMT 2004 i686 unknown

It's the main Fotango development box, and has 4G of RAM.


> > Flat profile:
>
> > Each sample counts as 0.000999001 seconds.
> > % cumulative self self total
> > time seconds seconds calls ms/call ms/call name
> > 29.55 379.89 379.89 2381 159.55 503.64 trace_children
> > 26.06 714.95 335.06 1337922049 0.00 0.00 pobject_lives
> ^^^^^^^^^^
>
> Boy that's a lot.
>
> > 0.07 1266.73 0.86 9459863 0.00 0.13 get_free_object_df
> ^^^^^^^
>
> How big is the live data set during that test? Seems to be rather big.

How do I best measure this? The line from top (without GC debug) is:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31469 nick 25 0 236m 57m 173m R 99.9 1.4 0:16.64 perl


> I don't see any reference to add_free_object, so I presume the test is
> shuffling 9.5 M PMCs around. How much RAM has your machine?

4Gb. Presumably Linux can only give 2Gb to any one process. 5Gb of swap.

On Fri, May 07, 2004 at 08:42:42PM +0200, Leopold Toetsch wrote:

> One more question: Looking at the test, there seems to be a lot of stuff
> collected in @Locale: How do these PMCs look like? And how many entries

As in a gdb dump? Or a description? Assuming description, ponie is
allocating lots of Perl5... PMCs, currently done as programmatic cut&paste
from the template file ponie/src/pmc/perl5cargo_cult.pmc
http://cvs.perl.org/cgi/cvsweb.cgi/ponie/src/pmc/perl5cargo_cult.pmc?rev=1.5&content-type=text/x-cvsweb-markup

The PMC only used the PMC_data member to store a pointer on to a perl5
SV body. So the array @Locale will be a perl5 SV head pointing to a
perl5PVAV PMC, which in turn will point to a perl5 PVAV structure.
(An array). That will have a pointer to a (perl5 managed, malloc()d) buffer
which will have pointers to perl5 SVs.

(which in turn will be perl SV heads pointing to PMCs, probably perl5PVs,
in turn pointing to perl5 PV structures for the SV's body)

This mess is the interim structure - pretty much the next thing for ponie
is to eliminate those perl5 bodies hanging off the PMCs and store the
relevant data directly.

> has @Locale?

342

(C and POSIX are duplicated for some reason)

Looks like switch has just about every locale installed.

Nicholas Clark

Leopold Toetsch

unread,
May 7, 2004, 4:09:10 PM5/7/04
to Nicholas Clark, perl6-i...@perl.org
Nicholas Clark <ni...@ccl4.org> wrote:
> On Fri, May 07, 2004 at 06:57:24PM +0200, Leopold Toetsch wrote:

> I tried building with -DDISABLE_GC_DEBUG=1 and the build fails:

> ./parrot parrot-config.imc VERSION DEVEL
> parrot: src/platform.c:665: Parrot_memcpy_aligned_sse_debug: Assertion `((unsigned long) d & 7) == 0' failed.

Strange. Try Configure.pl --optimize

> $ uname -a
> Linux switch.work.fotango.com 2.6.4 #2 SMP

... which works here with linux 2.2 and 2.4.

> It's the main Fotango development box, and has 4G of RAM.

Should be enough :)


>> > 0.07 1266.73 0.86 9459863 0.00 0.13 get_free_object_df
>> ^^^^^^^
>>
>> How big is the live data set during that test? Seems to be rather big.

> How do I best measure this?

Indirectly:)
see the <interpinfo> opcode and function or debugging output of parrot -d

> ... The line from top (without GC debug) is:

> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

> 31469 nick 25 0 236m 57m 173m R 99.9 1.4 0:16.64 perl ^^^^

That'ts huge. How much mem does the test console with perl5?


>> I don't see any reference to add_free_object, so I presume the test is
>> shuffling 9.5 M PMCs around. How much RAM has your machine?

> The PMC only used the PMC_data member to store a pointer on to a perl5
> SV body.

That's suboptimal :) Please have a look at the timings I posted a minute
ago. Stuff the pointer into PMC_struct_val(pmc) and get rid of the
C<need_ext> field in the pmc file. This will speed up things
considerable (~6 times) but, the question still remains - where are all
these live PMCs coming from. Does Ponie anchor everything so that
nothing gets collected?

Are there any significant trails of add_free_object in the profile?

> 342

That isn't much. Something is wrong.

> Nicholas Clark

leo

Nicholas Clark

unread,
May 7, 2004, 7:34:44 PM5/7/04
to Leopold Toetsch, perl6-i...@perl.org
On Fri, May 07, 2004 at 10:09:10PM +0200, Leopold Toetsch wrote:
> Nicholas Clark <ni...@ccl4.org> wrote:

> > The PMC only used the PMC_data member to store a pointer on to a perl5
> > SV body.
>
> That's suboptimal :) Please have a look at the timings I posted a minute
> ago. Stuff the pointer into PMC_struct_val(pmc) and get rid of the
> C<need_ext> field in the pmc file. This will speed up things
> considerable (~6 times) but, the question still remains - where are all
> these live PMCs coming from. Does Ponie anchor everything so that
> nothing gets collected?

Thanks.

From skimming managedstruct.pmc I didn't realise that one could do this,
let alone that it would be a superior approach.

I think this illustrates quite nicely why I'd like documentation about PMCs
I guess I'm going to have to write the first draft, based on the sources
people have told me about.

Nicholas Clark

Leopold Toetsch

unread,
May 8, 2004, 4:01:56 AM5/8/04
to Nicholas Clark, perl6-i...@perl.org
Nicholas Clark <ni...@ccl4.org> wrote:

[ half second DODs ]

> 0.07 1266.73 0.86 9459863 0.00 0.13 get_free_object_df

> 0.05 1268.18 0.62 124063316 0.00 0.00 int_compare

Looking at these 2 lines: You got 9.5 million PMCs. That would be ok,
*if* these actually got recycled. From the second line above I've to
assume that all of these PMCs are anchored with C<Parrot_register_pmc>
and are never released.

> Nicholas Clark

leo

Nicholas Clark

unread,
May 8, 2004, 6:40:10 AM5/8/04
to Leopold Toetsch, perl6-i...@perl.org

I checked. They're not. Well deduced.
The problem came down to me storing not the PMC pointer, but the 1's complement
of it in the perl SV. This worked very well at making ponie SEGV whenever it
was about to take a PMC pointer, cast it to some sort of perl structure, and
then merrily try and access that memory.

However, I forgot to do the 1's complement on the value passed to the
unregister PMC call, and that doesn't SEGV if you pass it a duff value, so
I never realised my bug.

So I've fixed that and now it all runs in reasonable time. (As for all its
inherent horrors, perl's reference counting pretty much gets it right as to
when it's time to free up something)

Nicholas Clark

Jeff Clites

unread,
May 8, 2004, 2:16:30 PM5/8/04
to Nicholas Clark, perl6-i...@perl.org, Leopold Toetsch
On May 8, 2004, at 3:40 AM, Nicholas Clark wrote:

> The problem came down to me storing not the PMC pointer, but the 1's
> complement
> of it in the perl SV.

...


> However, I forgot to do the 1's complement on the value passed to the
> unregister PMC call

...


> (As for all its inherent horrors, perl's reference counting pretty
> much gets it right as to when it's time to free up something)

Something which occurred to me recently, in regard to the types of bugs
which arise in ref. counting v. GC v. explicit memory management, is
that in a sense GC seems like to most complicated to implement and get
right, but at least whenever there's a problem you know that you need
to look at the GC code--almost by definition, the bug can't be
somewhere else. If something's prematurely freed, the question is
always, "what's wrong with the sweep code, that caused that not to get
marked as alive", rather than, "hey, where in my code did the reference
count not get incremented, or get accidentally decremented".

I guess in a sense that's obvious, but I found it interesting that
(arguably) GC is the hardest to implement but the easiest to debug.

Oversimplification I know, but I thought it was interesting.

JEff

Nicholas Clark

unread,
May 10, 2004, 12:42:32 PM5/10/04
to perl6-i...@perl.org, poni...@plum.flirble.org
On Sat, May 08, 2004 at 11:40:10AM +0100, Nicholas Clark wrote:

> So I've fixed that and now it all runs in reasonable time. (As for all its
> inherent horrors, perl's reference counting pretty much gets it right as to
> when it's time to free up something)

Reasonable time being:

perl5.9.1 compiled with the same options as ponie uses, and -DPURIFY to force
similar malloc() behaviour runs all regression tests:

All tests successful.
u=1.7 s=0.77 cu=161.8 cs=14.92 scripts=782 tests=75469

ponie fresh from CVS:

All tests successful.
u=2.6 s=1.33 cu=325.74 cs=27.77 scripts=780 tests=75405

so about half the speed. Currently ponie has 2 additional big overheads that
perl doesn't

1: it's also allocating a PMC for every scalar, registering it at creation,
de-registering it when the reference count hits zero, and causing parrot
to make GC runs that will find no garbage
2: all perl data structure operations are going through functions, rather than
being direct lookups. Additionally all the lookups are through two levels
of pointer indirection rather than one.

As the code gets re-factored both these will go.

Nicholas Clark

Nicholas Clark

unread,
May 10, 2004, 12:47:42 PM5/10/04
to perl6-i...@perl.org, poni...@perl.org
I should check my aliases before hitting send, shouldn't I?

----- Forwarded message from Nicholas Clark <ni...@ccl4.org> -----

Mailing-List: contact perl6-inte...@perl.org; run by ezmlm
Precedence: bulk
From: Nicholas Clark <ni...@ccl4.org>
To: perl6-i...@perl.org, poni...@plum.flirble.org
Subject: Re: pants ponie performace

Reasonable time being:

ponie fresh from CVS:

Nicholas Clark

----- End forwarded message -----

Leopold Toetsch

unread,
May 10, 2004, 2:29:36 PM5/10/04
to Nicholas Clark, perl6-i...@perl.org, poni...@perl.org
Nicholas Clark <ni...@ccl4.org> wrote:

> Reasonable time being:

> perl5.9.1 compiled with the same options as ponie uses, and -DPURIFY to force
> similar malloc() behaviour runs all regression tests:

Parrot built optimized? (perl Configure.pl --optimize ...)

> so about half the speed. Currently ponie has 2 additional big overheads that
> perl doesn't

That's not too bad, when I see ...

> 1: it's also allocating a PMC for every scalar, registering it at creation,
> de-registering it when the reference count hits zero, and causing parrot
> to make GC runs that will find no garbage

two hash lookpus per variable create/destruction pair. But Parrot
hopefully will find garbage from all your temp vars hopefully and
recycle these...

> As the code gets re-factored both these will go.

Good.

> Nicholas Clark

leo

0 new messages