After many months of lying dormant, I figured I'd get my act together and adapt this patch to the few recent modifications. And this time, I'm posting a benchmark!
My results get about 5% slowdown in the eager case, and the usual 10,000% speedup in the lazy case.
Luke
First, the benchmark (examples/benchmarks/dod_priority.imc):
#! /usr/bin/env parrot # Call this program with an argument of 0 (or no arguments at all) # to see the priority dod behavior. Call with an argument of 1 to # see the full sweep.
.pcc_sub _main non_prototyped .param pmc argv
.const int ntimes = 10000 .const int nestlevel = 1000
$I0 = nestlevel $P0 = new PerlString $P0 = "Foo!" needs_destroy $P0 makenest: unless $I0 goto last dec $I0 $P1 = new PerlArray $P1[0] = $P0 $P0 = $P1 goto makenest last:
# That concludes making the data structure, now let's run # DOD a bunch of times (we're still in the root set in $P0) $I0 = ntimes again: unless $I0 goto last2 dec $I0 if sweep_arg goto sweep_one sweep 0 goto again sweep_one: sweep 1 goto again last2: end .end
/* per interpreter global vars */ INTVAL world_inited; /* Parrot_init is done */ - PMC *mark_ptr; /* last PMC marked used in DOD runs */ PMC *iglobals; /* SArray of PMCs, containing: */ /* 0: PMC *Parrot_base_classname_hash; hash containing name->base_type */ /* 1: PMC *Parrot_compreg_hash; hash containing assembler/compilers */ @@ -255,7 +254,12 @@ /* 3: PMC *Env; hash_like Env PMC */ /* 4: PMC *ParrotInterpreter that's me */ /* 5: PMC *Dyn_libs Array of dynamically loaded ParrotLibrary */ - int has_early_DOD_PMCs; /* Flag that some want immediate destruction */ + UINTVAL num_early_DOD_PMCs; /* how many PMCs want immediate destruction */ + UINTVAL num_early_PMCs_seen; /* how many such PMCs has DOD seen */ + PMC* dod_mark_ptr; /* last PMC marked during a DOD run */ + PMC* dod_trace_ptr; /* last PMC trace_children was called on */ + int lazy_dod; /* flag that indicates whether we should stop + when we've seen all impatient PMCs */ PMC* DOD_registry; /* registered PMCs added to the root set */ struct MMD_table *binop_mmd_funcs; /* Table of MMD function pointers */ PMC** nci_method_table; /* Method table PMC for NCI stubs per class */ Index: include/parrot/pobj.h =================================================================== RCS file: /cvs/public/parrot/include/parrot/pobj.h,v retrieving revision 1.31 diff -u -r1.31 pobj.h --- include/parrot/pobj.h 24 Dec 2003 10:43:08 -0000 1.31 +++ include/parrot/pobj.h 5 Jan 2004 13:30:46 -0000 @@ -224,12 +224,14 @@ */ b_PObj_is_special_PMC_FLAG = 1 << 26,
- b_PObj_needs_early_DOD_FLAG = 1 << 27, + /* true if this is connected by some route to a needs_early_DOD object */ + PObj_high_priority_DOD_FLAG = 1 << 27, + PObj_needs_early_DOD_FLAG = (1 << 27 | 1 << 28),
/* True if the PMC is a class */ - PObj_is_class_FLAG = 1 << 28, + PObj_is_class_FLAG = 1 << 29, /* True if the PMC is a parrot object */ - PObj_is_object_FLAG = 1 << 29 + PObj_is_object_FLAG = 1 << 30
- /* XXX boe: Parrot_really_destroy might call us with mark_ptr not + /* XXX boe: Parrot_really_destroy might call us with dod_mark_ptr not * set. This is neccessary until destruction ordering prevents * the premature destruction of the standardhandles */ - if (!interpreter->mark_ptr) - interpreter->mark_ptr = table[0]; + if (!interpreter->dod_mark_ptr) + interpreter->dod_mark_ptr = table[0];
for (i = 0; i < PIO_NR_OPEN; i++) { if (table[i]) { Index: ops/core.ops =================================================================== RCS file: /cvs/public/parrot/ops/core.ops,v retrieving revision 1.344 diff -u -r1.344 core.ops --- ops/core.ops 2 Jan 2004 16:25:37 -0000 1.344 +++ ops/core.ops 5 Jan 2004 13:30:46 -0000 @@ -878,8 +878,11 @@ =cut
op sweep(inconst INT) { - if ($1 || interpreter->has_early_DOD_PMCs) + if ($1) Parrot_do_dod_run(interpreter, 0); + else + if (interpreter->num_early_DOD_PMCs) + Parrot_do_dod_run(interpreter, DOD_lazy_FLAG); goto NEXT(); }
Luke Palmer <fibon...@babylonia.flatirons.org> wrote: > After many months of lying dormant, I figured I'd get my act together > and adapt this patch to the few recent modifications. And this time, > I'm posting a benchmark!
Wow, thanks.
Some comments:
> - b_PObj_needs_early_DOD_FLAG = 1 << 27, > + /* true if this is connected by some route to a needs_early_DOD object */ > + PObj_high_priority_DOD_FLAG = 1 << 27, > + PObj_needs_early_DOD_FLAG = (1 << 27 | 1 << 28),
Moving the needs_early_DOD_FLAG out of the arena_flags is suboptimal (and probably the reason for the 5% slowdown for the eager case). Now the relevant flags is the high_priority_DOD_FLAG. If I get the patch right, it gets set on the PMC itself and all its parents.
So it would be fine, if the high_priority_DOD_FLAG would be in the arena_flags nibble. The tests in the fast path of pobject_lives can then go away - PMCs that can contain other PMCs can't be the parent of PMCs that need timely destruction, so that should work. And all PMCs, that can contain other PMCs must have a PMC_EXT attached with the next_for_GC pointer.
And if that helps, we can just say: If an object needs timely destruction it must have the PMC_EXT struct attached, so that marking is out of the fast path.
This seems bogus, if the flag is set, the first test has succeeded.
> + if (PObj_high_priority_DOD_TEST(obj) && interpreter->dod_trace_ptr) > + /* set obj's parent to high priority */ > + PObj_high_priority_DOD_SET(interpreter->dod_trace_ptr);
That could go probably below, if the special_PMC_FLAG is set.
>@@ -84,23 +108,38 @@ > - if (PObj_is_special_PMC_TEST(obj)) { > + if (*dod_flags & (PObj_is_special_PMC_FLAG << nm)) {
This seems to be the non-ARENA_DOD_FLAGS case, so above doesn't work.
> After many months of lying dormant, I figured I'd get my act together > and adapt this patch to the few recent modifications. And this time, > I'm posting a benchmark!
> My results get about 5% slowdown in the eager case, and the usual > 10,000% speedup in the lazy case.
Sounds cool; do you have a quick high-level description of what it's all about? (I had seen that needs_early_DOD flag in the source, and didn't know what it was intended to do.)
Jeff Clites writes: > On Jan 5, 2004, at 5:47 AM, Luke Palmer wrote:
> >After many months of lying dormant, I figured I'd get my act together > >and adapt this patch to the few recent modifications. And this time, > >I'm posting a benchmark!
> >My results get about 5% slowdown in the eager case, and the usual > >10,000% speedup in the lazy case.
> Sounds cool; do you have a quick high-level description of what it's > all about? (I had seen that needs_early_DOD flag in the source, and > didn't know what it was intended to do.)
> Thanks!
Okay, here I go, just before I get some sleep and then wake up to tie up the rather insidious bugs in the patch.
We have a problem with several common constructs that Perl will likely require of us. For instance, this one:
{ my $fh = open '> foo'; print $fh: "Hello!\n"; } { my $fh = open '< foo'; print <$fh>; }
In Perl 5, this prints "Hello!" without the blink of an eye (well, if it were valid Perl 5 syntax ;-). But now on Parrot with a opportunistic DOD, this guarantee doesn't hold. If $fh isn't closed before it's opened, the results here are undefined.
Performing a full sweep at every scope exit is not practical: there's enough sub-call overhead as it is. So a new variant of C<sweep> was added: C<sweep 0> would only sweep if there were no PMCs that had been marked as neededing early (or impatient for) DOD.
I and several others quickly saw this as a makeshift solution. There will commonly be at least one such object that will last the entire lifetime of a program. So C<sweep 0>, in practice, was exactly equivalent to C<sweep 1>.
Enter the scheme that this patch implements. It was batted around and refined several times before I implemented it a few months ago. Here's how it goes:
The common case is that none of the impatient objects go out of program view, and moreover that the various paths to find them remain the same. The algorithm takes heavy advantage of this, keeping state across multiple DOD runs.
Each PMC has associated with it a "high priority" flag. First, only the impatient objects have this flag set. But on each DOD run, it's possible to efficiently tell all objects that hold a reference to this high priority object. Those objects also get their high priority flag set. And so on down the chain.
Then during the DOD main loop, if such a high priority object comes along, it is prepended to the queue, temporarily switching the DOD to depth first. This allows it to quickly navigate its way to the impatient objects, and once they are all found, the lazy DOD is terminated. And the more the lazy DOD is run, the faster it becomes.
When one of the impatient objects can't be found, the run is equivalent to a C<sweep 1>, but that happens rarely. Also in this case, all high priority flags are cleared, so that a wide-spreading circular data structure doesn't end up making I<everything> high priority, thereby nullifying the algorithm. This is one of those things I forgot to implement ;-)
Leopold Toetsch writes: > Luke Palmer <fibon...@babylonia.flatirons.org> wrote: > > After many months of lying dormant, I figured I'd get my act together > > and adapt this patch to the few recent modifications. And this time, > > I'm posting a benchmark!
> Wow, thanks.
> Some comments:
> > - b_PObj_needs_early_DOD_FLAG = 1 << 27, > > + /* true if this is connected by some route to a needs_early_DOD object */ > > + PObj_high_priority_DOD_FLAG = 1 << 27, > > + PObj_needs_early_DOD_FLAG = (1 << 27 | 1 << 28),
> Moving the needs_early_DOD_FLAG out of the arena_flags is suboptimal > (and probably the reason for the 5% slowdown for the eager case). Now the > relevant flags is the high_priority_DOD_FLAG. If I get the patch right, > it gets set on the PMC itself and all its parents.
Hmm. Interesting that I did that, as I don't understand arena DOD flags :-) I would appreciate a short explanation, thanks.
Just as a note, you can expect even more delays to the maintainence of this patch. My job just picked up its pace, and I've got to have this compiler cracking at full-featuredness in the next week. I'm not against somebody else maintaining the patch in the meantime :-)
> So it would be fine, if the high_priority_DOD_FLAG would be in the > arena_flags nibble. The tests in the fast path of pobject_lives can then > go away - PMCs that can contain other PMCs can't be the parent of PMCs > that need timely destruction, so that should work. And all PMCs, that > can contain other PMCs must have a PMC_EXT attached with the next_for_GC > pointer.
> And if that helps, we can just say: If an object needs timely > destruction it must have the PMC_EXT struct attached, so that marking > is out of the fast path.
Luke Palmer <fibon...@babylonia.flatirons.org> wrote: > Leopold Toetsch writes: >> Moving the needs_early_DOD_FLAG out of the arena_flags is suboptimal >> (and probably the reason for the 5% slowdown for the eager case). Now the >> relevant flags is the high_priority_DOD_FLAG. If I get the patch right, >> it gets set on the PMC itself and all its parents. > Hmm. Interesting that I did that, as I don't understand arena DOD flags :-) > I would appreciate a short explanation, thanks.
Some additional information beyond docs/memory_internals.pod:
I expect by far more plain scalars being around then aggregates. Plain scalars (w/o properties attached) don't have the PMC_EXT structure, so they don't have the next_for_GC pointer either.
During DOD now only the relevant bits in the arena are touched (or pulled into the processors caches) if such a plain scalar is marked live. The PMCs memory itself isn't touched and not pulled into the caches.
All non-plain scalars and aggregates have the is_special_PMC bit set and are marked by chaining them with the next_for_GC pointer, so that contained PMCs get marked too. This of course needs that the PMC and the PMC_EXT structure is pulled into the caches.
This reduced cache misses during DOD to almost zero.
> Just as a note, you can expect even more delays to the maintainence of > this patch. My job just picked up its pace, and I've got to have this > compiler cracking at full-featuredness in the next week. I'm not > against somebody else maintaining the patch in the meantime :-)
Luke Palmer <fibon...@babylonia.flatirons.org> wrote: > ... I'm not > against somebody else maintaining the patch in the meantime :-)
I went again through the patch and the original one from Sept, 5th. But it seems that one thing is missing in both:
*If* all PMCs which needs_early_DOD are seen live, the DOD run is aborted. But completing the DOD run also resets live bits of non-dead objects, which is missing now.
It seems, that in that case we still need to walk the PMC arenas and reset all live bits. OTOH when ARENA_DOD_FLAGS is on, this isn't too expensive because it can be done my masking a full word worth of 8 PMCs at once. So its still a lot faster then the eager case - hopefully.
Leopold Toetsch writes: > Luke Palmer <fibon...@babylonia.flatirons.org> wrote: > > ... I'm not > > against somebody else maintaining the patch in the meantime :-)
> I went again through the patch and the original one from Sept, 5th. But > it seems that one thing is missing in both:
> *If* all PMCs which needs_early_DOD are seen live, the DOD run is > aborted. But completing the DOD run also resets live bits of non-dead > objects, which is missing now.
> It seems, that in that case we still need to walk the PMC arenas and > reset all live bits. OTOH when ARENA_DOD_FLAGS is on, this isn't too > expensive because it can be done my masking a full word worth of 8 PMCs > at once. So its still a lot faster then the eager case - hopefully.
> Or am I missing something?
I don't think you are. I would have thought that GC be the one to reset the live bits, but there was a lot of DOD that I didn't completely understand. I don't remember explicitly taking such a thing out, but I could have inadvertently.
Luke Palmer <fibon...@babylonia.flatirons.org> wrote: > Leopold Toetsch writes: >> It seems, that in that case we still need to walk the PMC arenas and >> reset all live bits. OTOH when ARENA_DOD_FLAGS is on, this isn't too >> expensive because it can be done my masking a full word worth of 8 PMCs >> at once. So its still a lot faster then the eager case - hopefully.
>> Or am I missing something? > I don't think you are. I would have thought that GC be the one to reset > the live bits, but there was a lot of DOD that I didn't completely > understand.
GC is mostly independent of DOD and it doesn't look at PMCs. GC compacts variable sized memory pools.
I have applied now your patch with some minor cleanup. Funnily clearing the live bits after an aborted lazy DOD run seems not to be necessary. Either the test (t/pmc/timer_7) is bogus or I'm still missing something.
On Sat, Jan 10, 2004 at 02:28:38PM +0100, Leopold Toetsch wrote: > Luke Palmer <fibon...@babylonia.flatirons.org> wrote: > > Leopold Toetsch writes: > >> It seems, that in that case we still need to walk the PMC arenas and > >> reset all live bits. OTOH when ARENA_DOD_FLAGS is on, this isn't too > >> expensive because it can be done my masking a full word worth of 8 PMCs > >> at once. So its still a lot faster then the eager case - hopefully.
> >> Or am I missing something?
> > I don't think you are. I would have thought that GC be the one to reset > > the live bits, but there was a lot of DOD that I didn't completely > > understand.
> GC is mostly independent of DOD and it doesn't look at PMCs. GC compacts > variable sized memory pools.
> I have applied now your patch with some minor cleanup. Funnily clearing > the live bits after an aborted lazy DOD run seems not to be necessary. > Either the test (t/pmc/timer_7) is bogus or I'm still missing something.
I'm seeing compile failures on FreeBSD now:
src/dod.c: In function `clear_live_bits': src/dod.c:755: `cur_arena' undeclared (first use in this function) src/dod.c:755: (Each undeclared identifier is reported only once src/dod.c:755: for each function it appears in.) src/dod.c *** Error code 1
The appended patch cures it (and all tests pass) but I'm not sure if it is correct.
Nicholas Clark
Index: src/dod.c =================================================================== RCS file: /cvs/public/parrot/src/dod.c,v retrieving revision 1.80 diff -p -u -r1.80 dod.c --- src/dod.c 10 Jan 2004 13:47:50 -0000 1.80 +++ src/dod.c 10 Jan 2004 16:05:52 -0000 @@ -752,7 +752,7 @@ clear_live_bits(Parrot_Interp interprete } #else Buffer *b = arena->start_objects; - for (i = 0; i < cur_arena->used; i++) { + for (i = 0; i < arena->used; i++) { PObj_live_CLEAR(b); b = (Buffer *)((char *)b + object_size); }
Nicholas Clark <n...@ccl4.org> wrote: > src/dod.c: In function `clear_live_bits': > src/dod.c:755: `cur_arena' undeclared (first use in this function) > The appended patch cures it (and all tests pass) but I'm not sure if it > is correct.
Ah yep, thanks. I didn't test the patch with ARENA_DOD_FLAGS disabled and finally diceded that "arena" is as meaningful and readable as "cur_arena" and changed that before submitting - a policy which almost always causes an error somewhere :)
On Sat, Jan 10, 2004 at 08:39:51PM +0100, Leopold Toetsch wrote: > Nicholas Clark <n...@ccl4.org> wrote:
> > src/dod.c: In function `clear_live_bits': > > src/dod.c:755: `cur_arena' undeclared (first use in this function)
> > The appended patch cures it (and all tests pass) but I'm not sure if it > > is correct.
> Ah yep, thanks. I didn't test the patch with ARENA_DOD_FLAGS disabled > and finally diceded that "arena" is as meaningful and readable as > "cur_arena" and changed that before submitting - a policy which almost > always causes an error somewhere :)
How come most tinderboxes kept going without failing? What's making the choice on the value of ARENA_DOD_FLAGS ?
Yes, I think arena makes as much sense as cur_arena, implying that the cur_ is redundant.
> On Sat, Jan 10, 2004 at 08:39:51PM +0100, Leopold Toetsch wrote: >> Nicholas Clark <n...@ccl4.org> wrote:
>>> src/dod.c: In function `clear_live_bits': >>> src/dod.c:755: `cur_arena' undeclared (first use in this function)
>>> The appended patch cures it (and all tests pass) but I'm not sure if >>> it >>> is correct.
>> Ah yep, thanks. I didn't test the patch with ARENA_DOD_FLAGS disabled >> and finally diceded that "arena" is as meaningful and readable as >> "cur_arena" and changed that before submitting - a policy which almost >> always causes an error somewhere :)
> How come most tinderboxes kept going without failing? What's making the > choice on the value of ARENA_DOD_FLAGS ?
It gets set in include/parrot/pobj.h, and is basically set to true if your platform has some flavor of memalign(), which allows you to allocate chunks of memory with arbitrary power-of-2 alignment. So all the platforms being tested on the tinders probably have this. (Of course, you can manually set ARENA_DOD_FLAGS to false in the source, for testing.)
Jeff Clites <jcli...@mac.com> writes: > It gets set in include/parrot/pobj.h, and is basically set to true if > your platform has some flavor of memalign(), which allows you to > allocate chunks of memory with arbitrary power-of-2 alignment. So all > the platforms being tested on the tinders probably have this. (Of > course, you can manually set ARENA_DOD_FLAGS to false in the source, > for testing.)
Actually, at least FreeBSD doesn't have either of the memalign()'s parrot looks for, which is what Nicholas (and I) ran into.
(If OpenBSD doesn't have one either, there's some clean-up to be done in parrot/config/init/hints/openbsd.pl :-) ) -- Lars Balker Rasmussen Consult::Perl
Nicholas Clark <n...@ccl4.org> wrote: > How come most tinderboxes kept going without failing? What's making the > choice on the value of ARENA_DOD_FLAGS ?
ARENA_DOD_FLAGS is turned on by default. If there is no memalign or such library function (which it depends on), this define is disabled.
On Sat, Jan 10, 2004 at 04:25:58PM -0800, Jeff Clites wrote: > allocate chunks of memory with arbitrary power-of-2 alignment. So all > the platforms being tested on the tinders probably have this. (Of > course, you can manually set ARENA_DOD_FLAGS to false in the source, > for testing.)
My tinderbox running on FreeBSD definitely didn't have it, and the OpenBSD tinderbox returned from burning to success at the same time as mine. (Not that I've checked the actual case of its failure, but I think only one patch got committed around then)