is_pmc_ptr

3 views
Skip to first unread message

Jonathan Worthington

unread,
Sep 3, 2011, 7:06:47 PM9/3/11
to parro...@lists.parrot.org
Hi,

Compiling the Rakudo setting creates a lot of objects (the parse tree,
the PAST tree, the model and so forth). Not only that, but since it's
building up a complete view of the thing it's compiling, those objects
tend to stay around for a long time. Innevitably, that makes for a
pretty large working set. Of course, there's lots of very short lived
things (CallContext, for example) that come and go.

From the profiles I've been looking at, in situations where we make
lots of objects but they are short lived, the GC seems to handle
reasonably well. In theory, since it's generational, it'd continue to do
so when we're doing things like compiling the setting, since the tree
nodes we're keeping around survive a collection and end up in an older
generation, and the regular collections just grab the short-lived
things. However, the actual profile results are not so rosy.

Tonight I dug the profile data I was seeing, and found that
gc_gms_is_pmc_ptr seems to be a hot path. A little analysis later, I
think I know what's going on. When we do a GC run, we have to find all
pointers to PMCs and STRINGs, and that includes scanning the stack.
Since we can't be precise there, we have to validate the things we've
seen really are string and PMC pointers. gc_gms_is_pmc_ptr is the
function that does that. However, we'd not expect this to be so costly -
the C stack isn't really going to be that deep. Yet it shows up high in
the profile. What's going on?

The PMC headers are allocated from a pool. A pool is a linked list of
arenas. And arena is a chunk of memory holding a bunch of PMC headers.
Thus if we have a PMC pointer, it must have an address at a valid offset
somewhere between the start address and the end address of one of the
arenas in the pool. The way that we determine this is "the obvious":
walk the linked list, and check if the pointer lies within the bounds of
the arena, and at a valid offset.

Unfortunatley, I suspect this is actually quite expensive. An arena is
4096 bytes. On a 32 bit machine, given that a PMC is 16 bytes and
there's an extra 4 bytes for a pointer used by the GC, we get about 200
PMCs in an arena. On a 64 bit machine I guess that drops off to about
100. So if we had, say, 100,000 PMCs around in memory, that's 500/1000
linked list follows to determine it's not a pointer, and average 250/500
if it is (maybe temporal locality biases the search one way or the
other). And we probably have many more PMCs than that. Also note that we
do this per "candidate" pointer that we see in the memory block.

That would be less than ideal, but there's another issue: we're likely
jumping all over the place in memory as we visit these arenas. We're
probably getting a ton of CPU cache misses as we do so, which is going
to make walking the linked list more expensive than it may otherwise be.
We go through this a bunch of times, and given the arenas probably line
up well with pages, it's possible that we hit unfortunate overlap issues
with the cache too (IIUC, set associative ones can do weird things in
these cases).

One obvious fix would be to just keep an extent list: an array of
(start,end) pairs, hanging off the pool header, which we can very
cheaply scan through (since it's just a bunch of values laid out
contiguously in memory). That way, we don't go chasing all over RAM, and
clobber the cache (though since we're doing a GC mark we may well be
about to anyway...)

It'd be interesting to know if anybody else sees things like this when
profile Rakudo (nom branch) setting compilation, and of course a patch
would be interested to see too (and, in theory, shouldn't be too hard).

Thanks,

Jonathan

_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev

Nicholas Clark

unread,
Sep 4, 2011, 5:18:37 AM9/4/11
to Jonathan Worthington, parro...@lists.parrot.org
On Sun, Sep 04, 2011 at 01:06:47AM +0200, Jonathan Worthington wrote:

> One obvious fix would be to just keep an extent list: an array of
> (start,end) pairs, hanging off the pool header, which we can very
> cheaply scan through (since it's just a bunch of values laid out
> contiguously in memory). That way, we don't go chasing all over RAM, and

I think you can binary search it, as it's effectively an inversion list.
O(log n) will beat O(n) for any non-trivial size :-)


Nicholas Clark
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev

Andrew Whitworth

unread,
Sep 4, 2011, 8:31:24 AM9/4/11
to Nicholas Clark, parro...@lists.parrot.org
I've prototyped this idea in the whiteknight/pmc_is_ptr branch. It's
passing tests in the branch, but I haven't done any benchmarking yet.
Code review, additional testing, and benchmark results would all be
appreciated

--Andrew Whitworth

_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev

Jonathan Worthington

unread,
Sep 4, 2011, 10:38:39 AM9/4/11
to Nicholas Clark, parro...@lists.parrot.org
On 04/09/2011 11:18, Nicholas Clark wrote:
> On Sun, Sep 04, 2011 at 01:06:47AM +0200, Jonathan Worthington wrote:
>
>> One obvious fix would be to just keep an extent list: an array of
>> (start,end) pairs, hanging off the pool header, which we can very
>> cheaply scan through (since it's just a bunch of values laid out
>> contiguously in memory). That way, we don't go chasing all over RAM, and
> I think you can binary search it, as it's effectively an inversion list.
> O(log n) will beat O(n) for any non-trivial size :-)
>
Yeah, that would probably be an additional win. Takers wanted. :-)

That said, it turns out even the naive O(n) way is very than worth doing
- tadzik++ reported a somewhat scary 25% improvement in compiling the
rakudo/nom setting after this, and the function in question has fallen
way down my profiler output.

Jimmy Zhuo

unread,
Sep 5, 2011, 9:19:19 AM9/5/11
to parro...@lists.parrot.org
I don't know how to binary search it, it's not an ordered array.

_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev

Nicholas Clark

unread,
Sep 5, 2011, 3:00:09 PM9/5/11
to Jimmy Zhuo, parro...@lists.parrot.org
On Mon, Sep 05, 2011 at 06:19:19AM -0700, Jimmy Zhuo wrote:
> I don't know how to binary search it, it's not an ordered array.

I was assuming that it could be kept ordered.

(And that the cost of keeping it ordered by inserting each new arena at the
right point in the middle is less than the benefits of being able to use a
binary search rather than a linear scan.)

Andrew Whitworth

unread,
Sep 6, 2011, 8:45:34 AM9/6/11
to Nicholas Clark, Jimmy Zhuo, parro...@lists.parrot.org
I've just merged the branch and the fixes from not_gerd into master.
Improvements like the binary search of arenas is a good idea, but is
only incremental. It's not going to blow the benchmarks out of the
water to replace it with a binary search at this point.

As plobsing pointed out on IRC (several times), the real fix is to
move away from stack walking and make a proper, precise GC. I'm not
sure if such a thing will be possible before M0, but there are real
wins to be had if we can do it. If anybody has any ideas regarding
that journey, I would be very interested to hear them.

Thanks,

--Andrew Whitworth

_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev

Reply all
Reply to author
Forward
0 new messages