Changes are:
- SPMC (small or scalar PMC) with half the size of a PMC, no promotion
or whatever to a PMC, disabled with one define in pmc.c
- pool flags with aligned pools, disabled in pobj.h
- key.c and list.c flags cleanup
- mark constant PMCs in DOD[1], hash undef PMC changes
- minor cleanups
The first two are pretty much encapsulated and disabled by default. They
can be easily removed at any time. SPMCs could become fully functional
after the vtable var/value changes.
[1] we currently have constant KEYs and a global undef object in
perlhash that live in the normal PMC pool and thus were skipped during
free_unused_pobjects().
While playing with pool flags, I removed the check for constant
items, so I introduced marking them during trace_active_PMCs. The final
solution should be to create a constant PMC header pool and a new init
function "pmc_new_constant_noinit()".
leo
My personal opinion, worth the paper it's printed on: all of those
sound very good except for the first, which makes me nervous. It adds
a lot of complexity if those PMCs ever need to be promoted, and I'm
not clear on the advantages. The space advantage is obvious, but I
would guess not all that large. I don't understand the cache advantage
-- is it perhaps doubling the usable size of the cache because the
larger PMCs are all aligned similarly, and so half the cache is never
used? If so, would staggering the alignment of the PMCs be enough to
get the same gain (possibly with reordering the fields)? For example,
make every other PMC allocated fall on a different alignment. I
haven't been paying much attention recently, so I'm probably just
being naive and dumb, but my intuition is concerned about adding this
complexity when we're not totally confident of what's going on. Have
you tried running cachegrind or something similar on the two versions?
> On Jan-09, Leopold Toetsch wrote:
>
>>So the question is, should I check it in / partially / forget it.
>>
>>Changes are:
>>- SPMC (small or scalar PMC) with half the size of a PMC, no promotion
>>or whatever to a PMC, disabled with one define in pmc.c
>>- pool flags with aligned pools, disabled in pobj.h
>>- key.c and list.c flags cleanup
>>- mark constant PMCs in DOD[1], hash undef PMC changes
>>- minor cleanups
> My personal opinion, worth the paper it's printed on: all of those
> sound very good except for the first, which makes me nervous.
key, list and some cleanup is in.
> ... It adds
> a lot of complexity if those PMCs ever need to be promoted, and I'm
> not clear on the advantages. The space advantage is obvious, but I
> would guess not all that large.
The space advantage is 50%, which translates almost directly into the
same speed advantage.
Promoting wouldn't be too hard, IMHO, once the vtable changes and
references are in. We make the original PMC a reference to the bigger
promoted type; that should work, since references act on behalf of the
referenced object.
> ... I don't understand the cache advantage
> -- is it perhaps doubling the usable size of the cache because the
> larger PMCs are all aligned similarly, and so half the cache is never
> used? If so, would staggering the alignment of the PMCs be enough to
> get the same gain (possibly with reordering the fields)?
You get double the number of PMCs into the cache - which matters during
marking and freeing. It isn't related to alignment, just more throughput.
> ... Have
> you tried running cachegrind or something similar on the two versions?
No, only stress*.pasm. But:
$ time parrot -j stress.pbc
A total of 9 DOD runs were made
real 0m0.708s
But this could still go faster:
$ time parrot -j stress.pbc # w/o pmc->synchronize (-10% size)
A total of 9 DOD runs were made
real 0m0.635s
$ make -s && time parrot -j stress.pbc # half sized (16 byte PMC)
A total of 13 DOD runs were made
real 0m0.378s
leo
Oh. You're right. I was thinking that the unused portion of the PMC
wouldn't need to be loaded into the cache, so that only the "active"
portions of the PMCs would ever be loaded. Which is a fine argument
if your objects are larger than a cache line. But probably few CPUs we
care about have only 32-byte cache lines.
Ah! So all we have to do is use discontiguous PMCs -- the first 32
bytes is at offset 0, the second at byte offset 128 or so. Then we can
interleave them, so that everything in offset 0..127 gets loaded into
the cache, but 128..255 is left untouched. (Just kidding.)
> Ah! So all we have to do is use discontiguous PMCs -- the first 32
> bytes is at offset 0, the second at byte offset 128 or so. Then we can
> interleave them, so that everything in offset 0..127 gets loaded into
> the cache, but 128..255 is left untouched. (Just kidding.)
s/32/16/
But I don't think this is a bad idea. The real size of a PMC is used
only once, for allocating the arena. So by faking a bigger structure for
the compiler and interleaving it, we could really get double density
for the pobj->flags during a DOD run.
This would require determining the cache line size (or always allocating
more PMCs than fit in a cache line).
leo