I'm exploring the idea of forbidding in-place modification of STRINGs in the C
API; the functions will return new STRING headers with the changes. This has
implications for PIR code which expects that STRINGs have reference semantics
-- that you can modify a STRING referred to from multiple locations.
Currently Parrot seems to prefer reference semantics. A handful of
frequently-called C functions perform a copy-on-write (COW) operation to
create a new STRING header every time a STRING header escapes -- in other
words, because they can't tell if the escaping header will get modified, they
have to allocate a new header with COW semantics for every escaping header,
even if the header only ever gets read (or becomes garbage immediately).
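As a rough illustration of the two policies, here is a minimal sketch in C. Every name in it (Str, Buffer, str_escape_cow, str_escape_shared, str_upcase_inplace, str_upcase_new, headers_allocated) is hypothetical and much simpler than Parrot's actual STRING structures and API. It only shows where the header allocations happen: on every escape under the current approach, and only on modification if in-place modification is forbidden.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; NOT Parrot's real STRING layout. */
typedef struct Buffer {
    char  *data;
    size_t len;
    int    refs;               /* how many headers share this buffer */
} Buffer;

typedef struct Str {
    Buffer *buf;
} Str;

static size_t headers_allocated = 0;   /* counts header allocations */

static Buffer *buffer_new(const char *s, size_t len) {
    Buffer *b = malloc(sizeof *b);
    assert(b != NULL);
    b->data = malloc(len + 1);
    assert(b->data != NULL);
    memcpy(b->data, s, len);
    b->data[len] = '\0';
    b->len  = len;
    b->refs = 1;
    return b;
}

static Str *str_new(const char *s) {
    Str *h = malloc(sizeof *h);
    assert(h != NULL);
    h->buf = buffer_new(s, strlen(s));
    headers_allocated++;
    return h;
}

/* Policy A (reference semantics): the callee cannot know whether the
 * caller will modify the result, so every escape gets a fresh header
 * that shares the buffer copy-on-write. */
static Str *str_escape_cow(Str *s) {
    Str *h = malloc(sizeof *h);
    assert(h != NULL);
    h->buf = s->buf;
    s->buf->refs++;
    headers_allocated++;
    return h;
}

/* In-place modification under policy A: un-share the buffer first. */
static void str_upcase_inplace(Str *s) {
    size_t i;
    if (s->buf->refs > 1) {
        Buffer *copy = buffer_new(s->buf->data, s->buf->len);
        s->buf->refs--;
        s->buf = copy;
    }
    for (i = 0; i < s->buf->len; i++)
        s->buf->data[i] = (char)toupper((unsigned char)s->buf->data[i]);
}

/* Policy B (no in-place modification): escaping is free ... */
static Str *str_escape_shared(Str *s) {
    return s;
}

/* ... and only an actual modification allocates a new header. */
static Str *str_upcase_new(const Str *s) {
    Str   *r = str_new(s->buf->data);
    size_t i;
    for (i = 0; i < r->buf->len; i++)
        r->buf->data[i] = (char)toupper((unsigned char)r->buf->data[i]);
    return r;
}
```

Under policy A a read-only caller still costs one header per escape. Under policy B the same caller costs nothing, and the allocation moves to the (hopefully rarer) modifying call. Freeing headers and buffers is left out of the sketch for brevity.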
The NQP-rx benchmark represents some likely HLL performance:
    ./parrot ext/nqp-rx/nqp-rx.pbc --target=pir Actions.pm

About 72% of all STRING COW headers created are for internal bookkeeping only
-- to prevent the accidental modification of a STRING out from underneath
something else that uses it. This occurs in two places in the benchmark. The
first is when fetching the STRING contents of a Key PMC. The second is when
using a constant STRING (one created with CONST_STRING in our .c files, for
example, or appearing as a literal in PIR) as a parameter to a function.
Another occasion which does not appear in this benchmark is when fetching the
name of a Class. (You can imagine how modifying that STRING in place would
cause problems.)
Note that the String PMC's get_string() vtable entry always returns a COW
STRING. The set S, SC opcode performs COW on the STRING constant.
Removing the always-COW from the Key PMC (when dealing with STRINGs) speeds up
the benchmark by 2.504%.
Removing the always-COW from constant STRINGs used as function parameters
speeds up the benchmark by 1.204%.
Both together speed up the benchmark by 3.678%.
This particular benchmark shows no change in GC performance, which suggests
that the GC pressure primarily comes from PMCs. Another benchmark with
different STRING usage would show more benefit if it had STRING pressure on the
GC.
A couple of test files show failures with these changes, but they're where you
might expect them:
    t/op/string.t  (Wstat: 11 Tests: 392 Failed: 0)
      Non-zero wait status: 11
      Parse errors: Bad plan.  You planned 411 tests but ran 392.
    t/pmc/key.t    (Wstat: 11 Tests: 8 Failed: 0)
      Non-zero wait status: 11
      Parse errors: Bad plan.  You planned 9 tests but ran 8.
-- c
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
Besides the few test cases that you mention, do we have a lot of
places where strings are specifically used with reference semantics in
order to do inplace modifications in multiple places in the code? That
is, is this going to be a huge change to existing PIR code?
--Andrew Whitworth
> A total benchmark improvement of ~3.7% is certainly nothing to ignore.
> However, this isn't going to be a 100% gain: we do after all need to
> factor in the need to create new string headers and possibly allocate
> new buffer storage on string modifications. We're going to be better
> than even, but I don't think we're going to be at 3.7% after all those
> changes are made.
The question is whether we pay the price of allocating a new COW header for
every STRING we don't want anything else to modify, or of allocating a new
header for every STRING operation we know is a modification.
> Besides the few test cases that you mention, do we have a lot of
> places where strings are specifically used with reference semantics in
> order to do inplace modifications in multiple places in the code? That
> is, is this going to be a huge change to existing PIR code?
I don't know that the test suite is representative of existing code. Allison
suggested asking HLL developers and people who've written PIR libraries about
their expectations of STRINGs in S registers in PIR.