Object freezing

0 views
Skip to first unread message

Dan Sugalski

unread,
Oct 20, 2003, 4:38:15 PM10/20/03
to perl6-i...@perl.org
Since this has come up again, and it's apparent that the last time around
I wasn't sufficiently clear, it's time to go through this again, and for
the final time. (I will beat this thing into the ground by the time we're
done)


The way object serialization will be handled from the bytecode level is
simple: You call freeze or thaw, like so:

freeze S3, P5 # Freezes the P5 PMC, and children, to the string in S3
thaw P5, S3 # Thaw out the serialized object


From within the PMC vtables, there are two vtable entries that are of
importance:

void freeze(interpreter, STRING *)
PMC *thaw(interpreter, STRING *)

freeze tells the PMC to freeze itself and put the results of the freezing
in the string. This is an object method

Thaw tells the class to construct a PMC based on the passed-in string
data. It's a class method, though it may be called on an existing PMC. It
always produces a new PMC. (We can argue over that one if need be)


The runtime will provide the following functionality

Freezing core singleton data (int, num, string, PMC)
freezing named lists
freezing named key/value lists
thaw core singleton data
thaw named lists
thaw named key/value lists

chill PMC *
warm STRING *

When freezing or thawing, if child PMCs are frozen/thawed using the
appropriate calls to the runtime (rather than the freeze/thaw method of a
PMC doing it manually for child PMCs) then the runtime will guarantee that
multiply-referenced PMCs will be instantiated only once.

The chill and warm runtime methods take a PMC or a frozen representation
of a PMC (respectively) and provide a human readable version of that PMC.
If the PMC has a __debug_chill or __debug_warm method (which are
optional) then that method will be called to provide a human-readable
version of the PMC, otherwise the default will be used.

The encoding methods for freezing (and corresponding decoding methods for
thawing) may be overridden to provide an alternate serialization format.
The only requirement of the serialziation format is that it starts with a
minimally valid piece of XML that encodes the format and version of the
serialized format. The rest of the serialization format need not be XML.
This is done because the format and version of the serialized data are
required in the stream, and making it XML incoveniences nobody and makes
the XML folks happy. It's good enough, and not up for discussion.

The overriding API is not, as yet, specified. We can do that when we're
done fighting out over the semantics here.


The following rules will be in place:

1) Freezing at the destruction level may *not* use any additional memory
for object traversal
2) Overlapping freezes are not allowed
3) Freezing while a freeze is in progress is deferred until the current
freeze is done
4) Destruction level freezing may not freeze non-dead PMCs

I can be convinced that #4 is not going to fly, but be aware that it will
potentially cost an extra pointer per PMC.

Requirement #1 generally mandates that all PMCs must have sufficient
information available all the time to perform serialization. It also
mandates that there can't be iterators, save hashes, or other whatnots,
since the potential for destroy-time serialization means those methods are
untenable. I'm open to argument that the freeze and thaw methods can be
context sensitive and the non-destroy case can be memory hungry and
multithreaded, but that's a dodgy thing. Make a really good case.

Note that I do *not* want to have multiple object traversal systems in
parrot! We have one for DOD, and proposals have ranged upwards from there.
No. That is *not* happening--the chance for error is significant, the
side-effects of the error annoying and tough to track down for complex
cases (akin to the trouble with tracking down GC issues), and just not
necessary. (Perhaps desirable for speed/space reasons, but desirable
isn't necessary) This is something that's hidden under a number of layers
of API, so regardless of the outcome it doesn't affect the assembly, PMC,
or runtime API.

Clone can be handled by switching in encodings (one that produces a PMC
rather than an encoded stream, or one that just does an intermediate
string and chews memory, at least for now), and doesn't require a separate
API entry point. I'm unconvinced that objects need to clone themselves
often enough to bother with a separate API entry for 'em, or if one is
provided that VTABLE_clone() should do anything for non-simple objects
besides calling VTABLE_thaw(VTABLE_freeze()).


The thread-safety is an issue, and as such interpreters that share data
should share a mutex. At the moment I think we should have separate
symbolic mutexes that may, potentially, resolve down to a single mutex,
but when freezing an interpreter should grab its freeze mutex. If there
are multiple interperters in the thread group they will all have the same
mutex, and as such will be single-threaded for the freeze.

We can fight about this, too, but the fight is orthogonal to the other
issues.

Dan

Dan Sugalski

unread,
Oct 20, 2003, 4:46:19 PM10/20/03
to perl6-i...@perl.org
Note that I forgot to account for the possibility of freezing code and
environment. As such, the freeze and thaw opcodes have the following
variant:

freeze Sx, Py, Iz
thaw Py, Sx, Iz

where Z is the freeze/thaw level. O (the default) is the PMC only, 1
includes the global/class data for the PMC, and 2 includes the code for
the packages for the PMC, pruned for accessibility.

Also note that if teh PMC being frozen is a closure or continuation, then
all lexical and global data to reinstantiate the PMC (including
potentially code) will be frozen at level 0.

Dan

Melvin Smith

unread,
Oct 20, 2003, 8:29:11 PM10/20/03
to Dan Sugalski, perl6-i...@perl.org
At 04:38 PM 10/20/2003 -0400, Dan Sugalski wrote:
>The encoding methods for freezing (and corresponding decoding methods for
>thawing) may be overridden to provide an alternate serialization format.
>The only requirement of the serialziation format is that it starts with a
>minimally valid piece of XML that encodes the format and version of the
>serialized format. The rest of the serialization format need not be XML.
>This is done because the format and version of the serialized data are
>required in the stream, and making it XML incoveniences nobody and makes
>the XML folks happy. It's good enough, and not up for discussion.

Does that mean all encodings will start with the "standard" markup header:

<xml version="1.0"?>

I'm not sure what minimally valid means, but I think this is considerable
overhead.

Also, I don't think I like using freeze/thaw format for the standard PBC
format if we are embedding XML everywhere, which is ok, but Leo had
mentioned that we would use freeze/thaw for loading classes. With
XML in the picture I think it becomes too bulky.

If we are to use recursive serialization for classes then we'd end up with
fat bytecodes; every field and method would have the XML header.

-Melvin

Uri Guttman

unread,
Oct 20, 2003, 8:41:21 PM10/20/03
to Melvin Smith, Dan Sugalski, perl6-i...@perl.org
>>>>> "MS" == Melvin Smith <mrjol...@mindspring.com> writes:

MS> At 04:38 PM 10/20/2003 -0400, Dan Sugalski wrote:
>> The encoding methods for freezing (and corresponding decoding methods for
>> thawing) may be overridden to provide an alternate serialization format.
>> The only requirement of the serialziation format is that it starts with a
>> minimally valid piece of XML that encodes the format and version of the
>> serialized format. The rest of the serialization format need not be XML.
>> This is done because the format and version of the serialized data are
>> required in the stream, and making it XML incoveniences nobody and makes
>> the XML folks happy. It's good enough, and not up for discussion.

MS> Does that mean all encodings will start with the "standard" markup header:

MS> <xml version="1.0"?>

MS> I'm not sure what minimally valid means, but I think this is considerable
MS> overhead.

how so? most serialized things (other than simple scalars) will generate
much more data than that.

MS> Also, I don't think I like using freeze/thaw format for the
MS> standard PBC format if we are embedding XML everywhere, which is
MS> ok, but Leo had mentioned that we would use freeze/thaw for
MS> loading classes. With XML in the picture I think it becomes too
MS> bulky.

i don't he was referring to the current p5 freeze/thaw formats or
code. he was just using freeze/thaw as the methods/calls to be made to
do the work. the actual formats are undecided now AFAICT.

MS> If we are to use recursive serialization for classes then we'd end
MS> up with fat bytecodes; every field and method would have the XML
MS> header.

the xml header is only for the top level thing in the serialized
tree. if it is nonstandard you have to mark the serialized string so you
can call the matching thaw methods. each object in the serialized tree
will have to support that method or some code has to be supplied to
handle all the freeze/thaw calls made by the tree traverser code. so the
xml header is just a way to mark which external class will be used for
the freeze/thaw and it will always be called for each object in the
tree. you can't mix/match different freeze/thaw techniques in one
operation (yes, you could but then you do have to mark each node with
its technique which is a lot of overhead and painful in other ways).

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Melvin Smith

unread,
Oct 20, 2003, 9:23:42 PM10/20/03
to Uri Guttman, Dan Sugalski, perl6-i...@perl.org
At 08:41 PM 10/20/2003 -0400, Uri Guttman wrote:
> >>>>> "MS" == Melvin Smith <mrjol...@mindspring.com> writes:
>
> MS> At 04:38 PM 10/20/2003 -0400, Dan Sugalski wrote:
> >> The encoding methods for freezing (and corresponding decoding
> methods for
> >> thawing) may be overridden to provide an alternate serialization format.
> >> The only requirement of the serialziation format is that it starts
> with a
> >> minimally valid piece of XML that encodes the format and version of the
> >> serialized format. The rest of the serialization format need not be XML.
> >> This is done because the format and version of the serialized data are
> >> required in the stream, and making it XML incoveniences nobody and makes
> >> the XML folks happy. It's good enough, and not up for discussion.
>
> MS> Does that mean all encodings will start with the "standard" markup
> header:
>
> MS> <xml version="1.0"?>
>
> MS> I'm not sure what minimally valid means, but I think this is
> considerable
> MS> overhead.
>
>how so? most serialized things (other than simple scalars) will generate
>much more data than that.

See below.

> MS> Also, I don't think I like using freeze/thaw format for the
> MS> standard PBC format if we are embedding XML everywhere, which is
> MS> ok, but Leo had mentioned that we would use freeze/thaw for
> MS> loading classes. With XML in the picture I think it becomes too
> MS> bulky.
>
>i don't he was referring to the current p5 freeze/thaw formats or
>code. he was just using freeze/thaw as the methods/calls to be made to
>do the work. the actual formats are undecided now AFAICT.
>
> MS> If we are to use recursive serialization for classes then we'd end
> MS> up with fat bytecodes; every field and method would have the XML
> MS> header.
>
>the xml header is only for the top level thing in the serialized
>tree. if it is nonstandard you have to mark the serialized string so you
>can call the matching thaw methods. each object in the serialized tree
>will have to support that method or some code has to be supplied to
>handle all the freeze/thaw calls made by the tree traverser code. so the
>xml header is just a way to mark which external class will be used for
>the freeze/thaw and it will always be called for each object in the
>tree. you can't mix/match different freeze/thaw techniques in one
>operation (yes, you could but then you do have to mark each node with
>its technique which is a lot of overhead and painful in other ways).

That answers my question of overhead with regards to XML headers.
If there is a single header for defining the type of "stream" then the
actual serialization can be dense enough.

I just needed clarification. :)

I think a grammar should be developed for this, since it should likely
be implemented as a recursive parser.

-Melvin


Uri Guttman

unread,
Oct 20, 2003, 9:56:30 PM10/20/03
to Melvin Smith, Dan Sugalski, perl6-i...@perl.org
>>>>> "MS" == Melvin Smith <mrjol...@mindspring.com> writes:

MS> That answers my question of overhead with regards to XML headers.
MS> If there is a single header for defining the type of "stream" then the
MS> actual serialization can be dense enough.

MS> I just needed clarification. :)

well, that was my take on what dan said. i could have misinterpreted him.

MS> I think a grammar should be developed for this, since it should likely
MS> be implemented as a recursive parser.

there are many ways to encode serialized stuff and not all are conducive
to grammars IMO. some may be simple type/length/binary value things
which are best expanded in nice fast loops. other formats could just go
into pure ugly XML which can be parsed back into data. also dan
mentioned the half-frozen human readable types (which both the binary
AND XML formats need!) which don't (or may not) get parsed again.

Melvin Smith

unread,
Oct 20, 2003, 10:15:21 PM10/20/03
to Uri Guttman, Dan Sugalski, perl6-i...@perl.org
At 09:56 PM 10/20/2003 -0400, Uri Guttman wrote:
> >>>>> "MS" == Melvin Smith <mrjol...@mindspring.com> writes:
>
> MS> That answers my question of overhead with regards to XML headers.
> MS> If there is a single header for defining the type of "stream" then the
> MS> actual serialization can be dense enough.
>
> MS> I just needed clarification. :)
>
>well, that was my take on what dan said. i could have misinterpreted him.
>
> MS> I think a grammar should be developed for this, since it should likely
> MS> be implemented as a recursive parser.
>
>there are many ways to encode serialized stuff and not all are conducive
>to grammars IMO. some may be simple type/length/binary value things
>which are best expanded in nice fast loops. other formats could just go
>into pure ugly XML which can be parsed back into data. also dan
>mentioned the half-frozen human readable types (which both the binary
>AND XML formats need!) which don't (or may not) get parsed again.

When I say grammar, I don't mean as input to a parser generator.
I'm accustomed to using grammars for any sort of non-trivial packet or
data format. It makes dealing with and discussing the format much easier.

-Melvin


Gregor N. Purdy

unread,
Oct 21, 2003, 1:09:33 AM10/21/03
to Uri Guttman, Melvin Smith, Dan Sugalski, perl6-i...@perl.org

> the xml header is only for the top level thing in the serialized
> tree. if it is nonstandard you have to mark the serialized string so you
> can call the matching thaw methods. each object in the serialized tree
> will have to support that method or some code has to be supplied to
> handle all the freeze/thaw calls made by the tree traverser code. so the
> xml header is just a way to mark which external class will be used for
> the freeze/thaw and it will always be called for each object in the
> tree. you can't mix/match different freeze/thaw techniques in one
> operation (yes, you could but then you do have to mark each node with
> its technique which is a lot of overhead and painful in other ways).

I find the notion of an "XML header" a bit confusing, given Dan's
statement to the effect that it was a throw to XML folks.

I think anything "XML folks" will be interested in will entail
*wrapping* stuff, not *prefixing* it.

Perhaps Dan meant "wrapper" or "envelope" when he said "header", but
its not clear to me without an example. So, I'll put some examples out
there for folks to throw stuff at:

Here's an example of a Parrot Magic Ice (PMI) header (no love here from
XML folks):

<pmi class="foo" .../>
# Data of some sort determined by class foo

(This is the way I read Dan's original comment.)

Taken in its entirety, this chunk isn't XML. Sure, you could pull out
the first line and pass it to something that understands XML, and it
wouldn't choke. But, if there is value here, I'm missing it. It could
just as well have been SMTP style:

PMI-Class: foo
Some-Other-Header: ...
# Data of some sort


Here's an example of a wrapper:

<pmi class="foo" ...>
<!-- Data of some sort determined by class foo -->
</pmi>

That's a bit better, although bear in mind that if the intent is that
you could throw the entire chunk at an XML parser and have it not
complain, you are going to have to take some care in generating the
guts. Binary data is in general right out (where does it end? What if
it contains fragments that look like XML markup?). Sure, you could
slap it in a <![CDATA[ ... ]]> film, but you'd still have to watch
out for the possibility that the body might want the sequence "]]>"
in it somewhere...

.............................................

OK. Now, I'll throw one crazy idea into the mix. Suppose for just a
moment that instead of using XML proper, or leaving things completely
open-ended, we adopted SAX events as our interchange. it would be
roughly equivalent to:

begin-element pmi { class => 'foo' }
# events for the guts
end-element pmi

Now, anyone who likes XML can hook up a DOM tree builder, or an XML
renderer to the stream of events and be happy as a clam. But, for
storing stuff on disk, we are free to invent a more compact
representation of the events. Thawing entails interpreting the events
as object allocations and state changes to the objects.

I can imagine some reasonably compact representations...


Regards,

-- Gregor

--
Gregor Purdy gre...@focusresearch.com
Focus Research, Inc. http://www.focusresearch.com/

Jeff Clites

unread,
Oct 21, 2003, 1:41:05 AM10/21/03
to Gregor N. Purdy, perl6-i...@perl.org
On Oct 20, 2003, at 10:09 PM, Gregor N. Purdy wrote:

> Here's an example of a wrapper:
>
> <pmi class="foo" ...>
> <!-- Data of some sort determined by class foo -->
> </pmi>
>
> That's a bit better, although bear in mind that if the intent is that
> you could throw the entire chunk at an XML parser and have it not
> complain, you are going to have to take some care in generating the
> guts. Binary data is in general right out (where does it end? What if
> it contains fragments that look like XML markup?). Sure, you could
> slap it in a <![CDATA[ ... ]]> film, but you'd still have to watch
> out for the possibility that the body might want the sequence "]]>"
> in it somewhere...

A slight aside, but just to build on what you said since this is an
often-misunderstood facet of XML, there are a bunch of other reasons
you can't just throw binary data into an arbitrary XML document inside
of a CDATA section:

1) If you declare the encoding of the XML to be UTF-8, then your
document won't be well-formed XML if your binary data doesn't look like
legitimate UTF-8 data (which it won't in the general case--many bytes
and byte sequences can't occur in UTF-8).

2) XML parsers are free to transcode--so a document declared in one
encoding may pass data on to its client application in another
encoding, which will mangle your binary data. For instance, the expat
parser can consume documents in a variety of encodings, but always
passes text to its callbacks in UTF-8.

3) XML parsers are required to line-ending normalize, so anything which
looks like a CR or CRLF will turn into an LF, even inside of a CDATA
section.

That said, you can get around all of this by base-64 encoding your
binary data for storage in XML, since that turns binary data into text.
On the other hand, it's a waste of space and more CPU cycles to consume
than a more obvious binary format.

And that said, I read Dan's email as just meaning an XML header, but I
didn't quite understand exactly what he had in mind either.

JEff

Leopold Toetsch

unread,
Oct 21, 2003, 3:46:20 AM10/21/03
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> Since this has come up again, ...

[ FYI: I was starting implementing this, based on a general traverse
vtable with callback functions. Two patches got backed out by
Dan after some discussion in PM ]

> ... and it's apparent that the last time around


> I wasn't sufficiently clear, it's time to go through this again, and for
> the final time.

I'd be really happy, if you could go through my concerns mentioned in
the summary in the thread:
Subject: Re: [RfC] vtable->dump
Date: Thu, 4 Sep 2003 12:31:08 +0200

> ... (I will beat this thing into the ground by the time we're
> done)

Sorry for the inconvenience and being ignorant ...

> PMC *thaw(interpreter, STRING *)

This should IMHO be able to create constant PMCs out of metadata, e.g.
for subroutine objects. So there should be some means to tell thaw() to
create PMC(s) in the constant_pmc_pool.

> The chill and warm runtime methods take a PMC or a frozen representation
> of a PMC (respectively) and provide a human readable version of that PMC.

I dunno, why chill() is superior to dump() or pretty_print(), but the
name doesn't really matter.

> 1) Freezing at the destruction level may *not* use any additional memory
> for object traversal

What is "Freezing at the destruction level"? Is this anyhow related to
destruction ordering?

> Note that I do *not* want to have multiple object traversal systems in
> parrot! We have one for DOD, and proposals have ranged upwards from there.
> No. That is *not* happening--the chance for error is significant, the
> side-effects of the error annoying and tough to track down for complex
> cases (akin to the trouble with tracking down GC issues), and just not
> necessary. (Perhaps desirable for speed/space reasons, but desirable
> isn't necessary)

DOD's mark() routine has different requirements then a general
traverse() for freeze(), chill(), clone(), and destruction ordering.
Using just mark() will have these side effects that you want to avoid.

A general traverse() can be depth first of breadth first, mark() isn't
required do have any specific ordering as long as it sets live bits
everywhere.

mark() is called permanently in a running interpreter, that does non
trivial things. There are shortcuts for scalars, DOD is highly optimized
not to destroy cache coherency. Using mark() also implies to back out
my small PMC patches. All the advantages of smaller scalars are gone
then.

While freeze() and friends have to pull in each PMC into the cache, just
setting the live bit on a PMC hasn't. Further: Lukes proposal for
speeding up timely destruction puts objects either in front or at the
end of the next_for_GC chain. This IMHO implies that mark() is unusable
as your general and solely iterator.

> ... This is something that's hidden under a number of layers


> of API, so regardless of the outcome it doesn't affect the assembly, PMC,
> or runtime API.

So when its hidden, I really don't understand, why you are insisting on
an (IMHO) suboptimal design.

> The thread-safety is an issue,

While all schemes aren't thread-safe from user level (e.g.
manually sorting an array containing shared PMCs, while it gets
frozen), your scheme isn't thread-safe at low-level, as the next_for_GC
pointer inside the PMC is used as a duplicate marker. But if a user
changes shared resources its a user problem. We only guarantee atomic
updates per PMC (s. P6E p 86f by Dan).

> Dan

Comments addressing all these issues are highly welcome,
leo

Juergen Boemmels

unread,
Oct 21, 2003, 7:33:51 AM10/21/03
to l...@toetsch.at, Dan Sugalski, perl6-i...@perl.org
Leopold Toetsch <l...@toetsch.at> writes:

> > 1) Freezing at the destruction level may *not* use any additional memory
> > for object traversal

This is a really hard problem. In some early experiments with
destruction ordering (one of the problems wich need iteration) I
didn't get around with allocating new memory, or recursing on the
stack. It may be that we can get arround with a second pointer, but
I'm not sure.

> What is "Freezing at the destruction level"? Is this anyhow related to
> destruction ordering?
>
> > Note that I do *not* want to have multiple object traversal systems in
> > parrot! We have one for DOD, and proposals have ranged upwards from there.
> > No. That is *not* happening--the chance for error is significant, the
> > side-effects of the error annoying and tough to track down for complex
> > cases (akin to the trouble with tracking down GC issues), and just not
> > necessary. (Perhaps desirable for speed/space reasons, but desirable
> > isn't necessary)

I did some benchmarking (to test our hash implementation, but thats a
different story). One thing I found out: We are completely dominated
by gc. I'm not sure if it was trace_systemareas or the mark method,
but don't put any load on mark.

mark should be as fast as possible. The other uses of traverse for
freeze, dump, destruction-ordering etc. are all more or less called on
user request, so the user needs to know its cost.

One other thing that makes mark different. If we ever want to use a
copying collector (Which is not reachable currently because of the
conservative stack-walking) The mark routine needs to know about the
moving of objects. All other traverse routine never get this problem.

> DOD's mark() routine has different requirements then a general
> traverse() for freeze(), chill(), clone(), and destruction ordering.
> Using just mark() will have these side effects that you want to avoid.

My words. mark() is not traverse() also they do similar things.

> A general traverse() can be depth first of breadth first, mark() isn't
> required do have any specific ordering as long as it sets live bits
> everywhere.
>
> mark() is called permanently in a running interpreter, that does non
> trivial things. There are shortcuts for scalars, DOD is highly optimized
> not to destroy cache coherency. Using mark() also implies to back out
> my small PMC patches. All the advantages of smaller scalars are gone
> then.

This ist just on more thing of mark() speed.

> While freeze() and friends have to pull in each PMC into the cache, just
> setting the live bit on a PMC hasn't. Further: Lukes proposal for
> speeding up timely destruction puts objects either in front or at the
> end of the next_for_GC chain. This IMHO implies that mark() is unusable
> as your general and solely iterator.
>
> > ... This is something that's hidden under a number of layers
> > of API, so regardless of the outcome it doesn't affect the assembly, PMC,
> > or runtime API.
>
> So when its hidden, I really don't understand, why you are insisting on
> an (IMHO) suboptimal design.

We have at the moment 15 (in words fifteen) vtable slots for
dividing/remainder, 5 for multiplikation, 24 for bitwise ops. So
bloating the vtable is by design, but it is the end of world if we
special case the most often called function and have 2 (in words two)
walking functions. Sorry, I think there are other places in the vtable
which need some cleanup.

> > The thread-safety is an issue,
>
> While all schemes aren't thread-safe from user level (e.g.
> manually sorting an array containing shared PMCs, while it gets
> frozen), your scheme isn't thread-safe at low-level, as the next_for_GC
> pointer inside the PMC is used as a duplicate marker. But if a user
> changes shared resources its a user problem. We only guarantee atomic
> updates per PMC (s. P6E p 86f by Dan).

The thread safty is less a problem for marking. It only needs to make
sure that other threads don't munge the data they are walking. Write
barriers or mutexes might help here. But how to freeze an object of an
other thread? This needs to freeze the whole thread.

> > Dan
>
> Comments addressing all these issues are highly welcome,
> leo

I think we should address this issue like experimentalists: Create the
general traverse function. (No don't call it mark). Implement freeze,
dump, destruction ordering using this function. When this all is
running, port the mark function to use this new
functionality. Benchmark, and watch the speedup of the brandnew
design (or just find out that the slowdown is not bad enough to
satisfy two walking functions). When the benchmarking is done lets
descide if we need only one walk-function, and only than remove the
mark function.

bye
boe
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47

Dan Sugalski

unread,
Oct 21, 2003, 8:11:29 AM10/21/03
to Melvin Smith, perl6-i...@perl.org
On Mon, 20 Oct 2003, Melvin Smith wrote:

> At 04:38 PM 10/20/2003 -0400, Dan Sugalski wrote:
> >The encoding methods for freezing (and corresponding decoding methods for
> >thawing) may be overridden to provide an alternate serialization format.
> >The only requirement of the serialziation format is that it starts with a
> >minimally valid piece of XML that encodes the format and version of the
> >serialized format. The rest of the serialization format need not be XML.
> >This is done because the format and version of the serialized data are
> >required in the stream, and making it XML incoveniences nobody and makes
> >the XML folks happy. It's good enough, and not up for discussion.
>
> Does that mean all encodings will start with the "standard" markup header:
>
> <xml version="1.0"?>

Each serialized data stream, yeah. Just once, and probably a few extra
characters in there, to make it legit XML.

Definitely not once per object, just once per stream--if you do:

freeze S3, P5

and P5 happens to have a PMC that points to your top level symbol table,
the string in S3 will be darned huge, and have the <XML foo=?> thing in
there exactly once. (Well, unless you've chosen an XML encoding, in which
case I expect you've just blown memory... :)

Dan

Dan Sugalski

unread,
Oct 21, 2003, 8:21:42 AM10/21/03
to Gregor N. Purdy, Uri Guttman, Melvin Smith, perl6-i...@perl.org
On Mon, 20 Oct 2003, Gregor N. Purdy wrote:

>
> > the xml header is only for the top level thing in the serialized
> > tree. if it is nonstandard you have to mark the serialized string so you
> > can call the matching thaw methods. each object in the serialized tree
> > will have to support that method or some code has to be supplied to
> > handle all the freeze/thaw calls made by the tree traverser code. so the
> > xml header is just a way to mark which external class will be used for
> > the freeze/thaw and it will always be called for each object in the
> > tree. you can't mix/match different freeze/thaw techniques in one
> > operation (yes, you could but then you do have to mark each node with
> > its technique which is a lot of overhead and painful in other ways).
>
> I find the notion of an "XML header" a bit confusing, given Dan's
> statement to the effect that it was a throw to XML folks.
>
> I think anything "XML folks" will be interested in will entail
> *wrapping* stuff, not *prefixing* it.

Nah, I expect what they'll want is for the entire data stream of
serialized objects to be in XML format. Which is fine--they can have that.
(It's why I mentioned the serialization routines can be overridden)

For an XML stream the header might be <xml parrot format='xml'
version=1.0> with the rest of the stream in XML. A YAML stream would start
<xml parrot format='yaml' version=1.0> with the rest in YAML, and teh
binary format as <xml parrot format='binary' version=1.0>. Or something
like that, modulo actual correct XML.

This way we have a single, fixed-format type/version header, which makes
the initial identification easier and less error-prone. (Possibly even
fit for file and programs of its ilk to note) The binary format won't
care, and teh YAML format shouldn't care (as long as the indenting's
right) but the XML format would, so it seems to make sense to use the XML
stuff for the initial header.

Dan

Dan Sugalski

unread,
Oct 21, 2003, 8:53:39 AM10/21/03
to Leopold Toetsch, perl6-i...@perl.org
On Tue, 21 Oct 2003, Leopold Toetsch wrote:

> Dan Sugalski <d...@sidhe.org> wrote:
> > Since this has come up again, ...
>
> [ FYI: I was starting implementing this, based on a general traverse
> vtable with callback functions. Two patches got backed out by
> Dan after some discussion in PM ]

Right, because you'd implemented some stuff I'd specifically said we
weren't doing, and didn't back them out any of the times I asked...

> > ... and it's apparent that the last time around
> > I wasn't sufficiently clear, it's time to go through this again, and for
> > the final time.
>
> I'd be really happy, if you could go through my concerns mentioned in
> the summary in the thread:

That's why I did this, in part. It's the plan, until declared otherwise.

> > PMC *thaw(interpreter, STRING *)
>
> This should IMHO be able to create constant PMCs out of metadata, e.g.
> for subroutine objects. So there should be some means to tell thaw() to
> create PMC(s) in the constant_pmc_pool.

There should be a way to put PMCs in the constant pool in general. I was
thinking a constant op would work--something like

constant Ix, [SP]y

to make the string or PMC Y a constant at slot X in the constant pool.
Passing in the PMC header to be filled in also works, though both fail if
you want full PMC trees marked as constants since thawing out a PMC stream
may involve creating multiple PMCs. (In which case we might be better
temporarily switching allocation pools at constant creation time, rather
than passing in PMCs)

> > The chill and warm runtime methods take a PMC or a frozen representation
> > of a PMC (respectively) and provide a human readable version of that PMC.
>
> I dunno, why chill() is superior to dump() or pretty_print(), but the
> name doesn't really matter.

The important thing is that it's not a vtable method. It's a function that
belongs in the freeze/thaw API as it's just an alternate encoding or
decoding. (Arguably it ought not be a separate API entry at all and just
another encoding scheme, but that requires transcoding serialization
forms, and I'd rather not get into that)

> > 1) Freezing at the destruction level may *not* use any additional memory
> > for object traversal
>
> What is "Freezing at the destruction level"? Is this anyhow related to
> destruction ordering?

No. There are some valid cases where an object, after having been declared
dead by the DOD, wants to serialize itself. Persistent object stores
apparently do this, and it makes a certain amount of sense--when the
object goes out of scope the current state is flushed to disk.

It puts a number of unpleasant constraints on the core freeze routines.
User code can violate them and take the consequences, but we can't.

> > Note that I do *not* want to have multiple object traversal systems in
> > parrot! We have one for DOD, and proposals have ranged upwards from there.
> > No. That is *not* happening--the chance for error is significant, the
> > side-effects of the error annoying and tough to track down for complex
> > cases (akin to the trouble with tracking down GC issues), and just not
> > necessary. (Perhaps desirable for speed/space reasons, but desirable
> > isn't necessary)
>
> DOD's mark() routine has different requirements then a general
> traverse() for freeze(), chill(), clone(), and destruction ordering.
> Using just mark() will have these side effects that you want to avoid.

The only thing that mark does that the general traversal doesn't, in the
abstract, is flip the object's live flag. Everything else is an
optimization of code which we can, if we need, discard.

> A general traverse() can be depth first of breadth first, mark() isn't
> required do have any specific ordering as long as it sets live bits
> everywhere.

I'm pretty sure that with a singly linked list we can get a generally
properly-ordered flattened tree without having to do an insane number of
passes across the dead object store. I may be incorrect in this, but I
don't think so, and for our purposes the live bit can be safely ignored if
we end up setting it, though potentially with another pass over the dead
store, which may end up prohibitively expensive. We'll see.

> mark() is called permanently in a running interpreter, that does non
> trivial things. There are shortcuts for scalars, DOD is highly optimized
> not to destroy cache coherency. Using mark() also implies to back out
> my small PMC patches. All the advantages of smaller scalars are gone
> then.

All of this stuff for freezing is going to end up killing the small PMC
patch anyway, unfortunately, since we're going to have to be able to
traverse PMCs in the destruction phase, which means we have to have the
means of traversal at hand as we can't guarantee that we can allocate more
PMCs or resize the PMCs ext data.

> While freeze() and friends have to pull in each PMC into the cache, just
> setting the live bit on a PMC hasn't. Further: Lukes proposal for
> speeding up timely destruction puts objects either in front or at the
> end of the next_for_GC chain. This IMHO implies that mark() is unusable
> as your general and solely iterator.

We may end up playing macro games with the code. I can live with multiple
instantiations of the code, but I want only a single chunk of source to be
maintained.

> > ... This is something that's hidden under a number of layers
> > of API, so regardless of the outcome it doesn't affect the assembly, PMC,
> > or runtime API.
>
> So when its hidden, I really don't understand, why you are insisting on
> an (IMHO) suboptimal design.

YHO, in this case, turns out to not consider all the issues involved.
Either way, we do it this way for now. Once the implementation of the
specified API is in, we can fight over reworking the internals.

Dan

Dan Sugalski

unread,
Oct 21, 2003, 9:12:27 AM10/21/03
to Elizabeth Mattijsen, Gregor N. Purdy, Uri Guttman, Melvin Smith, perl6-i...@perl.org
On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:

> At 08:21 -0400 10/21/03, Dan Sugalski wrote:
> > > I find the notion of an "XML header" a bit confusing, given Dan's
> >> statement to the effect that it was a throw to XML folks.
> >>
> >> I think anything "XML folks" will be interested in will entail
> >> *wrapping* stuff, not *prefixing* it.
> >
> >Nah, I expect what they'll want is for the entire data stream of
> >serialized objects to be in XML format. Which is fine--they can have that.
> >(It's why I mentioned the serialization routines can be overridden)
> >
> >For an XML stream the header might be <xml parrot format='xml'
> >version=1.0> with the rest of the stream in XML. A YAML stream would start
> ><xml parrot format='yaml' version=1.0> with the rest in YAML, and teh
> >binary format as <xml parrot format='binary' version=1.0>. Or something
> >like that, modulo actual correct XML.
>

> If you want that to be looking like valid XML, it would have to be different:
>
> error: Specification mandate value for attribute parrot
> <xml parrot/>
> ^
> Better in my opinion would be something like:
>
> <parrot format="xml" version="1.0"/>data yadda yadda yadda

I'm not an XML guy, and I'm making all this up as I go along. If that's
better, fine with me. :)

> >This way we have a single, fixed-format type/version header, which makes
> >the initial identification easier and less error-prone. (Possibly even
> >fit for file and programs of its ilk to note) The binary format won't
> >care, and teh YAML format shouldn't care (as long as the indenting's
> >right) but the XML format would, so it seems to make sense to use the XML
> >stuff for the initial header.
>

> So are we talking about a header or a wrapper? If it is really a
> header, it's not XML and then it's prettyy useless from an XML point
> of view.

We're talking about the first thing in a file (or stream, or whatever). I
was under the impression that XML files should be entirely composed of
valid XML, hence the need for the stream type marker being valid XML. YAML
doesn't care as much, so far as I understand, and for our own internal
binary format we cna do whatever we want. If that's not true, then we can
go for a more compact header.

Note that the serialized stream will be different depending on the encoder
chosen. If you have the structure:

$bar = 1;
@foo[0] = \$bar;
@foo[1] = "Baz";

The XML stream serializing @foo might look like:

<XML type=parrot-xml version=1.0>
<PMC name=foo>
<type>
PerlArray
</type>
<value>
<PMC>bar</pmc>
<string>Baz</string>
</value>
</PMC>
<PMC name=bar>
<type>
PerlInt
</type>
<value>
<integer>1</integer>
</value>
</PMC>

Only not inevitably horribly broken, invalid, and poorly done. :) The YAML
form might look like

<XML type=parrot-yaml version=1.0>
PMC: foo
type: PerlArray
values:
pmc: bar
string: Baz
PMC: bar
type: PerlInt
values:
integer:1

Once again, modulo my limited and inevitably incorrect YAML knowledge. So
if the header says it's XML the whole thing is valid XML, while if it
doesn't the rest of the stream doesn't have to be. (Just enough of the
header so that an XML processing program can examine the stream and decide
that the valid XML chunk at the beginning says that the rest of the
stream's not XML)

Basically we want some nice, fixed (mostly) thing at the head of the
stream that doesn't vary regardless of the way the stream is encoded, and
XML seemed to be the most restrictive of the forms I know people will
clamor for. (I know, it means the stream can't be valid Lisp-style sexprs,
but XML's more widespread :)

Dan

Leopold Toetsch

unread,
Oct 21, 2003, 10:13:47 AM10/21/03
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> On Tue, 21 Oct 2003, Leopold Toetsch wrote:

[ thaw ]

>> This should IMHO be able to create constant PMCs out of metadata, e.g.
>> for subroutine objects. So there should be some means to tell thaw() to
>> create PMC(s) in the constant_pmc_pool.

> There should be a way to put PMCs in the constant pool in general. I was
> thinking a constant op would work--something like

> constant Ix, [SP]y

> to make the string or PMC Y a constant at slot X in the constant pool.

You can append items to the constant table. You can't declare existing
items as constant, because you can't change the underlying object pool,
where the object was allocated. This would change the objects address.

> Passing in the PMC header to be filled in also works, though both fail if
> you want full PMC trees marked as constants since thawing out a PMC stream
> may involve creating multiple PMCs. (In which case we might be better
> temporarily switching allocation pools at constant creation time, rather
> than passing in PMCs)

These are either serious shortcomings or unneeded workarounds. An extra
parameter to relevant vtables can take care of such special cases.

>> I dunno, why chill() is superior to dump() or pretty_print(), but the
>> name doesn't really matter.

> The important thing is that it's not a vtable method.

Ah, that's the difference. How shall the system pretty-print dynamically
loaded PMCs then, when only a bytecode-stream is available? IMHO only a
vtable in the class can perform that job.

>> > 1) Freezing at the destruction level may *not* use any additional memory
>> > for object traversal

> It puts a number of unpleasant constraints on the core freeze routines.

Constructing the frozen stream definitely needs memory. I don't see the
difference, to memory consumed by a seen hash. Can you please elaborate
a bit more on this.

> The only thing that mark does that the general traversal doesn't, in the
> abstract, is flip the object's live flag. Everything else is an
> optimization of code which we can, if we need, discard.

Yes, mark() can be written in terms of a general traverse, which gets a
vtable function (and a data pointer). mark is basically traverse(mark,
0). But this isn't true the other way round. You can't do freeze based
on the mark iterator. How do you pass the desired output format?

>> mark() is called permanently in a running interpreter, that does non
>> trivial things. There are shortcuts for scalars, DOD is highly optimized
>> not to destroy cache coherency. Using mark() also implies to back out
>> my small PMC patches. All the advantages of smaller scalars are gone
>> then.

> All of this stuff for freezing is going to end up killing the small PMC
> patch anyway, unfortunately, since we're going to have to be able to
> traverse PMCs in the destruction phase, which means we have to have the
> means of traversal at hand as we can't guarantee that we can allocate more
> PMCs or resize the PMCs ext data.

A scalar can't contain or reference other PMCs, so it can't be a
potential source of freeze loops. If I now spit out (PMC: Int, ID=xy,
value=5) twice or (PMC: ID=other) doesn't really matter. thaw() can take
care of duplicates, if needed. Other PMCs have the next_for_GC pointer.

Albeit I'm not convinced, that we can't have a seen hash.

> YHO, in this case, turns out to not consider all the issues involved.

That might very well be true, yes. So it would be fine, if you could fill
the gaps.

> Dan

leo

Dan Sugalski

unread,
Oct 21, 2003, 10:49:27 AM10/21/03
to Leopold Toetsch, perl6-i...@perl.org
On Tue, 21 Oct 2003, Leopold Toetsch wrote:

> Dan Sugalski <d...@sidhe.org> wrote:
> > On Tue, 21 Oct 2003, Leopold Toetsch wrote:
>
> [ thaw ]
>
> >> This should IMHO be able to create constant PMCs out of metadata, e.g.
> >> for subroutine objects. So there should be some means to tell thaw() to
> >> create PMC(s) in the constant_pmc_pool.
>
> > There should be a way to put PMCs in the constant pool in general. I was
> > thinking a constant op would work--something like
>
> > constant Ix, [SP]y
>
> > to make the string or PMC Y a constant at slot X in the constant pool.
>
> You can append items to the constant table. You can't declare existing
> items as constant, because you can't change the underlying object pool,
> where the object was allocated. This would change the objects address.

The object's address should be irrelevant for the constant table. PMCs are
referenced in the opstream by table offset. This offset can be into a PMC
pool, or into a pointer table. While the pointer table has an extra level
of indirection to it it adds flexibility and takes some pressure off of
the ordering of PMCs for instantiated constants.

> > Passing in the PMC header to be filled in also works, though both fail if
> > you want full PMC trees marked as constants since thawing out a PMC stream
> > may involve creating multiple PMCs. (In which case we might be better
> > temporarily switching allocation pools at constant creation time, rather
> > than passing in PMCs)
>
> These are either serious shortcomings or unneeded workarounds. An extra
> parameter to relevant vtables can take care of such special cases.

Not necessarily, no. The number of PMCs that are reconstituted for a set
of constant frozen PMCs is indeterminate. If we're instantiating bytecode
with constant PMCs in it it's possible the class that backs those PMCs has
changed and things instantiate differently than they might otherwise do.

If we've frozen 20 PMCs, all we can guarantee is that when we unthaw them
that we've got at least 20 PMCs, though we may have more, and the extras
arguably should be allocated from the constant PMC arena (though not given
slots in the constant table) so we can skip scanning the constant arenas
for dead objects needing cleanup.

> >> I dunno, why chill() is superior to dump() or pretty_print(), but the
> >> name doesn't really matter.
>
> > The important thing is that it's not a vtable method.
>
> Ah, that's the difference. How shall the system pretty-print dynamically
> loaded PMCs then, when only a bytecode-stream is available? IMHO only a
> vtable in the class can perform that job.

If the dynamically loaded PMC class doesn't have a backing Parrot class,
you can't, and get the default, relatively primitive dump.

> >> > 1) Freezing at the destruction level may *not* use any additional memory
> >> > for object traversal
>
> > It puts a number of unpleasant constraints on the core freeze routines.
>
> Constructing the frozen stream definitely needs memory. I don't see the
> difference, to memory consumed by a seen hash. Can you please elaborate
> a bit more on this.

Constructing the frozen stream will need some memory, yes. At the moment
all it needs is a chunk of random memory and that's it, so we may well
fail because we're out of memory. We may, however, have general pool
memory handy. We can't guarantee that we have *any* headers, however,
since we can legitimately be called from within the destruct phase of a
DOD run, which may have been triggered by an out-of-headers condition.

Depending on how we flesh things out freezing may also not require any
additional memory--if we relax the requirement for freezing to allow the
output to be a PMC, we may be backed directly to a file or other storage
that doesn't involve RAM allocation.

> > The only thing that mark does that the general traversal doesn't, in the
> > abstract, is flip the object's live flag. Everything else is an
> > optimization of code which we can, if we need, discard.
>
> Yes, mark() can be written in terms of a general traverse, which gets a
> vtable function (and a data pointer). mark is basically traverse(mark,
> 0). But this isn't true the other way round. You can't do freeze based
> on the mark iterator. How do you pass the desired output format?

What does the desired output format have to do with any of this? All
marking does is put things on the list of PMCs to be visited if it hasn't
already been visited, so we get to that PMC at some point as we walk the
visited list. In the context of the DOD sweep it also sets the live flag,
but we could, if we chose, skip that and use the presence of a non-NULL
value in the mark chain address for a PMC as an indicator of liveness.
(Though yes, I realize that this means potentially skipping some of the
optimizations, so I'm not proposing it as a requirement for the DOD sweep
implementation)

> >> mark() is called permanently in a running interpreter, that does non
> >> trivial things. There are shortcuts for scalars, DOD is highly optimized
> >> not to destroy cache coherency. Using mark() also implies to back out
> >> my small PMC patches. All the advantages of smaller scalars are gone
> >> then.
>
> > All of this stuff for freezing is going to end up killing the small PMC
> > patch anyway, unfortunately, since we're going to have to be able to
> > traverse PMCs in the destruction phase, which means we have to have the
> > means of traversal at hand as we can't guarantee that we can allocate more
> > PMCs or resize the PMCs ext data.
>
> A scalar can't contain or reference other PMCs, so it can't be a
> potential source of freeze loops. If I now spit out (PMC: Int, ID=xy,
> value=5) twice or (PMC: ID=other) doesn't really matter. thaw() can take
> care of duplicates, if needed. Other PMCs have the next_for_GC pointer.

Thaw can only properly take care of duplicates if the duplicates are
correctly indicated in the serialization stream. Identical end-values are
*not* sufficient to note multiple references to the same PMC.

> Albeit I'm not convinced, that we can't have a seen hash.

It takes an insane amount of memory and requires header allocation. We
can't allocate headers, and the memory requirements are extreme. Been
there, done that, it was a bad idea. Consider this arbitrarily and
unconditionally ruled out if you're unwilling to believe the stats that
were previously posted about this.

Dan

Juergen Boemmels

unread,
Oct 21, 2003, 11:11:57 AM10/21/03
to Dan Sugalski, Leopold Toetsch, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> writes:

[...]

> > > The chill and warm runtime methods take a PMC or a frozen representation
> > > of a PMC (respectively) and provide a human readable version of that PMC.
> >
> > I dunno, why chill() is superior to dump() or pretty_print(), but the
> > name doesn't really matter.
>
> The important thing is that it's not a vtable method. It's a function that
> belongs in the freeze/thaw API as it's just an alternate encoding or
> decoding. (Arguably it ought not be a separate API entry at all and just
> another encoding scheme, but that requires transcoding serialization
> forms, and I'd rather not get into that)

This is really just a naming problem. Dan wants to call the
vtable-function freeze and have different encodings for all kinds of
dumping/pretty_printing/marking. Leo calls the function traverse and
controlls it by callbacks.

My personal opinion on this naming problem is: traverse describes more
generaly what the function does. Marking live objects by freezing them
in an encoding that does return nothing just sounds plain wrong.

Freeze should be just a user of the general traverse function. (And
this does mean it is also no vtable function)

STRING *freeze(PMC *pmc, whatever *encoding)
{
return (STRING *)pmc->vtable->traverse(pmc, freeze_callbacks, encoding);
}

or even the freeze_encodings are callback_sets: freeze_xml,
freeze_yaml, freeze_binary, whatever.

>>> 1) Freezing at the destruction level may *not* use any additional memory
>>> for object traversal
>>
>> What is "Freezing at the destruction level"? Is this anyhow related to
>> destruction ordering?
>
> No. There are some valid cases where an object, after having been declared
> dead by the DOD, wants to serialize itself. Persistent object stores
> apparently do this, and it makes a certain amount of sense--when the
> object goes out of scope the current state is flushed to disk.

This is a question of what is allowed at destruction time. You don't
want to allow memory allocation, but allow freezing. That gets hard,
because you need at least allocate the STRING where you want to put
your frozen stream.

> It puts a number of unpleasant constraints on the core freeze routines.
> User code can violate them and take the consequences, but we can't.

We can call (hopefully) arbitary user code in destruction routines. So
this argument does not count

>>> Note that I do *not* want to have multiple object traversal systems in
>>> parrot! We have one for DOD, and proposals have ranged upwards from there.
>>> No. That is *not* happening--the chance for error is significant, the
>>> side-effects of the error annoying and tough to track down for complex
>>> cases (akin to the trouble with tracking down GC issues), and just not
>>> necessary. (Perhaps desirable for speed/space reasons, but desirable
>>> isn't necessary)

Freeze is just another traversal method. Just calling it freeze
instead of traverse does not change this fact. You can limit the power
of encodings, but this does not change the fact that you need to walk
all children

>> DOD's mark() routine has different requirements then a general
>> traverse() for freeze(), chill(), clone(), and destruction ordering.
>> Using just mark() will have these side effects that you want to avoid.
>
> The only thing that mark does that the general traversal doesn't, in the
> abstract, is flip the object's live flag. Everything else is an
> optimization of code which we can, if we need, discard.

mark() may be implemented in form of a general traverse. Let the
profiler decide if a special purpose mark() or a general traverse is
better.

>> A general traverse() can be depth first of breadth first, mark() isn't
>> required do have any specific ordering as long as it sets live bits
>> everywhere.
>
> I'm pretty sure that with a singly linked list we can get a generally
> properly-ordered flattened tree without having to do an insane number of
> passes across the dead object store. I may be incorrect in this, but I
> don't think so, and for our purposes the live bit can be safely ignored if
> we end up setting it, though potentially with another pass over the dead
> store, which may end up prohibitively expensive. We'll see.

I'm pretty sure that a singly linked list is not enough. I had done
some experiments with this. One pass my be enough, but you need to
keep track of the tree-traversal and of the partial ordered
list. These to things don't play well together. Maybe this can be cut
down to two lists, or one list and one bit per pmc.



>> mark() is called permanently in a running interpreter, that does non
>> trivial things. There are shortcuts for scalars, DOD is highly optimized
>> not to destroy cache coherency. Using mark() also implies to back out
>> my small PMC patches. All the advantages of smaller scalars are gone
>> then.
>
> All of this stuff for freezing is going to end up killing the small PMC
> patch anyway, unfortunately, since we're going to have to be able to
> traverse PMCs in the destruction phase, which means we have to have the
> means of traversal at hand as we can't guarantee that we can allocate more
> PMCs or resize the PMCs ext data.

Destruction ordering just enforces that small PMCs can't have
destructors. If they have destructors they must be big, big enough to
construct the ordered list of objects without allocating any memory.

If you think about it: The call to the destructors is done after
free_unused_pobjects completed. The memory of the objects without
destructors is already freed. If you are still in an out of memory
situation when the destructors are run, then its also very likely that
you are in a not much better situation afterwards.

>> While freeze() and friends have to pull in each PMC into the cache, just
>> setting the live bit on a PMC hasn't. Further: Lukes proposal for
>> speeding up timely destruction puts objects either in front or at the
>> end of the next_for_GC chain. This IMHO implies that mark() is unusable
>> as your general and solely iterator.
>
> We may end up playing macro games with the code. I can live with multiple
> instantiations of the code, but I want only a single chunk of source to be
> maintained.

You know we already have two versions of pobject_lives lying around.

[...]

boe

Dan Sugalski

unread,
Oct 21, 2003, 11:39:38 AM10/21/03
to Juergen Boemmels, Leopold Toetsch, perl6-i...@perl.org
On Tue, 21 Oct 2003, Juergen Boemmels wrote:

> Dan Sugalski <d...@sidhe.org> writes:
>
> [...]
>
> > > > The chill and warm runtime methods take a PMC or a frozen representation
> > > > of a PMC (respectively) and provide a human readable version of that PMC.
> > >
> > > I dunno, why chill() is superior to dump() or pretty_print(), but the
> > > name doesn't really matter.
> >
> > The important thing is that it's not a vtable method. It's a function that
> > belongs in the freeze/thaw API as it's just an alternate encoding or
> > decoding. (Arguably it ought not be a separate API entry at all and just
> > another encoding scheme, but that requires transcoding serialization
> > forms, and I'd rather not get into that)
>
> This is really just a naming problem. Dan wants to call the
> vtable-function freeze and have different encodings for all kinds of
> dumping/pretty_printing/marking. Leo calls the function traverse and
> controlls it by callbacks.

It's more than just a naming issue (or if it is, then traverse is the
wrong name). The traversal must be done externally, since we can't be
recursive.

Mark puts a PMC on the list of PMCs to be frozen. Freeze dumps the
PMC being frozen (and *only* that PMC) to the stream. The freeze routine
for a PMC must mark (generally indirectly by calling the "add this pmc to
the stream" api function) any PMCs that it needs to be in the stream.

The external function that traverses this list of PMCs to be dumped is
responsible for making sure there are no duplicates--the easiest way is to
do what the DOD sweep does and note that a PMC has already been put on the
list and thus not mark it.

Mark and freeze are separate, though related by the subsystems that use
them.

> This is a question of what is allowed at destruction time. You don't
> want to allow memory allocation, but allow freezing. That gets hard,
> because you need at least allocate the STRING where you want to put
> your frozen stream.

It's more a question of what we we require the engine to do, vs what user
code is allowed to do. A user program is allowed to write code that can
fail at destroy time, however the infrastructure we provide (including, in
this case, freezing--while I don't like it there's no choice) can't fail
that way. It's the reason the DOD and GC systems don't allocate memory (or
didn't--they shouldn't) when they run. The engine's not allowed to have
failure modes in critical sections.

Basically the engine may fail because of user code, but user code can't
fail because of the engine. It makes some things annoyingly restrictive,
but some problems are inherently annoyingly restrictive.

> > It puts a number of unpleasant constraints on the core freeze routines.
> > User code can violate them and take the consequences, but we can't.
>
> We can call (hopefully) arbitary user code in destruction routines. So
> this argument does not count

See above. User code can fail, we can't.

> >> A general traverse() can be depth first of breadth first, mark() isn't
> >> required do have any specific ordering as long as it sets live bits
> >> everywhere.
> >
> > I'm pretty sure that with a singly linked list we can get a generally
> > properly-ordered flattened tree without having to do an insane number of
> > passes across the dead object store. I may be incorrect in this, but I
> > don't think so, and for our purposes the live bit can be safely ignored if
> > we end up setting it, though potentially with another pass over the dead
> > store, which may end up prohibitively expensive. We'll see.
>
> I'm pretty sure that a singly linked list is not enough. I had done
> some experiments with this. One pass my be enough, but you need to
> keep track of the tree-traversal and of the partial ordered
> list. These to things don't play well together. Maybe this can be cut
> down to two lists, or one list and one bit per pmc.

There may be a little more infrastructure--I've not dug out the algorithms
books and gone hunting. The common algorithms tend to cheat by just
dodging the whole problem. :)

> Destruction ordering just enforces that small PMCs can't have
> destructors. If they have destructors they must be big, big enough to
> construct the ordered list of objects without allocating any memory.

Can't have destructors *or* refer to PMCs that may either have a
destructor or (indirectly) refer to a PMC that has a destructor.

If we have 2 PMCs with destructors they may be connected by a chain of 100
PMCs that don't, but we still need to walk that chain.

> If you think about it: The call to the destructors is done after
> free_unused_pobjects completed. The memory of the objects without
> destructors is already freed.

Then we reorder. This can't happen, and it didn't used to happen--if
that's how it works now then there's a bug in the DOD system. *All*
destructors *must* be called before any headers are collected.

> >> While freeze() and friends have to pull in each PMC into the cache, just
> >> setting the live bit on a PMC hasn't. Further: Lukes proposal for
> >> speeding up timely destruction puts objects either in front or at the
> >> end of the next_for_GC chain. This IMHO implies that mark() is unusable
> >> as your general and solely iterator.
> >
> > We may end up playing macro games with the code. I can live with multiple
> > instantiations of the code, but I want only a single chunk of source to be
> > maintained.
>
> You know we already have two versions of pobject_lives lying around.

Then we need to fix that, too.

Dan

Jeff Clites

unread,
Oct 21, 2003, 12:18:36 PM10/21/03
to Dan Sugalski, perl6-i...@perl.org, Leopold Toetsch
On Oct 21, 2003, at 5:53 AM, Dan Sugalski wrote:

>>> Note that I do *not* want to have multiple object traversal systems
>>> in
>>> parrot! We have one for DOD, and proposals have ranged upwards from
>>> there.
>>> No. That is *not* happening--the chance for error is significant, the
>>> side-effects of the error annoying and tough to track down for
>>> complex
>>> cases (akin to the trouble with tracking down GC issues), and just
>>> not
>>> necessary. (Perhaps desirable for speed/space reasons, but desirable
>>> isn't necessary)
>>
>> DOD's mark() routine has different requirements then a general
>> traverse() for freeze(), chill(), clone(), and destruction ordering.
>> Using just mark() will have these side effects that you want to avoid.
>
> The only thing that mark does that the general traversal doesn't, in
> the
> abstract, is flip the object's live flag. Everything else is an
> optimization of code which we can, if we need, discard.

I don't believe that is quite true. There are a couple of important
differences between traversal-for-GC and traversal-for-serialization,
which will be a challenge to reconcile in the one-true-traversal:

1) Serialization traversals need to "take note" of logical int and
float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they
can be serialized, but for GC you only need to worry about GC-able
objects. It's difficult to come up with a reasonable callback which can
take either int, float, or PObj arguments.

2) It's reasonable for an object to have a pointer to some sort of
cache object, which is not logically part of the object, and shouldn't
be serialized along with it. This needs to be traversed for GC
purposes, but needs to not be traversed for serialization. (Situations
such as this--physical but not logical membership--are the origin of
the "mutable" keyword in C++.)

3) Traversal for GC needs to do loop detection, but can just stop going
down a particular branch of the object graph once it encounters an
object it's seen before. Serialization traversals would need to have a
way, upon encountering an object seen before, to include in the
serialization stream an indication that the current object has already
been serialized, and enough information to enable deserialization code
to go find it and recreate the loop. The only options I see here are
either for serialization to involve the allocation of unbounded
additional memory, or to expand the PObj structure to include a slot
for a UUID which can be used as a back-reference in a stream, or to
have serialization break loops (so that deserialized structures never
have loops).

I'm not 100% convinced that a single approach can't handle both
applications, but it's looking as though their requirements are
different enough that it may not work well.

Two other questions/concerns/comments/issues:

1) I assume that ultimately a user-space iterator would end up calling
the traversal code, right? If so, you can't reasonably mandate that
only one traversal be in progress at one time. That would be the
canonical way to compare two ordered collections--get an iterator for
each, and compare element-by-element.

2) I don't see it as a huge problem that serialization code could end
up creating additional objects if called from a destroy() method.
(Though yes, it would be a problem for GC infrastructure code to.) I
say that for two reasons: (a) destroy() methods can really do anything
they want, and if that task involves allocating additional memory, that
just makes it a risk to perform that task in a destroy() method--it may
fail due to out-of-memory conditions. I think that Java design experts
tend to argue against doing things like serialization in finalization
methods. It sounds elegant, but it's problematic. One reason for this
is that you tend to want to serialize structures as a whole, not
piece-by-piece as they are garbage-collected. The second reason it is
not always a problem in practice is that (b) a DOD run may be triggered
by an out-of-headers conditions, but that doesn't mean that an
additional chunk of memory for headers can't be allocated. If it can't
be, then this is no more problematic that it would be in other user
code--think of the case where I have some big tree of objects I want to
make some sort of copy of, with the intention of then letting go of the
original when I'm done. I'll be freeing up headers at the end of that
process, but if I run out of memory part-way-through, then I'm just
stuck.

3) I assume that not every object is assumed to be serializable? For
instance, an object representing a filehandle can't really be
serialized in a useful way. So I'm not sure of what sort of "fidelity"
is required of a generic serialization method--that is, how similar a
deserialized structure is guaranteed to be to the original.


JEff

Leopold Toetsch

unread,
Oct 21, 2003, 12:14:38 PM10/21/03
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> On Tue, 21 Oct 2003, Leopold Toetsch wrote:

>> You can append items to the constant table. You can't declare existing
>> items as constant, because you can't change the underlying object pool,
>> where the object was allocated. This would change the objects address.

> The object's address should be irrelevant for the constant table. PMCs are
> referenced in the opstream by table offset.

Only in the opstream. But not when such PMCs are used then. I.e. when
constant Sub PMC is refered to in the global stash.

>> Ah, that's the difference. How shall the system pretty-print dynamically
>> loaded PMCs then, when only a bytecode-stream is available? IMHO only a
>> vtable in the class can perform that job.

> If the dynamically loaded PMC class doesn't have a backing Parrot class,
> you can't, and get the default, relatively primitive dump.

I was thinking of plain PMCs, that where loaded to provide some special
functionality. Parrot doesn't know anything about these, so will be
unable to pretty print the opstream. Loaded classes OTOH as based on
ParrotClass and should be printable.

>> Constructing the frozen stream definitely needs memory. I don't see the
>> difference, to memory consumed by a seen hash. Can you please elaborate
>> a bit more on this.

> Constructing the frozen stream will need some memory, yes. At the moment
> all it needs is a chunk of random memory and that's it, so we may well
> fail because we're out of memory.

So, with the same argument I can say, (destructor level) freezing will
need *system* memory for the stream plus the hash. So we may well fail.
I don't see any difference. The hash hasn't to be a "fat" PerlHash.

If we don't want a hash one bit inside the objects arena flags should be
able to serve the same functionality - this PMC already got serialized.

Anyway - how does/would freezing at destructor level look like from HLL
POV? Shortly before, there ought to be a full DOD run (or all possible
garbage would be frozen). At this time, the amount of still active and
then to be serialized PMCs is known (an upper boundary is always known).
So it should be possible to work around such constraints.

> ... We may, however, have general pool


> memory handy. We can't guarantee that we have *any* headers, however,
> since we can legitimately be called from within the destruct phase of a
> DOD run, which may have been triggered by an out-of-headers condition.

I really doubt, that thawing a program (or some data of it), that died
in middle of some non trivial operation, because it ran out of headers,
will be of any use.

>> A scalar can't contain or reference other PMCs, so it can't be a
>> potential source of freeze loops. If I now spit out (PMC: Int, ID=xy,
>> value=5) twice or (PMC: ID=other) doesn't really matter. thaw() can take
>> care of duplicates, if needed. Other PMCs have the next_for_GC pointer.

> Thaw can only properly take care of duplicates if the duplicates are
> correctly indicated in the serialization stream. Identical end-values are
> *not* sufficient to note multiple references to the same PMC.

Sorry I thought of PMC IDs, which are the address of the frozen PMCs.

>> Albeit I'm not convinced, that we can't have a seen hash.

> It takes an insane amount of memory and requires header allocation.

A PerlHash takes more memory, and yes. But we just need a hash of PMC
addresses, or a bit inside the objects arena.

We have several different traverse-like functions:

* mark (DOD): called frequently, should get all possible speed
* freeze (destruction): no speed issues, can't take Parrot resources
* freeze (user): rarely used, can take resources
* destruction ordering: only active objects to be visited
* clone: can take resources thaw(freeze()), or separate vtable
* dump/pretty-print: no vtable?
* thaw: special class method, is different anyway

The first 2 critical items have diametral usage patterns. This does not
really imply, that they should be implemented based on the same scheme.

> ... We


> can't allocate headers, and the memory requirements are extreme. Been
> there, done that, it was a bad idea. Consider this arbitrarily and
> unconditionally ruled out if you're unwilling to believe the stats that
> were previously posted about this.

You are speaking of Storable.pm? I'm not aware of any stats regarding
that. But I'm not thinking of using a full fledged hash for such a
special case.

> Dan

leo

Leopold Toetsch

unread,
Oct 21, 2003, 11:20:30 AM10/21/03
to Melvin Smith, perl6-i...@perl.org
Melvin Smith <mel...@us.ibm.com> wrote:
>> Albeit I'm not convinced, that we can't have a seen hash.

> A seen hash most likely would:

> 1) Kill GC performance especially in pathological cases. The GC
> should be quiet and invisible.
> 2) Cause memory usage to double upon a mark run.

GC isn't involved. A mark() run sets the live bit in the PMCs arena. No
hash is needed for both cases. I have stated several times, that I
don't like to mix mark() and the other traverse functions.

> -Melvin

leo

Jeff Clites

unread,
Oct 21, 2003, 12:40:17 PM10/21/03
to Dan Sugalski, Elizabeth Mattijsen, perl6-i...@perl.org

Yeah, you can't put extra things in the "<?xml..." at the start of a
document, and you can't create a tag of your own whose name starts with
"XML" or "xml".

>> So are we talking about a header or a wrapper? If it is really a
>> header, it's not XML and then it's prettyy useless from an XML point
>> of view.
>
> We're talking about the first thing in a file (or stream, or
> whatever). I
> was under the impression that XML files should be entirely composed of
> valid XML, hence the need for the stream type marker being valid XML.

No, XML _documents_ must be XML, but that doesn't mean that document ==
file. (For another example where this comes up, consider an XML
document transmitted over HTTP. There are headers and other textual
things in the stream along with the xml, and it's the HTTP protocol
which determines where the document begins and ends, not xml's.) You
can certainly have more than one XML document in a single file, but
something needs to decide where an xml document begins and ends, and
hand only that data to the xml parser.

> YAML doesn't care as much, so far as I understand, and for our own
> internal
> binary format we cna do whatever we want. If that's not true, then we
> can
> go for a more compact header.

Yes, if you want the whole serialized steam to count as a well-formed
xml document, then you can't but arbitrary binary data in the middle.
See my previous post for why.

> Once again, modulo my limited and inevitably incorrect YAML knowledge.
> So
> if the header says it's XML the whole thing is valid XML, while if it
> doesn't the rest of the stream doesn't have to be. (Just enough of the
> header so that an XML processing program can examine the stream and
> decide
> that the valid XML chunk at the beginning says that the rest of the
> stream's not XML)

Most XML parsers aren't expecting to handle this. That is, there's no
such thing as a valid half-of-an-xml document, from the perspective of
the xml spec, and in many cases you'd have trouble getting a parser to
stop before hitting something problematic and blowing up. In other
words, you can't rely on an xml parser to process something which
starts out looking like xml, but isn't.

> Basically we want some nice, fixed (mostly) thing at the head of the
> stream that doesn't vary regardless of the way the stream is encoded,
> and
> XML seemed to be the most restrictive of the forms I know people will
> clamor for. (I know, it means the stream can't be valid Lisp-style
> sexprs,
> but XML's more widespread :)

Yeah, if you're just needing to tag the stream with a label to indicate
the type plus a version number, then xml's on the one hand overkill and
on the other hand not necessarily a big help to xml proponents.

JEff

Dan Sugalski

unread,
Oct 21, 2003, 12:53:05 PM10/21/03
to Jeff Clites, Elizabeth Mattijsen, perl6-i...@perl.org
> Yeah, if you're just needing to tag the stream with a label to indicate
> the type plus a version number, then xml's on the one hand overkill and
> on the other hand not necessarily a big help to xml proponents.

So, in a nutshell, throwing an XML format type tag at the beginning buys
us nothing regardless of whether it's an XML stream or not?

In that case, nuts to that. It's already terribly obvious I'm going to
mess it up if I try, so we'll just skip it and move on to the next
headache. :)

(FWIW, with respect to binary data in the output stream--if an encoded
format doesn't allow binary data then the encoder is responsible for
changing it to a non-binary format. So for XML and YAML (and any other
text encoding format, I expect) that'll likely be a base64 encoding or
something)

Dan

Dan Sugalski

unread,
Oct 21, 2003, 1:20:34 PM10/21/03
to Jeff Clites, perl6-i...@perl.org, Leopold Toetsch
On Tue, 21 Oct 2003, Jeff Clites wrote:

> I don't believe that is quite true. There are a couple of important
> differences between traversal-for-GC and traversal-for-serialization,
> which will be a challenge to reconcile in the one-true-traversal:
>
> 1) Serialization traversals need to "take note" of logical int and
> float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they
> can be serialized, but for GC you only need to worry about GC-able
> objects. It's difficult to come up with a reasonable callback which can
> take either int, float, or PObj arguments.

That's not an issue for us. A PMC is responsible for serializing itself,
so if its got a string, float, or int component then it must take
respnsibility for dumping those components to the serialization stream.
Basically PMCs *must* dump themselves out completely, but the engine
provides support to defer dumping of PMCs so that we don't get into
recursive dumping and blow stack, as well as to make sure that we properly
maintain multiple references to the same PMC.

> 2) It's reasonable for an object to have a pointer to some sort of
> cache object, which is not logically part of the object, and shouldn't
> be serialized along with it. This needs to be traversed for GC
> purposes, but needs to not be traversed for serialization. (Situations
> such as this--physical but not logical membership--are the origin of
> the "mutable" keyword in C++.)

That's what custom mark routines are for, though it does argue that we
should have a separate mark for freezing.

> 3) Traversal for GC needs to do loop detection, but can just stop going
> down a particular branch of the object graph once it encounters an
> object it's seen before. Serialization traversals would need to have a
> way, upon encountering an object seen before, to include in the
> serialization stream an indication that the current object has already
> been serialized, and enough information to enable deserialization code
> to go find it and recreate the loop. The only options I see here are
> either for serialization to involve the allocation of unbounded
> additional memory, or to expand the PObj structure to include a slot
> for a UUID which can be used as a back-reference in a stream, or to
> have serialization break loops (so that deserialized structures never
> have loops).

The loop breaking needs for freezing are the same as for DOD sweeps,
though with freezing we're at an advantage as we know where the tree
starts.

In all cases (I made sure this was in the example, but it might not have
been clear) we only include a marker for child PMCs in the parent PMC's
serialized data, and serialize the child PMCs later on in the stream. So
if PMC1 has a pointer to PMC2, the stream has PMC1 dumped to it but in the
place of PMC2's data is just a marker saying "refer to PMC2 here" and then
after the end of PMC1's data in the stream we dump out PMC2's data.

> 1) I assume that ultimately a user-space iterator would end up calling
> the traversal code, right? If so, you can't reasonably mandate that
> only one traversal be in progress at one time. That would be the
> canonical way to compare two ordered collections--get an iterator for
> each, and compare element-by-element.

While it could, I think it's infeasable to use the serialization iterator
for normal user-space iteration, if only because the limits that have to
be on the serialization iterator for use in restricted circumstances are a
bit onerous for general use.

I'm not entirely sure that parrot's going to provide this form of
iteration as it stands anyway--it's not necessary for the core langauge
support and while it'd be really useful there's a limit to the number of
Big Problems I'm up to solving. (Having said that there may, probably
will, be enough introspective capabilites to do this without engine
support)

> 2) I don't see it as a huge problem that serialization code could end
> up creating additional objects if called from a destroy() method.

User code may, parrot may not. The reasons are twofold--while parrot will
let you shoot yourself in the foot, it provides the gun, not the foot.
It should also be possible for carefully written destroy methods to
serialize but not eat any headers or memory. (I can see this being the
case in some embedded applications or systems) If we make it so freezing
is not a guaranteed possibility at destroy time then this can't happen and
it lessens the utility of the system some.

We can, if we choose, loosen the restriction later if sufficient reason is
presented. Can't really tighten it, though, so for now...

> 3) I assume that not every object is assumed to be serializable? For
> instance, an object representing a filehandle can't really be
> serialized in a useful way. So I'm not sure of what sort of "fidelity"
> is required of a generic serialization method--that is, how similar a
> deserialized structure is guaranteed to be to the original.

No fidelity is required at the moment, as we've not put any requirements
at all on what goes in the output stream. It could, I suppose, consist of
a near-infinite stream of fnords or something. That's the next bridge to
burn, but I don't think I'm done being cooked over the current one :)

Dan

Dan Sugalski

unread,
Oct 21, 2003, 1:49:13 PM10/21/03
to Elizabeth Mattijsen, Jeff Clites, perl6-i...@perl.org
On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:

> At 12:53 -0400 10/21/03, Dan Sugalski wrote:
> > > Yeah, if you're just needing to tag the stream with a label to indicate
> >> the type plus a version number, then xml's on the one hand overkill and
> > > on the other hand not necessarily a big help to xml proponents.
> >So, in a nutshell, throwing an XML format type tag at the beginning buys
> >us nothing regardless of whether it's an XML stream or not?
>

> Yep. But mainly I think because you'll need to encode binary data to
> make it valid XML. That's on overhead you don't to suffer for those
> serialization that don't need it.

I had it in mind that the XML parsers were all event driven so they'd read
the header and stop until prodded, and wouldn't be prodded on if it wasn't
a real parrot XML serialization stream, so binary data wouldn't matter.

> If you ask me, you could do easy with a simple header line like:
>
> parrot xml 1.0
> \0
>
> basically magic word ('parrot')
> followed by a space
> followed by the type
> followed by a space
> followed by version
> followed by a CRLF (not sure about this one, but could be nice)
> followed by a null byte

That works for me, including the crlf. Congrats, you just defined the
parrot serialization header tag! :-)

> I'm not clear if you would know beforehand how many bytes of data you
> would receive. If that is possible to know at all time, then I would
> suggest having the length as an extra part of the header.

Since we're going to potentially be serializing to an on-the-fly
unseekable device (i.e. dumping to a socket) so no length.

> >In that case, nuts to that. It's already terribly obvious I'm going to
> >mess it up if I try, so we'll just skip it and move on to the next
> >headache. :)
>

> Which means I'll be going back to lurking mode again... ;-)

Waiting to pounce, huh? :)

Dan

Jeff Clites

unread,
Oct 21, 2003, 1:56:28 PM10/21/03
to Elizabeth Mattijsen, Dan Sugalski, perl6-i...@perl.org
On Oct 21, 2003, at 10:41 AM, Elizabeth Mattijsen wrote:

> At 12:53 -0400 10/21/03, Dan Sugalski wrote:
>> > Yeah, if you're just needing to tag the stream with a label to
>> indicate
>>> the type plus a version number, then xml's on the one hand overkill
>>> and
>> > on the other hand not necessarily a big help to xml proponents.
>> So, in a nutshell, throwing an XML format type tag at the beginning
>> buys
>> us nothing regardless of whether it's an XML stream or not?
>

> Yep. But mainly I think because you'll need to encode binary data to
> make it valid XML. That's on overhead you don't to suffer for those
> serialization that don't need it.
>

> If you ask me, you could do easy with a simple header line like:
>
> parrot xml 1.0
> \0
>
> basically magic word ('parrot')
> followed by a space
> followed by the type
> followed by a space
> followed by version
> followed by a CRLF (not sure about this one, but could be nice)
> followed by a null byte

Yep, that's the sort of thing that I was thinking, though I'd actually
leave the CRLF (or just an LF or CR, whatever), and take out the null
byte. My reason for that is that this way, if your serialization format
always spits out vanilla ASCII w/o control characters, suitable for
consumption by some foreign C program, then the header won't change
this. (That's one of the nice features of the tar format--a tar archive
of ASCII text file is itself an ASCII text file, if I recall
corrrectly.)

It could also be handy to allow additional "comment" text after the
version (ignored by the deserialization, restricted to be ASCII w/o any
CR or LF), because that would let you put in some human-readably
comment to help out people trying to figure out what this file is. Some
other formats to this, which is nice. Just another thought.

JEff

Jeff Clites

unread,
Oct 21, 2003, 2:04:56 PM10/21/03
to Dan Sugalski, Elizabeth Mattijsen, perl6-i...@perl.org
On Oct 21, 2003, at 10:49 AM, Dan Sugalski wrote:

> On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:
>
>> At 12:53 -0400 10/21/03, Dan Sugalski wrote:
>>>> Yeah, if you're just needing to tag the stream with a label to
>>>> indicate
>>>> the type plus a version number, then xml's on the one hand
>>>> overkill and
>>>> on the other hand not necessarily a big help to xml proponents.
>>> So, in a nutshell, throwing an XML format type tag at the beginning
>>> buys
>>> us nothing regardless of whether it's an XML stream or not?
>>
>> Yep. But mainly I think because you'll need to encode binary data to
>> make it valid XML. That's on overhead you don't to suffer for those
>> serialization that don't need it.
>
> I had it in mind that the XML parsers were all event driven so they'd
> read
> the header and stop until prodded, and wouldn't be prodded on if it
> wasn't
> a real parrot XML serialization stream, so binary data wouldn't matter.

The event-based parsers (such as expat and other SAX parsers) tend to
be push instead of pull, so you hand them your bytes and they invoke
your callbacks (as opposed to pull-style in which you'd ask for the
next event). Sometimes you can hand them your bytes in chunks (and
they'll process what they can and save up the rest to include with your
next chunk), so with expat (for example) you could probably do what you
wanted, but you'd have to hand it data one byte at a time until your
first callback was invoked, then stop. So it could probably be done
with some parsers, but it would be an unusual usage, and more overhead
than it's worth to just parse out a couple of strings. :)

JEff

Dan Sugalski

unread,
Oct 21, 2003, 2:09:05 PM10/21/03
to Elizabeth Mattijsen, Jeff Clites, perl6-i...@perl.org
On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:

> At 13:49 -0400 10/21/03, Dan Sugalski wrote:
> >On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:

> Hmmm... maybe as an optimilization, something that would fit in 4 or
> 8 bytes would be better for the magic string (so a single or double
> integer check would be suffcient?).
>
> prrt (4 byte)
>
> ParrotDS (8 byte)
>
> (DS for Data Stream, rather than what you think, Dan ;-)

Yeah, that's a better option, I think. (And no, I didn't figure it stood
for anything else--that thing in the build file is embarrassing enough
:-P)

> > > I'm not clear if you would know beforehand how many bytes of data you
> > > would receive. If that is possible to know at all time, then I would
> > > suggest having the length as an extra part of the header.
> >Since we're going to potentially be serializing to an on-the-fly
> >unseekable device (i.e. dumping to a socket) so no length.
>

> Ok, so how is the encoder to know that no more data will come?

Good point. We'll have to have an end encoding entry in the encoding API,
the same way we'll need a begin encoding. The encoding format itself
defines the end of encoding marker and will know when it's hit the end of
the data stream.

Dan

Elizabeth Mattijsen

unread,
Oct 21, 2003, 8:47:42 AM10/21/03
to Dan Sugalski, Gregor N. Purdy, Uri Guttman, Melvin Smith, perl6-i...@perl.org
At 08:21 -0400 10/21/03, Dan Sugalski wrote:
> > I find the notion of an "XML header" a bit confusing, given Dan's
>> statement to the effect that it was a throw to XML folks.
>>
>> I think anything "XML folks" will be interested in will entail
>> *wrapping* stuff, not *prefixing* it.
>
>Nah, I expect what they'll want is for the entire data stream of
>serialized objects to be in XML format. Which is fine--they can have that.
>(It's why I mentioned the serialization routines can be overridden)
>
>For an XML stream the header might be <xml parrot format='xml'
>version=1.0> with the rest of the stream in XML. A YAML stream would start
><xml parrot format='yaml' version=1.0> with the rest in YAML, and teh
>binary format as <xml parrot format='binary' version=1.0>. Or something
>like that, modulo actual correct XML.

If you want that to be looking like valid XML, it would have to be different:

error: Specification mandate value for attribute parrot
<xml parrot/>
^
Better in my opinion would be something like:

<parrot format="xml" version="1.0"/>data yadda yadda yadda

At least this would be a valid stand-alone XML container. And
possibly parsers out there can be coerced into leaving the rest of
the stream for other processes to be read.


>This way we have a single, fixed-format type/version header, which makes
>the initial identification easier and less error-prone. (Possibly even
>fit for file and programs of its ilk to note) The binary format won't
>care, and teh YAML format shouldn't care (as long as the indenting's
>right) but the XML format would, so it seems to make sense to use the XML
>stuff for the initial header.

So are we talking about a header or a wrapper? If it is really a

header, it's not XML and then it's prettyy useless from an XML point
of view.


Liz

Elizabeth Mattijsen

unread,
Oct 21, 2003, 1:41:08 PM10/21/03
to Dan Sugalski, Jeff Clites, perl6-i...@perl.org
At 12:53 -0400 10/21/03, Dan Sugalski wrote:
> > Yeah, if you're just needing to tag the stream with a label to indicate
>> the type plus a version number, then xml's on the one hand overkill and
> > on the other hand not necessarily a big help to xml proponents.
>So, in a nutshell, throwing an XML format type tag at the beginning buys
>us nothing regardless of whether it's an XML stream or not?

Yep. But mainly I think because you'll need to encode binary data to

make it valid XML. That's on overhead you don't to suffer for those
serialization that don't need it.

If you ask me, you could do easy with a simple header line like:

parrot xml 1.0
\0

basically magic word ('parrot')
followed by a space
followed by the type
followed by a space
followed by version
followed by a CRLF (not sure about this one, but could be nice)
followed by a null byte

I'm not clear if you would know beforehand how many bytes of data you

would receive. If that is possible to know at all time, then I would
suggest having the length as an extra part of the header.

>In that case, nuts to that. It's already terribly obvious I'm going to
>mess it up if I try, so we'll just skip it and move on to the next
>headache. :)

Which means I'll be going back to lurking mode again... ;-)


Liz

Elizabeth Mattijsen

unread,
Oct 21, 2003, 1:59:37 PM10/21/03
to Dan Sugalski, Jeff Clites, perl6-i...@perl.org
At 13:49 -0400 10/21/03, Dan Sugalski wrote:
>On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:
> > Yep. But mainly I think because you'll need to encode binary data to
> > make it valid XML. That's on overhead you don't to suffer for those
> > serialization that don't need it.
>I had it in mind that the XML parsers were all event driven so they'd read
>the header and stop until prodded, and wouldn't be prodded on if it wasn't
>a real parrot XML serialization stream, so binary data wouldn't matter.
> > If you ask me, you could do easy with a simple header line like:
>>
>> parrot xml 1.0
>> \0
>>
>> basically magic word ('parrot')
>> followed by a space
>> followed by the type
>> followed by a space
>> followed by version
>> followed by a CRLF (not sure about this one, but could be nice)
> > followed by a null byte
>That works for me, including the crlf. Congrats, you just defined the
>parrot serialization header tag! :-)

Hmmm... maybe as an optimilization, something that would fit in 4 or

8 bytes would be better for the magic string (so a single or double
integer check would be suffcient?).

prrt (4 byte)

ParrotDS (8 byte)

(DS for Data Stream, rather than what you think, Dan ;-)


> > I'm not clear if you would know beforehand how many bytes of data you
> > would receive. If that is possible to know at all time, then I would
> > suggest having the length as an extra part of the header.
>Since we're going to potentially be serializing to an on-the-fly
>unseekable device (i.e. dumping to a socket) so no length.

Ok, so how is the encoder to know that no more data will come?


> > >In that case, nuts to that. It's already terribly obvious I'm going to
>> >mess it up if I try, so we'll just skip it and move on to the next
> > >headache. :)
> > Which means I'll be going back to lurking mode again... ;-)
>Waiting to pounce, huh? :)

I wish. ;-) For now, I'm more the hidden dragon rather than the
crouching tiger... ;-(


Liz

Leopold Toetsch

unread,
Oct 21, 2003, 1:43:48 PM10/21/03
to Dan Sugalski, perl6-i...@perl.org

One is with ARENA_DOD_FLAGS one w/o. If you are trying to implement your
universal mark() for everything, one is obsolete anyway.

> Dan

leo

Leopold Toetsch

unread,
Oct 21, 2003, 2:21:51 PM10/21/03
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

> So, in a nutshell, throwing an XML format type tag at the beginning buys
> us nothing regardless of whether it's an XML stream or not?

Yes. That's what people say :)

What about a well known format called PBC. (Parrot bortable^Wbyte code :)
It knows about en/decoding basic types. A PMC doesn't need a lot more.

> Dan

leo

Leopold Toetsch

unread,
Oct 21, 2003, 2:13:49 PM10/21/03
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> On Tue, 21 Oct 2003, Jeff Clites wrote:

>> 1) Serialization traversals need to "take note" of logical int and
>> float slots

> That's not an issue for us. A PMC is responsible for serializing itself,


> so if its got a string, float, or int component then it must take
> respnsibility for dumping those components to the serialization stream.
> Basically PMCs *must* dump themselves out completely, but the engine
> provides support to defer dumping of PMCs so that we don't get into
> recursive dumping

That's what my general traversal routine was intended for. A PerlHash
may have native datatypes as well as PMCs as data members, plus the
STRING keys, which are references. The hash itself and plain data
members can get serialized/frozen/dumped whatever. The callback takes
care of the desired action.

PMCs inside (especially aggregates of any kind) would get postponed
(only the ID or address needs to be serialized). If that is now done via
the next_for_GC pointer, a seen hash a bitmap or whatever is debatable
and seems (when destructor level freezing comes in) to be not too simple.

But using mark() for it doesn't meat the goal. Its a different thing. It
sets the live bit on objects.

> That's what custom mark routines are for, though it does argue that we
> should have a separate mark for freezing.

Which can be traverse or visit or whatever, but different.

>> 3) Traversal for GC needs to do loop detection,

It stops, when the live bit is set or sets a live bit and places
aggregates on an todo list. Its by far simpler then freezing. Its an
optimization - yes.

> The loop breaking needs for freezing are the same as for DOD sweeps,

s/are/can be/

> In all cases (I made sure this was in the example, but it might not have
> been clear) we only include a marker for child PMCs in the parent PMC's
> serialized data, and serialize the child PMCs later on in the stream. So
> if PMC1 has a pointer to PMC2, the stream has PMC1 dumped to it but in the
> place of PMC2's data is just a marker saying "refer to PMC2 here" and then
> after the end of PMC1's data in the stream we dump out PMC2's data.

That's clear. But plain scalars don't have child PMCs. Freez em and be
done with them. There is no need to put these on a next_for_GC list,
mark() doesn't do it (anymore) and freeze doesn't have to do it. I don't
see the small PMCs approach dieing because of that.

> Dan

leo

Clark C. Evans

unread,
Oct 21, 2003, 3:02:59 PM10/21/03
to Dan Sugalski, perl6-i...@perl.org
On Tue, Oct 21, 2003 at 09:12:27AM -0400, Dan Sugalski wrote:
| We're talking about the first thing in a file (or stream, or whatever). I
| was under the impression that XML files should be entirely composed of
| valid XML, hence the need for the stream type marker being valid XML. YAML
| doesn't care as much, so far as I understand, and for our own internal
| binary format we cna do whatever we want.

As for 'autodetecting' XML vs YAML, an earlier version of the YAML
spec restricted plain-style mapping keys so that they could not
start with the '<' character. In this way, a processor could
auto-detect if the incoming stream was XML or YAML, and use the
appropriate parser. With the restricted schema described below,
a small-footprint XML parser could even be shipped with the
core libyaml allowing us (the YAML team) to handle this XML
compatibility requirement; especially with regard to SOAP, the
defacto XML object 'serialization' schema.

I wrote up a brief 'sketch' as to one option for interoperability
between XML and YAML, although there are many such options, and soon
the yaml-core list will be forced to discuss icky things like this.
My thoughts (which are _not_ concensus in the YAML community) are
found at http://www.yaml.org/xml.html ; this page gives the
usually "invoice" example in XML, and an imperfect XSLT stylesheet
for converting XML in this schema to YAML. Clearly more work is
needed here, I would very much like to hear your requirements.

Kind Regards,

Clark

P.S. I try to follow this list, but I often miss items, so if you put
'YAML' in the title and cc me on it, it will surely get my attention.

Dan Sugalski

unread,
Oct 21, 2003, 3:10:25 PM10/21/03
to Clark C. Evans, perl6-i...@perl.org
On Tue, 21 Oct 2003, Clark C. Evans wrote:

> On Tue, Oct 21, 2003 at 09:12:27AM -0400, Dan Sugalski wrote:
> | We're talking about the first thing in a file (or stream, or whatever). I
> | was under the impression that XML files should be entirely composed of
> | valid XML, hence the need for the stream type marker being valid XML. YAML
> | doesn't care as much, so far as I understand, and for our own internal
> | binary format we cna do whatever we want.
>
> As for 'autodetecting' XML vs YAML

We don't have to! Woohoo! :)

This is one problem I didn't want to go into, so the encoding would be
explicit in the header. Since we've now dodged even the pretense of
guaranteed minimally valid anything in the stream header, the point's
moot, which is nice.

For YAML encoding, like for XML (and the default native encoding, which'll
probably be parrot bytecode) everything after the end-of-header will be
proper, well-formed whatever. Unless the encoder's messed up, of course.
;)

Dan

Dan Sugalski

unread,
Oct 21, 2003, 3:18:08 PM10/21/03
to Clark C. Evans, Elizabeth Mattijsen, Jeff Clites, perl6-i...@perl.org
On Tue, 21 Oct 2003, Clark C. Evans wrote:

> If you are going to go this far (including content-length) may I
> just suggest using a MIME envelope? This has several advantages:

This is a very good idea, but not this time, as it's too easy to get stuck
in the endless churn of very good ideas and alternatives.

The simple header format, without the null byte (taking it out is a good
idea, since we may have the possibility of an all-text file in that case),
is the way we're going to go. Maybe for version 2.0, but we've hit the
Good Enough point here.

Dan

Elizabeth Mattijsen

unread,
Oct 21, 2003, 3:25:48 PM10/21/03
to Dan Sugalski, Clark C. Evans, Jeff Clites, perl6-i...@perl.org
At 15:18 -0400 10/21/03, Dan Sugalski wrote:
>On Tue, 21 Oct 2003, Clark C. Evans wrote:
> > If you are going to go this far (including content-length) may I
> > just suggest using a MIME envelope? This has several advantages:
>This is a very good idea, but not this time, as it's too easy to get stuck
>in the endless churn of very good ideas and alternatives.

I would think the MIME-envelope would have to be part of the data,
rather than in the header. Or am I missing anything? It's
encode/decoder determined, is it not?


>The simple header format, without the null byte (taking it out is a good
>idea, since we may have the possibility of an all-text file in that case),

I have no particular feeling about the null byte. It would just be a
convenience when debugging as it would allow you to just print the
string, as it would be null-delimited. The CRLF (or just the CR or
just the LF) could serve as an end of header marker just as well.


Liz

Clark C. Evans

unread,
Oct 21, 2003, 3:11:29 PM10/21/03
to Elizabeth Mattijsen, Dan Sugalski, Jeff Clites, perl6-i...@perl.org
On Tue, Oct 21, 2003 at 07:41:08PM +0200, Elizabeth Mattijsen wrote:
| If you ask me, you could do easy with a simple header line like:
|
| parrot xml 1.0
| \0
|
| basically magic word ('parrot')
| followed by a space
| followed by the type
| followed by a space
| followed by version
| followed by a CRLF (not sure about this one, but could be nice)
| followed by a null byte
|
| I'm not clear if you would know beforehand how many bytes of data you
| would receive. If that is possible to know at all time, then I would
| suggest having the length as an extra part of the header.

If you are going to go this far (including content-length) may I

just suggest using a MIME envelope? This has several advantages:

- there are already readers for the format
- it allows you to specify the 'Content-Type' as, say binary/parrot
or text/yaml or text/xml
- it gives you a place to put 'Content-Length'
- it is extensible, allowing for other headers
- it allows you to include other 'binary' blobs in the same file

Best,

Clark

Dan Sugalski

unread,
Oct 21, 2003, 4:04:01 PM10/21/03
to Clark C. Evans, Elizabeth Mattijsen, Jeff Clites, perl6-i...@perl.org
On Tue, 21 Oct 2003, Clark C. Evans wrote:

> Back to the YAML list... sorry for interloping!

Ah, you weren't interloping--it is a good idea. You just managed to come
in on the other side of Good Enough today. :)

Dan

Clark C. Evans

unread,
Oct 21, 2003, 3:56:55 PM10/21/03
to Elizabeth Mattijsen, Dan Sugalski, Jeff Clites, perl6-i...@perl.org
Dan/Elizabeth,

Thank you for considering my response, let me rephrase and then
I'll go back to my own list (*grins*).

On Tue, Oct 21, 2003 at 09:25:48PM +0200, Elizabeth Mattijsen wrote:
| At 15:18 -0400 10/21/03, Dan Sugalski wrote:
| >On Tue, 21 Oct 2003, Clark C. Evans wrote:
| > > If you are going to go this far (including content-length) may I
| > > just suggest using a MIME envelope? This has several advantages:
| >This is a very good idea, but not this time, as it's too easy to get stuck
| >in the endless churn of very good ideas and alternatives.

I should have just suggested RFC822 like headers (used by E-Mail
and HTTP) and not imply that MIME, with its multi-part and encoding
glory need be supported. In other words, the header could simply be:

Parrot-version: 0.3
Content-type: binary/parrot <- or text/yaml, text/xml
Content-length: 49384
<- blank line
(binary payload)

This has the advantages of:
(a) satisifies mentioned requirements: version, type, and size
(a) easy to parse, well known syntax
(b) fits in well with Intranet infrastructure
(c) easy to extend down the road, ie, more headers can be added

| I would think the MIME-envelope would have to be part of the data,
| rather than in the header. Or am I missing anything? It's
| encode/decoder determined, is it not?

Well, it seemed you were making an 'envelope', and this is
exactly what RFC822 is all about. In particular, all of the
items you wanted to put in your header could be done easily
with RFC822.

| >The simple header format, without the null byte (taking it out is a good
| >idea, since we may have the possibility of an all-text file in that case),
|
| I have no particular feeling about the null byte. It would just be a
| convenience when debugging as it would allow you to just print the
| string, as it would be null-delimited. The CRLF (or just the CR or
| just the LF) could serve as an end of header marker just as well.

RFC822 uses a "blank line", that is two adjacent "CRLF" items to
mark the end of the header.

Back to the YAML list... sorry for interloping!

Clark

Elizabeth Mattijsen

unread,
Oct 21, 2003, 4:15:58 PM10/21/03
to Clark C. Evans, Dan Sugalski, Jeff Clites, perl6-i...@perl.org
At 12:56 -0700 10/21/03, Clark C. Evans wrote:
>On Tue, Oct 21, 2003 at 09:25:48PM +0200, Elizabeth Mattijsen wrote:
>| At 15:18 -0400 10/21/03, Dan Sugalski wrote:
>| >On Tue, 21 Oct 2003, Clark C. Evans wrote:
>| > > If you are going to go this far (including content-length) may I
>| > > just suggest using a MIME envelope? This has several advantages:
>| >This is a very good idea, but not this time, as it's too easy to get stuck
>| >in the endless churn of very good ideas and alternatives.
>I should have just suggested RFC822 like headers (used by E-Mail
>and HTTP) and not imply that MIME, with its multi-part and encoding
>glory need be supported. In other words, the header could simply be:
> Parrot-version: 0.3
> Content-type: binary/parrot <- or text/yaml, text/xml
> Content-length: 49384
> <- blank line
> (binary payload)

But do we always need that? In my idea it would be something like:

prrt 1.0 yaml # prrt = magic word, 1.0 = parrot header version,
yaml = encode ID
Parrot-version: 0.3
Content-type: binary/parrot
Content-length: 49384

(binary payload)


The Parrot header line would just be enough to get the right decoder,
whatever the decoder does with the rest of the stream, is up to the
decoder. Another example with XML:

prrt 1.0 xml # assume XML
<?xml version="1.0" encoding="iso-8859-1"?>
<parrot
xmlns:parrot="http://www.parrotcode.org/0.3" # implies MIME-encode
binary data
>(mime encoded binary data)</parrot>


And another one with an oldy but goody?

prrt 1.0 storable # assume storable
(whatever Storable.pm puts in its magic)


Hope this made sense.


Liz

Jeff Clites

unread,
Oct 22, 2003, 3:03:25 AM10/22/03
to Juergen Boemmels, Perl6 Internals
On Oct 21, 2003, at 8:11 AM, Juergen Boemmels wrote:

> If you think about it: The call to the destructors is done after
> free_unused_pobjects completed. The memory of the objects without

> destructors is already freed. If you are still in an out of memory
> situation when the destructors are run, then its also very likely that
> you are in a not much better situation afterwards.

That's not quite right. You can't reclaim the memory of any object
which is reachable from any object with a destructor which has not yet
been called. But you're right that some memory can be reclaimed before
calling destructors.

JEff

Peter Haworth

unread,
Oct 24, 2003, 12:33:17 PM10/24/03
to Dan Sugalski, Jeff Clites, perl6-i...@perl.org, Leopold Toetsch
On Tue, 21 Oct 2003 13:20:34 -0400 (EDT), Dan Sugalski wrote:
> On Tue, 21 Oct 2003, Jeff Clites wrote:
> > 2) I don't see it as a huge problem that serialization code could end
> > up creating additional objects if called from a destroy() method.
>
> User code may, parrot may not. The reasons are twofold--while parrot will
> let you shoot yourself in the foot, it provides the gun, not the foot.
> It should also be possible for carefully written destroy methods to
> serialize but not eat any headers or memory. (I can see this being the
> case in some embedded applications or systems) If we make it so freezing
> is not a guaranteed possibility at destroy time then this can't happen and
> it lessens the utility of the system some.
>
> We can, if we choose, loosen the restriction later if sufficient reason is
> presented. Can't really tighten it, though, so for now...

--
Peter Haworth p...@edison.ioppublishing.com
"But do not program in COBOL if you can avoid it."
-- "The Tao of Programming"

Peter Haworth

unread,
Oct 24, 2003, 12:33:06 PM10/24/03
to Dan Sugalski, Juergen Boemmels, Leopold Toetsch, perl6-i...@perl.org
On Tue, 21 Oct 2003 11:39:38 -0400 (EDT), Dan Sugalski wrote:
> On Tue, 21 Oct 2003, Juergen Boemmels wrote:
> > This is a question of what is allowed at destruction time. You don't
> > want to allow memory allocation, but allow freezing. That gets hard,
> > because you need at least allocate the STRING where you want to put your
> > frozen stream.
>
> It's more a question of what we we require the engine to do, vs what user
> code is allowed to do. A user program is allowed to write code that can
> fail at destroy time, however the infrastructure we provide (including, in
> this case, freezing--while I don't like it there's no choice) can't fail
> that way. It's the reason the DOD and GC systems don't allocate memory (or
> didn't--they shouldn't) when they run. The engine's not allowed to have
> failure modes in critical sections.

How can you serialise without using any extra memory? Even if you stream the
serialised data directly to disk one byte at a time (which could fail just
as easily as allocating memory to hold the serialised data in memory),
serialisation of anything more complex than a native type is inherently
recursive. Either you use up call stack space recursing over the PMCs, or
you need an explicitly manage a stack/queue of PMCs not yet serialised.

Have I missed some wonder of modern computing, or just something so obvious
I can't see it?

--
Peter Haworth p...@edison.ioppublishing.com
"What's so unpleasant about being drunk?"
"You ask a glass of water."
-- Douglas Adams, HHGTTG

Dan Sugalski

unread,
Oct 24, 2003, 12:52:24 PM10/24/03
to Peter Haworth, Juergen Boemmels, Leopold Toetsch, perl6-i...@perl.org

The latter. :)

We avoid extra memory usage in a destroy-time traversal for freezing the
same way we avoid it for DOD--make sure that the memory we need is
available to the traversal system in advance.

This is one of the reasons I wanted to use the next_for_gc pointer in the
PMCs to build the traversal chain, as the memory's already allocated and
we wouldn't need to expand the PMCs any.

It is certainly possible that the mechanism the user code chooses to use
to freeze will exhaust some resource--disk space, RAM, bandwidth
allotment, Universal electron supply, or something of the sort. That's
fine, it's user code and is allowed to fail like that. We, on the other
hand, have to make sure that we don't fail as part of the traversal.

It's a pain, sure, but there are a number of applications where worst-case
behaviour guarantees are required. I don't know that Parrot will ever be
seriously used in an embedded application or semi-embedded application
(like a Palm device) but enough folks have made noise about it that I'm
trying not to make it impossible.

I do realize that we're making some of our normal-case resource usages
higher in exchange for a better guaranteed worst case, but at the moment I
think that's worthwhile.

Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Peter Haworth

unread,
Oct 24, 2003, 1:15:13 PM10/24/03
to perl6-i...@perl.org
On Fri, 24 Oct 2003 17:33:17 +0100, Peter Haworth wrote:
> [stuff he didn't mean to send]

Sorry. Looks like I hit Send instead of Cancel.

--
Peter Haworth p...@edison.ioppublishing.com
"this system is slightly less secure than putting your IP address and
root password in big letters in a 30-second Superbowl commercial.
(Though I may be wrong--it's possible I'm underestimating the danger)"
-- Dan Sugalski

Reply all
Reply to author
Forward
0 new messages