Freezing and thawing

Dan Sugalski

Nov 21, 2003, 12:10:26 PM
to perl6-i...@perl.org
The beginnings of freezing and thawing are going in, which is good -- this
should get us to PMC constants in the bytecode files, and proper
over-the-wire freeze/thaw stuff. We're a bit raw at the moment, which is
fine, so I wanted to give a heads up as to where things are moving, what
needs doing, and things to keep in mind as you write PMC classes.
Numbered, mainly for easy referral later:

1) When freezing a PMC, the result should *always* be a PMC. Even if you
know your class only holds a lower level X, for whatever value X might be,
it still needs to be a full PMC in the data stream.

2) A PMC, when frozen, must maintain its internal PMC structure in the
data stream. That means if you have a hash of PMCs, you have to store each
PMC (via the freeze API) in the data stream. It's possible that they may
be shared with other PMCs (whether your PMC class knows it or not) in the
data stream and must be thawed in a way that maintains that sharing.

3) So long as the constraints of #1 and #2 are met, PMCs can freeze their
internal data any way they want. If pushing out the integer 3 in the
stream is enough to encode "It was a dark and stormy night" well, good for
you.

4) Don't count on the order PMCs are frozen, nor on the order that they
are reconstituted.

5) The vtable API for freeze/thaw is as follows:

freeze(thingie *freezecontext) - called when a PMC should freeze itself
thaw(thingie *thawcontext) - called when an empty PMC should reconstitute
itself
thawfinish() - Called on each PMC after the full stream of PMCs has been
thawed

We don't, I realized, need mark as a vtable method for freezing or
thawing.

6) The freeze/thaw library needs to consist of the following calls (some
of which we already have)

freezepmc(pmc *, name)
freezestring(string *, name)
freezeint(int, name)
freezefloat(float, name)
startlist(name)
endlist(name)
startpairs(name)
endpairs(name)
addpmctolist(pmc *)

name is a string pointer or null if there is no name. (There should be a
context pointer there too, but this is pine and editing's a pain) These
are all the functions that the PMC freeze routine calls to save off parts
of themselves, *not* anything that opcodes call.

The freezepmc routine here, it should be noted, acts as a mark routine of
sorts, which is why we don't need one on the PMC itself for this.
addpmctolist() just throws a PMC on the list of PMCs to be frozen to this
stream without actually freezing it at the current spot in the stream. (It
could, for example, be used to put the package stash on the list without
actually hanging it off the PMC making the add call)

Leo's idea of passing in a struct as part of the freeze is a good one, as
one of the things hanging off it can be a vtable with all these calls in
it, so we can have multiple freeze methods active at once. (Arguments as
to why that's a good thing are separate and I don't want to go there :)
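
A minimal sketch of the struct-with-a-vtable idea, assuming a context that carries a table of function pointers plus an output buffer. Every name here (FreezeContext, FreezeVtable, the two toy backends, counter_freeze) is hypothetical and not Parrot's actual API; the point is only that a PMC's freeze routine talks to the context, so two freeze methods can be active at once.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; this is not Parrot's real API. */
typedef struct FreezeContext FreezeContext;

/* Table of serializer callbacks hanging off the context. */
typedef struct FreezeVtable {
    void (*freezeint)(FreezeContext *ctx, long value, const char *name);
    void (*freezestring)(FreezeContext *ctx, const char *value, const char *name);
} FreezeVtable;

struct FreezeContext {
    const FreezeVtable *vtable; /* which freeze method this stream uses */
    char   out[256];            /* output buffer for this stream */
    size_t used;
};

/* Append text to the stream; silently drops it if the buffer is full. */
static void ctx_append(FreezeContext *ctx, const char *text) {
    size_t len = strlen(text);
    if (ctx->used + len < sizeof ctx->out) {
        memcpy(ctx->out + ctx->used, text, len + 1);
        ctx->used += len;
    }
}

/* Backend 1: verbose name=value; records. */
static void text_int(FreezeContext *ctx, long value, const char *name) {
    char buf[64];
    snprintf(buf, sizeof buf, "%s=%ld;", name ? name : "", value);
    ctx_append(ctx, buf);
}
static void text_string(FreezeContext *ctx, const char *value, const char *name) {
    char buf[128];
    snprintf(buf, sizeof buf, "%s=\"%s\";", name ? name : "", value);
    ctx_append(ctx, buf);
}

/* Backend 2: terse records that ignore the names. */
static void terse_int(FreezeContext *ctx, long value, const char *name) {
    char buf[64];
    (void)name;
    snprintf(buf, sizeof buf, "i%ld ", value);
    ctx_append(ctx, buf);
}
static void terse_string(FreezeContext *ctx, const char *value, const char *name) {
    char buf[128];
    (void)name;
    snprintf(buf, sizeof buf, "s%s ", value);
    ctx_append(ctx, buf);
}

const FreezeVtable text_vtable  = { text_int,  text_string  };
const FreezeVtable terse_vtable = { terse_int, terse_string };

/* A PMC's freeze routine only ever goes through the context. */
void counter_freeze(FreezeContext *ctx, long count, const char *label) {
    ctx->vtable->freezeint(ctx, count, "count");
    ctx->vtable->freezestring(ctx, label, "label");
}
```

Calling counter_freeze once with a context pointing at text_vtable and once with one pointing at terse_vtable produces two different images from the same PMC code.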


This clear enough? We're most of the way there, we just need to make a few
alterations in the vtable names and functions, and some rejigging of the
draft freeze/thaw stuff that's in now.

Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Leopold Toetsch

Nov 21, 2003, 3:14:49 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

> 5) The vtable API

[ ... ]

> thawfinish()

This is very probably necessary to perform some final state adjustment,
when all the contained PMCs are done but not as a general "to be
called on each". E.g. a plain scalar PMC doesn't need it. I'd rather
have such functionality on demand i.e. a PMC on freezing sets a flag:
"I need post-processing".

> 6) The freeze/thaw library needs to consist of the following calls (some
> of which we already have)

[ ... ]

> startlist(name)
> endlist(name)
> startpairs(name)
> endpairs(name)

Do you have some more info about above functions and what they are
intended to perform?

> addpmctolist(pmc *)

Doesn't fit into the scheme. The latter tells the engine, to put a PMC
on the todo_list. That ought to be part of the context (info) structure.
All the other library functions are serializer-specific.

> The freezepmc routine here, it should be noted, acts as a mark routine of
> sorts, which is why we don't need one on the PMC itself for this.

That doesn't play with other usage of vtable->visit like destruction
ordering. When going that way, we are losing the generalization that
currently is in the scheme. (At least if you are using the term "mark"
here as what vtable->visit() now is).

> Leo's idea of passing in a struct as part of the freeze is a good one, as
> one of the things hanging off it can be a vtable with all these calls in
> it, so we can have multiple freeze methods active at once. (Arguments as
> to why that's a good thing are separate and I don't want to go there :)

Actually, I'm (longterm) thinking of making a PMC out of PackFile_<xxx>
and what's now called IMAGE_IO. The generalization could be finally:
Writing a .pbc file image is freezing/writing such a PackFile PMC. Or
adding an item to a packfile is appending a frozen image. Having all
this functionality inside one or more PMCs, makes it easy to change
these formats - even to xml (brrr).

> Dan

leo

Dan Sugalski

Nov 24, 2003, 9:48:08 AM
to Leopold Toetsch, perl6-i...@perl.org
On Fri, 21 Nov 2003, Leopold Toetsch wrote:

> Dan Sugalski <d...@sidhe.org> wrote:
>
> > 5) The vtable API
>
> [ ... ]
>
> > thawfinish()
>
> This is very probably necessary to perform some final state adjustment,
> when all the contained PMCs are done but not as a general "to be
> called on each". E.g. a plain scalar PMC doesn't need it. I'd rather
> have such functionality on demand i.e. a PMC on freezing sets a flag:
> "I need post-processing".

If we want to add a flag, or a call as part of the freeze API that lets a
PMC note that it needs post-processing, that's fine. The default vtable
entry should just do nothing.
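
A minimal sketch of both halves of that, assuming a do-nothing default vtable entry and a per-PMC flag that the thaw engine checks. The names (needs_thawfinish, run_thawfinish, the toy vtables) are hypothetical; in a real stream the flag would presumably be recorded at freeze time and restored on thaw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down PMC and vtable; not Parrot's real layout. */
typedef struct PMC PMC;

typedef struct VTable {
    void (*thawfinish)(PMC *self);
} VTable;

struct PMC {
    const VTable *vtable;
    int needs_thawfinish; /* assumed flag: "I need post-processing" */
    int finished;         /* counts thawfinish calls, for the demo only */
};

/* Default entry: does nothing, so plain scalars pay no cost. */
static void default_thawfinish(PMC *self) { (void)self; }

/* An aggregate might fix up internal state once all children exist. */
static void hash_thawfinish(PMC *self) { self->finished++; }

const VTable scalar_vtable = { default_thawfinish };
const VTable hash_vtable   = { hash_thawfinish };

/* After the whole stream is thawed, call thawfinish only where requested.
 * Returns how many PMCs were post-processed. */
size_t run_thawfinish(PMC **thawed, size_t n) {
    size_t called = 0;
    for (size_t i = 0; i < n; i++) {
        if (thawed[i]->needs_thawfinish) {
            thawed[i]->vtable->thawfinish(thawed[i]);
            called++;
        }
    }
    return called;
}
```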

> > 6) The freeze/thaw library needs to consist of the following calls (some
> > of which we already have)
>
> [ ... ]
>
> > startlist(name)
> > endlist(name)
> > startpairs(name)
> > endpairs(name)
>
> Do you have some more info about above functions and what they are
> intended to perform?

A list is just a series of values in this context -- the sort of thing
you'd use to dump out an array or a struct. If you had a PMC that
represented this:

struct foo {
    INTVAL bar;
    INTVAL baz;
    STRING *name;
    INTVAL count;
};

the calls the PMC freeze would make would look like

startlist("foo")
freezeint(bar)
freezeint(baz)
freezestring(name)
freezeint(count)
endlist()

start/end pairs does the same thing, only what gets frozen is a series of
pairs (key/value things) rather than individual entries. And yes, I
realize that you can simulate pairs with alternating key/value entries in
the freeze stream, but I'd rather keep them separate.
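
A minimal sketch of the list and pairs calls, assuming a toy text image where a list is wrapped in parentheses and a pair set in braces. The Stream type, the bracket notation, and the plain C types standing in for INTVAL and STRING are all assumptions made for illustration; item names are left out to keep it short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy output stream; the text notation is made up for this sketch. */
typedef struct Stream {
    char   out[256];
    size_t used;
} Stream;

/* Append text to the stream; silently drops it if the buffer is full. */
static void emit(Stream *s, const char *text) {
    size_t len = strlen(text);
    if (s->used + len < sizeof s->out) {
        memcpy(s->out + s->used, text, len + 1);
        s->used += len;
    }
}

/* Write an opening bracket followed by the optional name. */
static void open_group(Stream *s, const char *bracket, const char *name) {
    emit(s, bracket);
    if (name) {
        emit(s, name);
        emit(s, ":");
    }
}

void startlist(Stream *s, const char *name)  { open_group(s, "(", name); }
void endlist(Stream *s)                      { emit(s, ")"); }
void startpairs(Stream *s, const char *name) { open_group(s, "{", name); }
void endpairs(Stream *s)                     { emit(s, "}"); }

void freezeint(Stream *s, long value) {
    char buf[32];
    snprintf(buf, sizeof buf, "i%ld ", value);
    emit(s, buf);
}

void freezestring(Stream *s, const char *value) {
    emit(s, "s");
    emit(s, value);
    emit(s, " ");
}

/* Plain C stand-in for the struct in the message. */
struct foo {
    long        bar;
    long        baz;
    const char *name;
    long        count;
};

/* The PMC's freeze routine: one list, four members, in order. */
void freeze_foo(Stream *s, const struct foo *f) {
    startlist(s, "foo");
    freezeint(s, f->bar);
    freezeint(s, f->baz);
    freezestring(s, f->name);
    freezeint(s, f->count);
    endlist(s);
}
```

Because the pair set uses a different bracket than the list, a reader can tell the two apart without knowing which PMC class wrote them.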


> > addpmctolist(pmc *)
>
> Doesn't fit into the scheme.

Sure it does. This puts a PMC on the list of PMCs being frozen, assuming
that PMC isn't already on the list. The freeze list drives everything that
gets frozen.

> The latter tells the engine, to put a PMC
> on the todo_list. That ought to be part of the context (info) structure.
> All the other library functions are serializer-specific.

It's not necessarily overridable, true, but it's conceptually one of the
core freeze routines. That's the important part here.
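
A minimal sketch of a freeze list that drives everything, assuming a fixed-size array and a linear scan for the "already on the list" check. The PMC layout, MAX_TODO, and freeze_all are hypothetical; a real engine would likely use a hash or a flag bit instead of scanning.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TODO 16

/* Toy PMC with up to two children; not Parrot's real layout. */
typedef struct PMC {
    struct PMC *children[2]; /* NULL when unused */
    int frozen;              /* how many times it was frozen (demo only) */
} PMC;

typedef struct FreezeList {
    PMC   *items[MAX_TODO];
    size_t count;
} FreezeList;

/* Put a PMC on the list unless it is already there.
 * Returns its index (usable as an id in the stream), or -1 if full. */
int addpmctolist(FreezeList *list, PMC *pmc) {
    for (size_t i = 0; i < list->count; i++) {
        if (list->items[i] == pmc)
            return (int)i;
    }
    if (list->count == MAX_TODO)
        return -1;
    list->items[list->count] = pmc;
    return (int)list->count++;
}

/* Walk the list front to back; freezing a PMC may append its children,
 * so the list grows while it is being processed. */
void freeze_all(FreezeList *list, PMC *root) {
    addpmctolist(list, root);
    for (size_t i = 0; i < list->count; i++) {
        PMC *pmc = list->items[i];
        pmc->frozen++; /* stand-in for writing the PMC to the stream */
        for (size_t c = 0; c < 2; c++) {
            if (pmc->children[c])
                addpmctolist(list, pmc->children[c]);
        }
    }
}
```

A PMC reachable through two parents lands on the list once and is frozen once, which is what lets the thaw side restore the sharing; the same check also stops cycles from looping forever.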

> > The freezepmc routine here, it should be noted, acts as a mark routine of
> > sorts, which is why we don't need one on the PMC itself for this.
>
> That doesn't play with other usage of vtable->visit like destruction
> ordering. When going that way, we are losing the generalization that
> currently is in the scheme. (At least if you are using the term "mark"
> here as what vtable->visit() now is).

Well, there's the problem. The ->mark vtable entry is really "mark your
children" rather than "mark yourself", which means that the freeze entry
for a PMC corresponds to the mark entry for DOD. When ->freeze is called
for a PMC all the children should be frozen. (Or all the children it
ultimately cares about) This is separate from the mark routine Parrot's API
provides, which is a "put this PMC on the list of PMCs to be visited, if
it's not already on the list" routine.

It would've been better if the vtable entry was "mark_children" and the
DOD routine "add_to_todo" but alas they aren't named that, even if it's
what's going on.

> > Leo's idea of passing in a struct as part of the freeze is a good one, as
> > one of the things hanging off it can be a vtable with all these calls in
> > it, so we can have multiple freeze methods active at once. (Arguments as
> > to why that's a good thing are separate and I don't want to go there :)
>
> Actually, I'm (longterm) thinking of making a PMC out of PackFile_<xxx>
> and what's now called IMAGE_IO. The generalization could be finally:
> Writing a .pbc file image is freezing/writing such a PackFile PMC. Or
> adding an item to a packfile is appending a frozen image. Having all
> this functionality inside one or more PMCs, makes it easy to change
> these formats - even to xml (brrr).

I'm picturing some interesting recursive definition issues there. At the
moment I'm thinking we'd be better off keeping the freeze format separate
from the bytecode format. Some of the limits on bytecode (like
mmappability) may conflict with how we'd prefer to do serialization by
default. We can look to unify them later, but for now let's keep things
separate.

Leopold Toetsch

Nov 24, 2003, 12:34:10 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

>> > startpairs(name)
>> > endpairs(name)

> start/end pairs does the same thing, only what gets frozen is a series of
> pairs (key/value things) rather than individual entries. And yes, I
> realize that you can simulate pairs with alternating key/value entries in
> the freeze stream, but I'd rather keep them separate.

In the mean time I've checked in freeze/thaw for PerlHash. It uses an
element count as list does. We could of course use your proposed scheme
with start/end-markers too. But thawing a list of a (first) unknown amount
of items isn't really as simple as having a count.

> Dan

leo

Dan Sugalski

Nov 24, 2003, 1:05:58 PM
to Leopold Toetsch, perl6-i...@perl.org
On Mon, 24 Nov 2003, Leopold Toetsch wrote:

> Dan Sugalski <d...@sidhe.org> wrote:
>
> >> > startpairs(name)
> >> > endpairs(name)
>
> > start/end pairs does the same thing, only what gets frozen is a series of
> > pairs (key/value things) rather than individual entries. And yes, I
> > realize that you can simulate pairs with alternating key/value entries in
> > the freeze stream, but I'd rather keep them separate.
>
> In the mean time I've checked in freeze/thaw for PerlHash. It uses an
> element count as list does. We could of course use your proposed scheme

^^^^^
You mis-spelled "will" here.

> with start/end-markers too. But thawing a list of a (first) unknown amount
> of items isn't really as simple as having a count.

Right -- it's even simpler, and makes doing logical skips through the
frozen data easier for code doing external examination, since there's
structure in the frozen data rather than a glob of random data elements.

Leopold Toetsch

Nov 24, 2003, 4:37:39 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> On Mon, 24 Nov 2003, Leopold Toetsch wrote:

>> In the mean time I've checked in freeze/thaw for PerlHash. It uses an
>> element count as list does. We could of course use your proposed scheme
> ^^^^^
> You mis-spelled "will" here.

Your explanation about the start/stop pairs just arrived after I had
coded the PerlHash freezing. But that's small and easy to change.

>> with start/end-markers too. But thawing a list of a (first) unknown amount
>> of items isn't really as simple as having a count.

> Right -- it's even simpler, and makes doing logical skips through the
> frozen data easier for code doing external examination, since there's
> structure in the frozen data rather than a glob of random data elements.

Having a count in front is as structured as having items in between start
stop markers and I don't see any randomness.

I seem to be missing some details at the moment.

e.g. freezing an IntList would be now:

[ pmc-id pmc-type N (int-values)*N ]

I don't have a clue, how the end-marker could transparently be coded to
discern the end of the list from a value. Or do you mean
start/end-markers plus a count?

For a Hash the sequence now is:

[ pmc-id pmc-type N (str-key, pmc-value)*N ]

So here again (w/o count) I would have to be prepared to get an end-pair
or a key.
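
A minimal sketch of the count-prefixed IntList layout above, assuming the image is a flat array of long and a made-up type number. TYPE_INTLIST and both function names are hypothetical. It shows why no end marker is needed here: the count says how many values follow, so any value (0 and -1 included) is legal payload.

```c
#include <assert.h>
#include <stddef.h>

enum { TYPE_INTLIST = 1 }; /* hypothetical pmc-type number */

/* Write [ pmc-id pmc-type N values... ] into out.
 * Returns the number of slots used, or 0 if out is too small. */
size_t freeze_intlist(long *out, size_t cap, long pmc_id,
                      const long *values, size_t n) {
    if (cap < n + 3)
        return 0;
    out[0] = pmc_id;
    out[1] = TYPE_INTLIST;
    out[2] = (long)n;
    for (size_t i = 0; i < n; i++)
        out[3 + i] = values[i];
    return n + 3;
}

/* Read the values back. Returns the element count, or -1 if the image
 * is malformed or the destination is too small. */
long thaw_intlist(const long *in, size_t len, long *values, size_t cap) {
    if (len < 3 || in[1] != TYPE_INTLIST || in[2] < 0)
        return -1;
    size_t n = (size_t)in[2];
    if (n > cap || len < n + 3)
        return -1;
    for (size_t i = 0; i < n; i++)
        values[i] = in[3 + i];
    return (long)n;
}
```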

> Dan

leo

Dan Sugalski

Nov 25, 2003, 9:59:23 AM
to Leopold Toetsch, perl6-i...@perl.org
On Mon, 24 Nov 2003, Leopold Toetsch wrote:

> Dan Sugalski <d...@sidhe.org> wrote:
> > On Mon, 24 Nov 2003, Leopold Toetsch wrote:
>
> >> In the mean time I've checked in freeze/thaw for PerlHash. It uses an
> >> element count as list does. We could of course use your proposed scheme
> > ^^^^^
> > You mis-spelled "will" here.
>
> Your explanation about the start/stop pairs just arrived after I had
> coded the PerlHash freezing. But that's small and easy to change.

Sorry. Bad day yesterday, and I was cranky -- that was uncalled-for.

> >> with start/end-markers too. But thawing a list of a (first) unknown amount
> >> of items isn't really as simple as having a count.
>
> > Right -- it's even simpler, and makes doing logical skips through the
> > frozen data easier for code doing external examination, since there's
> > structure in the frozen data rather than a glob of random data elements.
>
> Having a count in front is as structured as having items in between start
> stop markers and I don't see any randomness.

It's not structure, it's convention. Something inspecting the data from
the outside would have no idea that that particular integer was any
different from any other integer. More important is the optional name
that can be attached to the data elements (including entire lists and pair
sets). I'd really like to make sure this works and gets used by the base
PMCs so there are good examples. It future-proofs things to some extent as
it removes ordering-dependencies--elements can be identified by their
labels rather than the position of things in the stream.

The easiest thing to do would be to have tag bytes that mark what's in the
stream. We're going to have to do this to some extent anyway so the
thawing engine knows what's being thawed--it's good to be able to
distinguish between int, float, string, or PMC (not to mention all the
string attributes) so we might as well have pair, list, and list-of-pair
markers as well.
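
A minimal sketch of tag bytes, assuming three made-up tags and 32-bit little-endian integers. The tag values, the Buf type, and this skip_item are hypothetical. Since every item starts with a tag, an end-of-list marker can't be mistaken for a value, and a reader can skip a whole list without knowing which PMC class wrote it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag values; a real format would define many more. */
enum Tag { TAG_INT = 1, TAG_LIST_START = 2, TAG_LIST_END = 3 };

typedef struct Buf {
    uint8_t bytes[64];
    size_t  len;
} Buf;

static void put_byte(Buf *b, uint8_t v) {
    if (b->len < sizeof b->bytes)
        b->bytes[b->len++] = v;
}

/* One tag byte, then four payload bytes, least significant first. */
void put_int(Buf *b, int32_t v) {
    uint32_t u = (uint32_t)v;
    put_byte(b, TAG_INT);
    for (int shift = 0; shift < 32; shift += 8)
        put_byte(b, (uint8_t)(u >> shift));
}

void put_list_start(Buf *b) { put_byte(b, TAG_LIST_START); }
void put_list_end(Buf *b)   { put_byte(b, TAG_LIST_END); }

/* Skip one whole item (an int or a possibly nested list) starting at pos.
 * Returns the position just past it, or 0 on malformed or truncated input. */
size_t skip_item(const Buf *b, size_t pos) {
    size_t depth = 0;
    while (pos < b->len) {
        switch (b->bytes[pos]) {
        case TAG_INT:
            if (b->len - pos < 5)
                return 0;
            pos += 5; /* payload bytes are never read as tags */
            break;
        case TAG_LIST_START:
            pos += 1;
            depth++;
            break;
        case TAG_LIST_END:
            if (depth == 0)
                return 0;
            pos += 1;
            depth--;
            break;
        default:
            return 0;
        }
        if (depth == 0)
            return pos;
    }
    return 0;
}
```

Note that an integer whose value happens to equal TAG_LIST_END is harmless, because the reader steps over payload bytes by length rather than inspecting them.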

Yes, this does bloat out the format some, but it

> I seem to be missing some details at the moment.
>
> e.g. freezing an IntList would be now:
>
> [ pmc-id pmc-type N (int-values)*N ]
>
> I don't have a clue, how the end-marker could transparently be coded to
> discern the end of the list from a value. Or do you mean
> start/end-markers plus a count?

I haven't really addressed the thaw API, which is something of a problem.
Try this:

STRING *thaw_string()
INTVAL *thaw_int()
NUMVAL *thaw_num()
PMC *thaw_pmc()
PAIR *thaw_pair()
THING *thaw_next()

STRING *cur_label()

INTVAL last_error()

int next_type()
STRING *next_label()
skip_item()

THING is a union of a float, int, PMC *, STRING *, and PAIR *.

PAIR is a struct:

struct PAIR {
    INTVAL labeltype;
    union {INTVAL; NUMVAL; PMC *; STRING *} label;
    INTVAL valtype;
    union {INTVAL; NUMVAL; PMC *; STRING *} value;
}

(Only syntactically correct)

Everything returns temporary pointers--if something's not available
because you've run out you get a NULL pointer. This includes being within
a list and running off the end even if there's more data.

last_error() tells you what went wrong most recently, which is how you can
tell that the last NULL was because you ended the list being processed
rather than ran off the end of the PMC, or asked for a string and the next
type was an INTVAL instead.

next_type and next_label tell you what's coming up next and what it's
named, if it's named. skip_item skips the next thing (which may be an
entire list or set of pairs). cur_label returns the label of the most
recently thawed thing.

I'm not sure if we want to have any sort of random access functionality.
It'd be nice, certainly, but it adds a certain complexity, and may be in
general infeasible (If we're thawing from a socket or pipe it'd require
buffering, which with potentially large data sets might be unreasonable)
so I think we'll skip it for now and consider adding it later, unless
people think we can add in an optional random-access API and not get so
dependent on it that it's no longer optional.
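
A minimal sketch of the NULL-plus-last_error convention, assuming the stream has already been decoded into an in-memory array of items. The Item and ThawContext types, the error codes, and the choice to consume the end-of-list marker when it is hit are all assumptions; only thaw_int and last_error follow names from the list above, and here they take an explicit context argument.

```c
#include <assert.h>
#include <stddef.h>

typedef long INTVAL;

/* Hypothetical item kinds and error codes. */
enum ItemType  { ITEM_INT, ITEM_STRING, ITEM_LIST_END };
enum ThawError { THAW_OK, THAW_END_OF_LIST, THAW_END_OF_DATA, THAW_WRONG_TYPE };

typedef struct Item {
    enum ItemType type;
    INTVAL        ival;
    const char   *sval;
} Item;

typedef struct ThawContext {
    const Item    *items;
    size_t         count;
    size_t         pos;
    enum ThawError error;    /* what went wrong most recently */
    INTVAL         temp_int; /* storage behind the temporary pointer */
} ThawContext;

enum ThawError last_error(const ThawContext *ctx) { return ctx->error; }

/* Returns a temporary pointer to the next integer, or NULL.
 * NULL at the end of a list is reported even if more data follows. */
INTVAL *thaw_int(ThawContext *ctx) {
    if (ctx->pos >= ctx->count) {
        ctx->error = THAW_END_OF_DATA;
        return NULL;
    }
    const Item *it = &ctx->items[ctx->pos];
    if (it->type == ITEM_LIST_END) {
        ctx->pos++; /* assumption: hitting the marker consumes it */
        ctx->error = THAW_END_OF_LIST;
        return NULL;
    }
    if (it->type != ITEM_INT) {
        ctx->error = THAW_WRONG_TYPE; /* item is left in place */
        return NULL;
    }
    ctx->temp_int = it->ival;
    ctx->pos++;
    ctx->error = THAW_OK;
    return &ctx->temp_int;
}

/* Typical caller: read until NULL, then ask why it stopped. */
INTVAL sum_list(ThawContext *ctx) {
    INTVAL total = 0;
    INTVAL *v;
    while ((v = thaw_int(ctx)) != NULL)
        total += *v;
    return total;
}
```

After sum_list returns, the caller checks last_error to tell a clean end of list from running out of data or meeting a string where an integer was expected.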

Leopold Toetsch

Nov 26, 2003, 6:10:35 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

[ a lot of design ]

> Yes, this does bloat out the format some, but it

Seems so. But anyway, can you put together a design document and check
it in, so that this whole stuff isn't lost in history.

Some additional random remarks and questions:

* Where do these name-labels come from?
* The new image format implies that each simple item is preceded by a
tag-byte having the type of the following item.
* For a dense image stream, we have to give up using the opcode_t aligned
packfile constants routines. OTOH, e.g. small integers can be coded
with fewer bytes.
* Should packfile constants use the same format? Less code duplication,
fewer errors.
* What kind of data inspection from outside do you think of?
* Implementing thaw and error handling will be great fun :)
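
On the point about small integers taking fewer bytes: a minimal sketch of one common technique, a 7-bits-per-byte variable-length encoding (LEB128 style) for unsigned 32-bit values. Nothing in this thread says the freeze format uses it; the function names are hypothetical and signed values are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value using 7 payload bits per byte; the high bit of each byte
 * says "more bytes follow". out must have room for 5 bytes.
 * Returns the number of bytes written. */
size_t encode_varuint(uint32_t value, uint8_t *out) {
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Decode one value from in. Returns the number of bytes consumed,
 * or 0 if the input is truncated or longer than 5 bytes. */
size_t decode_varuint(const uint8_t *in, size_t len, uint32_t *value) {
    uint32_t result = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        result |= (uint32_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

Values below 128 fit in one byte, at the cost of giving up the fixed opcode_t alignment mentioned above.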

I'll still add a few bits to the current scheme to be able to
freeze/thaw packfile constants (Sub for now, maybe Classes). Then things
can settle and await the implementation of freeze/thaw 0.2.

> Dan

leo
