Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Object freezing
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 48 - Collapse all  -  Translate all to Translated (View all originals)   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Dan Sugalski  
View profile  
 More options Oct 20 2003, 4:48 pm
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Mon, 20 Oct 2003 16:38:15 -0400 (EDT)
Local: Mon, Oct 20 2003 4:38 pm
Subject: Object freezing
Since this has come up again, and it's apparent that the last time around
I wasn't sufficiently clear, it's time to go through this again, and for
the final time. (I will beat this thing into the ground by the time we're
done)

The way object serialization will be handled from the bytecode level is
simple: You call freeze or thaw, like so:

  freeze S3, P5 # Freezes the P5 PMC, and children, to the string in S3
  thaw P5, S3   # Thaw out the serialized object

From within the PMC vtables, there are two vtable entries that are of
importance:

  void freeze(interpreter, STRING *)
  PMC *thaw(interpreter, STRING *)

freeze tells the PMC to freeze itself and put the results of the freezing
in the string. This is an object method

Thaw tells the class to construct a PMC based on the passed-in string
data. It's a class method, though it may be called on an existing PMC. It
always produces a new PMC. (We can argue over that one if need be)

The runtime will provide the following functionality

  Freezing core singleton data (int, num, string, PMC)
  freezing named lists
  freezing named key/value lists
  thaw core singleton data
  thaw named lists
  thaw named key/value lists

  chill PMC *
  warm STRING *

When freezing or thawing, if child PMCs are frozen/thawed using the
appropriate calls to the runtime (rather than the freeze/thaw method of a
PMC doing it manually for child PMCs) then the runtime will guarantee that
multiply-referenced PMCs will be instantiated only once.

The chill and warm runtime methods take a PMC or a frozen representation
of a PMC (respectively) and provide a human readable version of that PMC.
If the PMC has a  __debug_chill or __debug_warm method (which are
optional) then that method will be called to provide a human-readable
version of the PMC, otherwise the default will be used.

The encoding methods for freezing (and corresponding decoding methods for
thawing) may be overridden to provide an alternate serialization format.
The only requirement of the serialziation format is that it starts with a
minimally valid piece of XML that encodes the format and version of the
serialized format. The rest of the serialization format need not be XML.
This is done because the format and version of the serialized data are
required in the stream, and making it XML incoveniences nobody and makes
the XML folks happy. It's good enough, and not up for discussion.

The overriding API is not, as yet, specified. We can do that when we're
done fighting out over the semantics here.

The following rules will be in place:

  1) Freezing at the destruction level may *not* use any additional memory
     for object traversal
  2) Overlapping freezes are not allowed
  3) Freezing while a freeze is in progress is deferred until the current
     freeze is done
  4) Destruction level freezing may not freeze non-dead PMCs

I can be convinced that #4 is not going to fly, but be aware that it will
potentially cost an extra pointer per PMC.

Requirement #1 generally mandates that all PMCs must have sufficient
information available all the time to perform serialization. It also
mandates that there can't be iterators, save hashes, or other whatnots,
since the potential for destroy-time serialization means those methods are
untenable. I'm open to argument that the freeze and thaw methods can be
context sensitive and the non-destroy case can be memory hungry and
multithreaded, but that's a dodgy thing. Make a really good case.

Note that I do *not* want to have multiple object traversal systems in
parrot! We have one for DOD, and proposals have ranged upwards from there.
No. That is *not* happening--the chance for error is significant, the
side-effects of the error annoying and tough to track down for complex
cases (akin to the trouble with tracking down GC issues), and just not
necessary. (Perhaps desirable for speed/space reasons, but desirable
isn't necessary) This is something that's hidden under a number of layers
of API, so regardless of the outcome it doesn't affect the assembly, PMC,
or runtime API.

Clone can be handled by switching in encodings (one that produces a PMC
rather than an encoded stream, or one that just does an intermediate
string and chews memory, at least for now), and doesn't require a separate
API entry point. I'm unconvinced that objects need to clone themselves
often enough to bother with a separate API entry for 'em, or if one is
provided that VTABLE_clone() should do anything for non-simple objects
besides calling VTABLE_thaw(VTABLE_freeze()).

The thread-safety is an issue, and as such interpreters that share data
should share a mutex. At the moment I think we should have separate
symbolic mutexes that may, potentially, resolve down to a single mutex,
but when freezing an interpreter should grab its freeze mutex. If there
are multiple interperters in the thread group they will all have the same
mutex, and as such will be single-threaded for the freeze.

We can fight about this, too, but the fight is orthogonal to the other
issues.

                                        Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Oct 20 2003, 4:48 pm
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Mon, 20 Oct 2003 16:46:19 -0400 (EDT)
Local: Mon, Oct 20 2003 4:46 pm
Subject: Re: Object freezing
Note that I forgot to account for the possibility of freezing code and
environment. As such, the freeze and thaw opcodes have the following
variant:

   freeze Sx, Py, Iz
   thaw Py, Sx, Iz

where Z is the freeze/thaw level. O (the default) is the PMC only, 1
includes the global/class data for the PMC, and 2 includes the code for
the packages for the PMC, pruned for accessibility.

Also note that if teh PMC being frozen is a closure or continuation, then
all lexical and global data to reinstantiate the PMC (including
potentially code) will be frozen at level 0.

                                        Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Melvin Smith  
View profile  
 More options Oct 20 2003, 8:48 pm
Newsgroups: perl.perl6.internals
From: mrjoltc...@mindspring.com (Melvin Smith)
Date: Mon, 20 Oct 2003 20:29:11 -0400
Local: Mon, Oct 20 2003 8:29 pm
Subject: Re: Object freezing
At 04:38 PM 10/20/2003 -0400, Dan Sugalski wrote:

>The encoding methods for freezing (and corresponding decoding methods for
>thawing) may be overridden to provide an alternate serialization format.
>The only requirement of the serialziation format is that it starts with a
>minimally valid piece of XML that encodes the format and version of the
>serialized format. The rest of the serialization format need not be XML.
>This is done because the format and version of the serialized data are
>required in the stream, and making it XML incoveniences nobody and makes
>the XML folks happy. It's good enough, and not up for discussion.

Does that mean all encodings will start with the "standard" markup header:

<xml version="1.0"?>

I'm not sure what minimally valid means, but I think this is considerable
overhead.

Also, I don't think I like using freeze/thaw format for the standard PBC
format if we are embedding XML everywhere, which is ok, but Leo had
mentioned that we would use freeze/thaw for loading classes. With
XML in the picture I think it becomes too bulky.

If we are to use recursive serialization for classes then we'd end up with
fat bytecodes; every field and method would have the XML header.

-Melvin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Uri Guttman  
View profile  
 More options Oct 20 2003, 8:48 pm
Newsgroups: perl.perl6.internals
From: u...@stemsystems.com (Uri Guttman)
Date: Mon, 20 Oct 2003 20:41:21 -0400
Local: Mon, Oct 20 2003 8:41 pm
Subject: Re: Object freezing

>>>>> "MS" == Melvin Smith <mrjoltc...@mindspring.com> writes:

  MS> At 04:38 PM 10/20/2003 -0400, Dan Sugalski wrote:
  >> The encoding methods for freezing (and corresponding decoding methods for
  >> thawing) may be overridden to provide an alternate serialization format.
  >> The only requirement of the serialziation format is that it starts with a
  >> minimally valid piece of XML that encodes the format and version of the
  >> serialized format. The rest of the serialization format need not be XML.
  >> This is done because the format and version of the serialized data are
  >> required in the stream, and making it XML incoveniences nobody and makes
  >> the XML folks happy. It's good enough, and not up for discussion.

  MS> Does that mean all encodings will start with the "standard" markup header:

  MS> <xml version="1.0"?>

  MS> I'm not sure what minimally valid means, but I think this is considerable
  MS> overhead.

how so? most serialized things (other than simple scalars) will generate
much more data than that.

  MS> Also, I don't think I like using freeze/thaw format for the
  MS> standard PBC format if we are embedding XML everywhere, which is
  MS> ok, but Leo had mentioned that we would use freeze/thaw for
  MS> loading classes. With XML in the picture I think it becomes too
  MS> bulky.

i don't he was referring to the current p5 freeze/thaw formats or
code. he was just using freeze/thaw as the methods/calls to be made to
do the work. the actual formats are undecided now AFAICT.

  MS> If we are to use recursive serialization for classes then we'd end
  MS> up with fat bytecodes; every field and method would have the XML
  MS> header.

the xml header is only for the top level thing in the serialized
tree. if it is nonstandard you have to mark the serialized string so you
can call the matching thaw methods. each object in the serialized tree
will have to support that method or some code has to be supplied to
handle all the freeze/thaw calls made by the tree traverser code. so the
xml header is just a way to mark which external class will be used for
the freeze/thaw and it will always be called for each object in the
tree. you can't mix/match different freeze/thaw techniques in one
operation (yes, you could but then you do have to mark each node with
its technique which is a lot of overhead and painful in other ways).

uri

--
Uri Guttman  ------  u...@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Melvin Smith  
View profile  
 More options Oct 20 2003, 9:48 pm
Newsgroups: perl.perl6.internals
From: mrjoltc...@mindspring.com (Melvin Smith)
Date: Mon, 20 Oct 2003 21:23:42 -0400
Local: Mon, Oct 20 2003 9:23 pm
Subject: Re: Object freezing
At 08:41 PM 10/20/2003 -0400, Uri Guttman wrote:

See below.

That answers my question of overhead with regards to XML headers.
If there is a single header for defining the type of "stream" then the
actual serialization can be dense enough.

I just needed clarification. :)

I think a grammar should be developed for this, since it should likely
be implemented as a recursive parser.

-Melvin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Uri Guttman  
View profile  
 More options Oct 20 2003, 10:48 pm
Newsgroups: perl.perl6.internals
From: u...@stemsystems.com (Uri Guttman)
Date: Mon, 20 Oct 2003 21:56:30 -0400
Local: Mon, Oct 20 2003 9:56 pm
Subject: Re: Object freezing

>>>>> "MS" == Melvin Smith <mrjoltc...@mindspring.com> writes:

  MS> That answers my question of overhead with regards to XML headers.
  MS> If there is a single header for defining the type of "stream" then the
  MS> actual serialization can be dense enough.

  MS> I just needed clarification. :)

well, that was my take on what dan said. i could have misinterpreted him.

  MS> I think a grammar should be developed for this, since it should likely
  MS> be implemented as a recursive parser.

there are many ways to encode serialized stuff and not all are conducive
to grammars IMO. some may be simple type/length/binary value things
which are best expanded in nice fast loops. other formats could just go
into pure ugly XML which can be parsed back into data. also dan
mentioned the half-frozen human readable types (which both the binary
AND XML formats need!) which don't (or may not) get parsed again.

uri

--
Uri Guttman  ------  u...@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Melvin Smith  
View profile  
 More options Oct 20 2003, 10:48 pm
Newsgroups: perl.perl6.internals
From: mrjoltc...@mindspring.com (Melvin Smith)
Date: Mon, 20 Oct 2003 22:15:21 -0400
Local: Mon, Oct 20 2003 10:15 pm
Subject: Re: Object freezing
At 09:56 PM 10/20/2003 -0400, Uri Guttman wrote:

When I say grammar, I don't mean as input to a parser generator.
I'm accustomed to using grammars for any sort of non-trivial packet or
data format. It makes dealing with and discussing the format much easier.

-Melvin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Gregor N. Purdy  
View profile  
 More options Oct 21 2003, 1:48 am
Newsgroups: perl.perl6.internals
From: gre...@focusresearch.com (Gregor N. Purdy)
Date: Mon, 20 Oct 2003 22:09:33 -0700
Local: Tues, Oct 21 2003 1:09 am
Subject: Re: Object freezing

> the xml header is only for the top level thing in the serialized
> tree. if it is nonstandard you have to mark the serialized string so you
> can call the matching thaw methods. each object in the serialized tree
> will have to support that method or some code has to be supplied to
> handle all the freeze/thaw calls made by the tree traverser code. so the
> xml header is just a way to mark which external class will be used for
> the freeze/thaw and it will always be called for each object in the
> tree. you can't mix/match different freeze/thaw techniques in one
> operation (yes, you could but then you do have to mark each node with
> its technique which is a lot of overhead and painful in other ways).

I find the notion of an "XML header" a bit confusing, given Dan's
statement to the effect that it was a throw to XML folks.

I think anything "XML folks" will be interested in will entail
*wrapping* stuff, not *prefixing* it.

Perhaps Dan meant "wrapper" or "envelope" when he said "header", but
its not clear to me without an example. So, I'll put some examples out
there for folks to throw stuff at:

Here's an example of a Parrot Magic Ice (PMI) header (no love here from
XML folks):

  <pmi class="foo" .../>
  # Data of some sort determined by class foo

(This is the way I read Dan's original comment.)

Taken in its entirety, this chunk isn't XML. Sure, you could pull out
the first line and pass it to something that understands XML, and it
wouldn't choke. But, if there is value here, I'm missing it. It could
just as well have been SMTP style:

  PMI-Class: foo
  Some-Other-Header: ...
  # Data of some sort

Here's an example of a wrapper:

  <pmi class="foo" ...>
    <!-- Data of some sort determined by class foo -->
  </pmi>

That's a bit better, although bear in mind that if the intent is that
you could throw the entire chunk at an XML parser and have it not
complain, you are going to have to take some care in generating the
guts. Binary data is in general right out (where does it end? What if
it contains fragments that look like XML markup?). Sure, you could
slap it in a <![CDATA[ ... ]]> film, but you'd still have to watch
out for the possibility that the body might want the sequence "]]>"
in it somewhere...

.............................................

OK. Now, I'll throw one crazy idea into the mix. Suppose for just a
moment that instead of using XML proper, or leaving things completely
open-ended, we adopted SAX events as our interchange. it would be
roughly equivalent to:

  begin-element pmi { class => 'foo' }
  # events for the guts
  end-element pmi

Now, anyone who likes XML can hook up a DOM tree builder, or an XML
renderer to the stream of events and be happy as a clam. But, for
storing stuff on disk, we are free to invent a more compact
representation of the events. Thawing entails interpreting the events
as object allocations and state changes to the objects.

I can imagine some reasonably compact representations...

Regards,

-- Gregor

--
Gregor Purdy                            gre...@focusresearch.com
Focus Research, Inc.               http://www.focusresearch.com/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeff Clites  
View profile  
 More options Oct 21 2003, 1:48 am
Newsgroups: perl.perl6.internals
From: jcli...@mac.com (Jeff Clites)
Date: Mon, 20 Oct 2003 22:41:05 -0700
Local: Tues, Oct 21 2003 1:41 am
Subject: Re: Object freezing
On Oct 20, 2003, at 10:09 PM, Gregor N. Purdy wrote:

> Here's an example of a wrapper:

>   <pmi class="foo" ...>
>     <!-- Data of some sort determined by class foo -->
>   </pmi>

> That's a bit better, although bear in mind that if the intent is that
> you could throw the entire chunk at an XML parser and have it not
> complain, you are going to have to take some care in generating the
> guts. Binary data is in general right out (where does it end? What if
> it contains fragments that look like XML markup?). Sure, you could
> slap it in a <![CDATA[ ... ]]> film, but you'd still have to watch
> out for the possibility that the body might want the sequence "]]>"
> in it somewhere...

A slight aside, but just to build on what you said since this is an
often-misunderstood facet of XML, there are a bunch of other reasons
you can't just throw binary data into an arbitrary XML document inside
of a CDATA section:

1) If you declare the encoding of the XML to be UTF-8, then your
document won't be well-formed XML if your binary data doesn't look like
legitimate UTF-8 data (which it won't in the general case--many bytes
and byte sequences can't occur in UTF-8).

2) XML parsers are free to transcode--so a document declared in one
encoding may pass data on to its client application in another
encoding, which will mangle your binary data. For instance, the expat
parser can consume documents in a variety of encodings, but always
passes text to its callbacks in UTF-8.

3) XML parsers are required to line-ending normalize, so anything which
looks like a CR or CRLF will turn into an LF, even inside of a CDATA
section.

That said, you can get around all of this by base-64 encoding your
binary data for storage in XML, since that turns binary data into text.
On the other hand, it's a waste of space and more CPU cycles to consume
than a more obvious binary format.

And that said, I read Dan's email as just meaning an XML header, but I
didn't quite understand exactly what he had in mind either.

JEff


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Leopold Toetsch  
View profile  
 More options Oct 21 2003, 4:00 am
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Tue, 21 Oct 2003 09:46:20 +0200
Local: Tues, Oct 21 2003 3:46 am
Subject: Re: Object freezing

Dan Sugalski <d...@sidhe.org> wrote:
> Since this has come up again, ...

[ FYI: I was starting implementing this, based on a general traverse
       vtable with callback functions. Two patches got backed out by
       Dan after some discussion in PM ]

> ... and it's apparent that the last time around
> I wasn't sufficiently clear, it's time to go through this again, and for
> the final time.

I'd be really happy, if you could go through my concerns mentioned in
the summary in the thread:
  Subject: Re: [RfC] vtable->dump
  Date: Thu, 4 Sep 2003 12:31:08 +0200

> ... (I will beat this thing into the ground by the time we're
> done)

Sorry for the inconvenience and being ignorant ...

>   PMC *thaw(interpreter, STRING *)

This should IMHO be able to create constant PMCs out of metadata, e.g.
for subroutine objects. So there should be some means to tell thaw() to
create PMC(s) in the constant_pmc_pool.

> The chill and warm runtime methods take a PMC or a frozen representation
> of a PMC (respectively) and provide a human readable version of that PMC.

I dunno, why chill() is superior to dump() or pretty_print(), but the
name doesn't really matter.

>   1) Freezing at the destruction level may *not* use any additional memory
>      for object traversal

What is "Freezing at the destruction level"? Is this anyhow related to
destruction ordering?

> Note that I do *not* want to have multiple object traversal systems in
> parrot! We have one for DOD, and proposals have ranged upwards from there.
> No. That is *not* happening--the chance for error is significant, the
> side-effects of the error annoying and tough to track down for complex
> cases (akin to the trouble with tracking down GC issues), and just not
> necessary. (Perhaps desirable for speed/space reasons, but desirable
> isn't necessary)

DOD's mark() routine has different requirements then a general
traverse() for freeze(), chill(), clone(), and destruction ordering.
Using just mark() will have these side effects that you want to avoid.

A general traverse() can be depth first of breadth first, mark() isn't
required do have any specific ordering as long as it sets live bits
everywhere.

mark() is called permanently in a running interpreter, that does non
trivial things. There are shortcuts for scalars, DOD is highly optimized
not to destroy cache coherency. Using mark() also implies to back out
my small PMC patches. All the advantages of smaller scalars are gone
then.

While freeze() and friends have to pull in each PMC into the cache, just
setting the live bit on a PMC hasn't. Further: Lukes proposal for
speeding up timely destruction puts objects either in front or at the
end of the next_for_GC chain. This IMHO implies that mark() is unusable
as your general and solely iterator.

> ... This is something that's hidden under a number of layers
> of API, so regardless of the outcome it doesn't affect the assembly, PMC,
> or runtime API.

So when its hidden, I really don't understand, why you are insisting on
an (IMHO) suboptimal design.

> The thread-safety is an issue,

While all schemes aren't thread-safe from user level (e.g.
manually sorting an array containing shared PMCs, while it gets
frozen), your scheme isn't thread-safe at low-level, as the next_for_GC
pointer inside the PMC is used as a duplicate marker. But if a user
changes shared resources its a user problem. We only guarantee atomic
updates per PMC (s. P6E p 86f by Dan).

>                                    Dan

Comments addressing all these issues are highly welcome,
leo

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Juergen Boemmels  
View profile  
 More options Oct 21 2003, 7:48 am
Newsgroups: perl.perl6.internals
From: boemm...@physik.uni-kl.de (Juergen Boemmels)
Date: 21 Oct 2003 13:33:51 +0200
Local: Tues, Oct 21 2003 7:33 am
Subject: Re: Object freezing

Leopold Toetsch <l...@toetsch.at> writes:
> >   1) Freezing at the destruction level may *not* use any additional memory
> >      for object traversal

This is a really hard problem. In some early experiments with
destruction ordering (one of the problems wich need iteration) I
didn't get around with allocating new memory, or recursing on the
stack. It may be that we can get arround with a second pointer, but
I'm not sure.

> What is "Freezing at the destruction level"? Is this anyhow related to
> destruction ordering?

> > Note that I do *not* want to have multiple object traversal systems in
> > parrot! We have one for DOD, and proposals have ranged upwards from there.
> > No. That is *not* happening--the chance for error is significant, the
> > side-effects of the error annoying and tough to track down for complex
> > cases (akin to the trouble with tracking down GC issues), and just not
> > necessary. (Perhaps desirable for speed/space reasons, but desirable
> > isn't necessary)

I did some benchmarking (to test our hash implementation, but thats a
different story). One thing I found out: We are completely dominated
by gc. I'm not sure if it was trace_systemareas or the mark method,
but don't put any load on mark.

mark should be as fast as possible. The other uses of traverse for
freeze, dump, destruction-ordering etc. are all more or less called on
user request, so the user needs to know its cost.

One other thing that makes mark different. If we ever want to use a
copying collector (Which is not reachable currently because of the
conservative stack-walking) The mark routine needs to know about the
moving of objects. All other traverse routine never get this problem.

> DOD's mark() routine has different requirements then a general
> traverse() for freeze(), chill(), clone(), and destruction ordering.
> Using just mark() will have these side effects that you want to avoid.

My words. mark() is not traverse() also they do similar things.

> A general traverse() can be depth first of breadth first, mark() isn't
> required do have any specific ordering as long as it sets live bits
> everywhere.

> mark() is called permanently in a running interpreter, that does non
> trivial things. There are shortcuts for scalars, DOD is highly optimized
> not to destroy cache coherency. Using mark() also implies to back out
> my small PMC patches. All the advantages of smaller scalars are gone
> then.

This ist just on more thing of mark() speed.

> While freeze() and friends have to pull in each PMC into the cache, just
> setting the live bit on a PMC hasn't. Further: Lukes proposal for
> speeding up timely destruction puts objects either in front or at the
> end of the next_for_GC chain. This IMHO implies that mark() is unusable
> as your general and solely iterator.

> > ... This is something that's hidden under a number of layers
> > of API, so regardless of the outcome it doesn't affect the assembly, PMC,
> > or runtime API.

> So when its hidden, I really don't understand, why you are insisting on
> an (IMHO) suboptimal design.

We have at the moment 15 (in words fifteen) vtable slots for
dividing/remainder, 5 for multiplikation, 24 for bitwise ops. So
bloating the vtable is by design, but it is the end of world if we
special case the most often called function and have 2 (in words two)
walking functions. Sorry, I think there are other places in the vtable
which need some cleanup.

> > The thread-safety is an issue,

> While all schemes aren't thread-safe from user level (e.g.
> manually sorting an array containing shared PMCs, while it gets
> frozen), your scheme isn't thread-safe at low-level, as the next_for_GC
> pointer inside the PMC is used as a duplicate marker. But if a user
> changes shared resources its a user problem. We only guarantee atomic
> updates per PMC (s. P6E p 86f by Dan).

The thread safty is less a problem for marking. It only needs to make
sure that other threads don't munge the data they are walking. Write
barriers or mutexes might help here. But how to freeze an object of an
other thread? This needs to freeze the whole thread.

> >                                       Dan

> Comments addressing all these issues are highly welcome,
> leo

I think we should address this issue like experimentalists: Create the
general traverse function. (No don't call it mark). Implement freeze,
dump, destruction ordering using this function. When this all is
running, port the mark function to use this new
functionality. Benchmark, and watch the speedup of the brandnew
design (or just find out that the slowdown is not bad enough to
satisfy two walking functions). When the benchmarking is done lets
descide if we need only one walk-function, and only than remove the
mark function.

bye
boe
--
Juergen Boemmels                        boemm...@physik.uni-kl.de
Fachbereich Physik                      Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern             Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F  23 F6 C7 2F 85 93 DD 47


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Oct 21 2003, 8:48 am
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Tue, 21 Oct 2003 08:11:29 -0400 (EDT)
Local: Tues, Oct 21 2003 8:11 am
Subject: Re: Object freezing

On Mon, 20 Oct 2003, Melvin Smith wrote:
> At 04:38 PM 10/20/2003 -0400, Dan Sugalski wrote:
> >The encoding methods for freezing (and corresponding decoding methods for
> >thawing) may be overridden to provide an alternate serialization format.
> >The only requirement of the serialziation format is that it starts with a
> >minimally valid piece of XML that encodes the format and version of the
> >serialized format. The rest of the serialization format need not be XML.
> >This is done because the format and version of the serialized data are
> >required in the stream, and making it XML incoveniences nobody and makes
> >the XML folks happy. It's good enough, and not up for discussion.

> Does that mean all encodings will start with the "standard" markup header:

> <xml version="1.0"?>

Each serialized data stream, yeah. Just once, and probably a few extra
characters in there, to make it legit XML.

Definitely not once per object, just once per stream--if you do:

  freeze S3, P5

and P5 happens to have a PMC that points to your top level symbol table,
the string in S3 will be darned huge, and have the <XML foo=?> thing in
there exactly once. (Well, unless you've chosen an XML encoding, in which
case I expect you've just blown memory... :)

                                Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Oct 21 2003, 8:48 am
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Tue, 21 Oct 2003 08:21:42 -0400 (EDT)
Local: Tues, Oct 21 2003 8:21 am
Subject: Re: Object freezing

Nah, I expect what they'll want is for the entire data stream of
serialized objects to be in XML format. Which is fine--they can have that.
(It's why I mentioned the serialization routines can be overridden)

For an XML stream the header might be <xml parrot format='xml'
version=1.0> with the rest of the stream in XML. A YAML stream would start
<xml parrot format='yaml' version=1.0> with the rest in YAML, and teh
binary format as <xml parrot format='binary' version=1.0>. Or something
like that, modulo actual correct XML.

This way we have a single, fixed-format type/version header, which makes
the initial identification easier and less error-prone. (Possibly even
fit for file and programs of its ilk to note) The binary format won't
care, and teh YAML format shouldn't care (as long as the indenting's
right) but the XML format would, so it seems to make sense to use the XML
stuff for the initial header.

                                        Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Oct 21 2003, 9:00 am
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Tue, 21 Oct 2003 08:53:39 -0400 (EDT)
Local: Tues, Oct 21 2003 8:53 am
Subject: Re: Object freezing

On Tue, 21 Oct 2003, Leopold Toetsch wrote:
> Dan Sugalski <d...@sidhe.org> wrote:
> > Since this has come up again, ...

> [ FYI: I was starting implementing this, based on a general traverse
>        vtable with callback functions. Two patches got backed out by
>        Dan after some discussion in PM ]

Right, because you'd implemented some stuff I'd specifically said we
weren't doing, and didn't back them out any of the times I asked...

> > ... and it's apparent that the last time around
> > I wasn't sufficiently clear, it's time to go through this again, and for
> > the final time.

> I'd be really happy, if you could go through my concerns mentioned in
> the summary in the thread:

That's why I did this, in part. It's the plan, until declared otherwise.

> >   PMC *thaw(interpreter, STRING *)

> This should IMHO be able to create constant PMCs out of metadata, e.g.
> for subroutine objects. So there should be some means to tell thaw() to
> create PMC(s) in the constant_pmc_pool.

There should be a way to put PMCs in the constant pool in general. I was
thinking a constant op would work--something like

   constant Ix, [SP]y

to make the string or PMC Y a constant at slot X in the constant pool.
Passing in the PMC header to be filled in also works, though both fail if
you want full PMC trees marked as constants since thawing out a PMC stream
may involve creating multiple PMCs. (In which case we might be better
temporarily switching allocation pools at constant creation time, rather
than passing in PMCs)

> > The chill and warm runtime methods take a PMC or a frozen representation
> > of a PMC (respectively) and provide a human readable version of that PMC.

> I dunno, why chill() is superior to dump() or pretty_print(), but the
> name doesn't really matter.

The important thing is that it's not a vtable method. It's a function that
belongs in the freeze/thaw API as it's just an alternate encoding or
decoding. (Arguably it ought not be a separate API entry at all and just
another encoding scheme, but that requires transcoding serialization
forms, and I'd rather not get into that)

> >   1) Freezing at the destruction level may *not* use any additional memory
> >      for object traversal

> What is "Freezing at the destruction level"? Is this anyhow related to
> destruction ordering?

No. There are some valid cases where an object, after having been declared
dead by the DOD, wants to serialize itself. Persistent object stores
apparently do this, and it makes a certain amount of sense--when the
object goes out of scope the current state is flushed to disk.

It puts a number of unpleasant constraints on the core freeze routines.
User code can violate them and take the consequences, but we can't.

> > Note that I do *not* want to have multiple object traversal systems in
> > parrot! We have one for DOD, and proposals have ranged upwards from there.
> > No. That is *not* happening--the chance for error is significant, the
> > side-effects of the error annoying and tough to track down for complex
> > cases (akin to the trouble with tracking down GC issues), and just not
> > necessary. (Perhaps desirable for speed/space reasons, but desirable
> > isn't necessary)

> DOD's mark() routine has different requirements then a general
> traverse() for freeze(), chill(), clone(), and destruction ordering.
> Using just mark() will have these side effects that you want to avoid.

The only thing that mark does that the general traversal doesn't, in the
abstract, is flip the object's live flag. Everything else is an
optimization of code which we can, if we need, discard.

> A general traverse() can be depth first of breadth first, mark() isn't
> required do have any specific ordering as long as it sets live bits
> everywhere.

I'm pretty sure that with a singly linked list we can get a generally
properly-ordered flattened tree without having to do an insane number of
passes across the dead object store. I may be incorrect in this, but I
don't think so, and for our purposes the live bit can be safely ignored if
we end up setting it, though potentially with another pass over the dead
store, which may end up prohibitively expensive. We'll see.

> mark() is called permanently in a running interpreter, that does non
> trivial things. There are shortcuts for scalars, DOD is highly optimized
> not to destroy cache coherency. Using mark() also implies to back out
> my small PMC patches. All the advantages of smaller scalars are gone
> then.

All of this stuff for freezing is going to end up killing the small PMC
patch anyway, unfortunately, since we're going to have to be able to
traverse PMCs in the destruction phase, which means we have to have the
means of traversal at hand as we can't guarantee that we can allocate more
PMCs or resize the PMCs ext data.

> While freeze() and friends have to pull in each PMC into the cache, just
> setting the live bit on a PMC hasn't. Further: Lukes proposal for
> speeding up timely destruction puts objects either in front or at the
> end of the next_for_GC chain. This IMHO implies that mark() is unusable
> as your general and solely iterator.

We may end up playing macro games with the code. I can live with multiple
instantiations of the code, but I want only a single chunk of source to be
maintained.

> > ... This is something that's hidden under a number of layers
> > of API, so regardless of the outcome it doesn't affect the assembly, PMC,
> > or runtime API.

> So when its hidden, I really don't understand, why you are insisting on
> an (IMHO) suboptimal design.

YHO, in this case, turns out to not consider all the issues involved.
Either way, we do it this way for now. Once the implementation of the
specified API is in, we can fight over reworking the internals.

                                        Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Oct 21 2003, 9:48 am
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Tue, 21 Oct 2003 09:12:27 -0400 (EDT)
Local: Tues, Oct 21 2003 9:12 am
Subject: Re: Object freezing

I'm not an XML guy, and I'm making all this up as I go along. If that's
better, fine with me. :)

> >This way we have a single, fixed-format type/version header, which makes
> >the initial identification easier and less error-prone. (Possibly even
> >fit for file and programs of its ilk to note) The binary format won't
> >care, and teh YAML format shouldn't care (as long as the indenting's
> >right) but the XML format would, so it seems to make sense to use the XML
> >stuff for the initial header.

> So are we talking about a header or a wrapper?  If it is really a
> header, it's not XML and then it's prettyy useless from an XML point
> of view.

We're talking about the first thing in a file (or stream, or whatever). I
was under the impression that XML files should be entirely composed of
valid XML, hence the need for the stream type marker being valid XML. YAML
doesn't care as much, so far as I understand, and for our own internal
binary format we cna do whatever we want. If that's not true, then we can
go for a more compact header.

Note that the serialized stream will be different depending on the encoder
chosen. If you have the structure:

  $bar = 1;
  @foo[0] = \$bar;
  @foo[1] = "Baz";

The XML stream serializing @foo might look like:

  <XML type=parrot-xml version=1.0>
  <PMC name=foo>
     <type>
       PerlArray
     </type>
     <value>
        <PMC>bar</pmc>
        <string>Baz</string>
     </value>
  </PMC>
  <PMC name=bar>
    <type>
      PerlInt
    </type>
    <value>
      <integer>1</integer>
    </value>
  </PMC>

Only not inevitably horribly broken, invalid, and poorly done. :) The YAML
form might look like

  <XML type=parrot-yaml version=1.0>
  PMC: foo
    type: PerlArray
    values:
      pmc: bar
      string: Baz
  PMC: bar
    type: PerlInt
    values:
      integer:1

Once again, modulo my limited and inevitably incorrect YAML knowledge. So
if the header says it's XML the whole thing is valid XML, while if it
doesn't the rest of the stream doesn't have to be. (Just enough of the
header so that an XML processing program can examine the stream and decide
that the valid XML chunk at the beginning says that the rest of the
stream's not XML)

Basically we want some nice, fixed (mostly) thing at the head of the
stream that doesn't vary regardless of the way the stream is encoded, and
XML seemed to be the most restrictive of the forms I know people will
clamor for. (I know, it means the stream can't be valid Lisp-style sexprs,
but XML's more widespread :)

                                                Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Leopold Toetsch  
View profile  
 More options Oct 21 2003, 10:48 am
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Tue, 21 Oct 2003 16:13:47 +0200
Local: Tues, Oct 21 2003 10:13 am
Subject: Re: Object freezing

Dan Sugalski <d...@sidhe.org> wrote:
> On Tue, 21 Oct 2003, Leopold Toetsch wrote:

[ thaw ]

>> This should IMHO be able to create constant PMCs out of metadata, e.g.
>> for subroutine objects. So there should be some means to tell thaw() to
>> create PMC(s) in the constant_pmc_pool.
> There should be a way to put PMCs in the constant pool in general. I was
> thinking a constant op would work--something like
>    constant Ix, [SP]y
> to make the string or PMC Y a constant at slot X in the constant pool.

You can append items to the constant table. You can't declare existing
items as constant, because you can't change the underlying object pool,
where the object was allocated. This would change the objects address.

> Passing in the PMC header to be filled in also works, though both fail if
> you want full PMC trees marked as constants since thawing out a PMC stream
> may involve creating multiple PMCs. (In which case we might be better
> temporarily switching allocation pools at constant creation time, rather
> than passing in PMCs)

These are either serious shortcomings or unneeded workarounds. An extra
parameter to relevant vtables can take care of such special cases.

>> I dunno, why chill() is superior to dump() or pretty_print(), but the
>> name doesn't really matter.
> The important thing is that it's not a vtable method.

Ah, that's the difference. How shall the system pretty-print dynamically
loaded PMCs then, when only a bytecode-stream is available? IMHO only a
vtable in the class can perform that job.

>> >   1) Freezing at the destruction level may *not* use any additional memory
>> >      for object traversal
> It puts a number of unpleasant constraints on the core freeze routines.

Constructing the frozen stream definitely needs memory. I don't see the
difference, to memory consumed by a seen hash. Can you please elaborate
a bit more on this.

> The only thing that mark does that the general traversal doesn't, in the
> abstract, is flip the object's live flag. Everything else is an
> optimization of code which we can, if we need, discard.

Yes, mark() can be written in terms of a general traverse, which gets a
vtable function (and a data pointer). mark is basically traverse(mark,
0). But this isn't true the other way round. You can't do freeze based
on the mark iterator. How do you pass the desired output format?

>> mark() is called permanently in a running interpreter, that does non
>> trivial things. There are shortcuts for scalars, DOD is highly optimized
>> not to destroy cache coherency. Using mark() also implies to back out
>> my small PMC patches. All the advantages of smaller scalars are gone
>> then.
> All of this stuff for freezing is going to end up killing the small PMC
> patch anyway, unfortunately, since we're going to have to be able to
> traverse PMCs in the destruction phase, which means we have to have the
> means of traversal at hand as we can't guarantee that we can allocate more
> PMCs or resize the PMCs ext data.

A scalar can't contain or reference other PMCs, so it can't be a
potential source of freeze loops. If I now spit out (PMC: Int, ID=xy,
value=5) twice or (PMC: ID=other) doesn't really matter. thaw() can take
care of duplicates, if needed. Other PMCs have the next_for_GC pointer.

Albeit I'm not convinced, that we can't have a seen hash.

> YHO, in this case, turns out to not consider all the issues involved.

That might very well be true, yes. So it would be fine, if you could fill
the gaps.

>                                    Dan

leo

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Oct 21 2003, 10:48 am
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Tue, 21 Oct 2003 10:49:27 -0400 (EDT)
Local: Tues, Oct 21 2003 10:49 am
Subject: Re: Object freezing

The object's address should be irrelevant for the constant table. PMCs are
referenced in the opstream by table offset. This offset can be into a PMC
pool, or into a pointer table. While the pointer table has an extra level
of indirection to it it adds flexibility and takes some pressure off of
the ordering of PMCs for instantiated constants.

> > Passing in the PMC header to be filled in also works, though both fail if
> > you want full PMC trees marked as constants since thawing out a PMC stream
> > may involve creating multiple PMCs. (In which case we might be better
> > temporarily switching allocation pools at constant creation time, rather
> > than passing in PMCs)

> These are either serious shortcomings or unneeded workarounds. An extra
> parameter to relevant vtables can take care of such special cases.

Not necessarily, no. The number of PMCs that are reconstituted for a set
of constant frozen PMCs is indeterminate. If we're instantiating bytecode
with constant PMCs in it it's possible the class that backs those PMCs has
changed and things instantiate differently than they might otherwise do.

If we've frozen 20 PMCs, all we can guarantee is that when we unthaw them
that we've got at least 20 PMCs, though we may have more, and the extras
arguably should be allocated from the constant PMC arena (though not given
slots in the constant table) so we can skip scanning the constant arenas
for dead objects needing cleanup.

> >> I dunno, why chill() is superior to dump() or pretty_print(), but the
> >> name doesn't really matter.

> > The important thing is that it's not a vtable method.

> Ah, that's the difference. How shall the system pretty-print dynamically
> loaded PMCs then, when only a bytecode-stream is available? IMHO only a
> vtable in the class can perform that job.

If the dynamically loaded PMC class doesn't have a backing Parrot class,
you can't, and get the default, relatively primitive dump.

> >> >   1) Freezing at the destruction level may *not* use any additional memory
> >> >      for object traversal

> > It puts a number of unpleasant constraints on the core freeze routines.

> Constructing the frozen stream definitely needs memory. I don't see the
> difference, to memory consumed by a seen hash. Can you please elaborate
> a bit more on this.

Constructing the frozen stream will need some memory, yes. At the moment
all it needs is a chunk of random memory and that's it, so we may well
fail because we're out of memory. We may, however, have general pool
memory handy. We can't guarantee that we have *any* headers, however,
since we can legitimately be called from within the destruct phase of a
DOD run, which may have been triggered by an out-of-headers condition.

Depending on how we flesh things out freezing may also not require any
additional memory--if we relax the requirement for freezing to allow the
output to be a PMC, we may be backed directly to a file or other storage
that doesn't involve RAM allocation.

> > The only thing that mark does that the general traversal doesn't, in the
> > abstract, is flip the object's live flag. Everything else is an
> > optimization of code which we can, if we need, discard.

> Yes, mark() can be written in terms of a general traverse, which gets a
> vtable function (and a data pointer). mark is basically traverse(mark,
> 0). But this isn't true the other way round. You can't do freeze based
> on the mark iterator. How do you pass the desired output format?

What does the desired output format have to do with any of this? All
marking does is put things on the list of PMCs to be visited if it hasn't
already been visited, so we get to that PMC at some point as we walk the
visited list. In the context of the DOD sweep it also sets the live flag,
but we could, if we chose, skip that and use the presence of a non-NULL
value in the mark chain address for a PMC as an indicator of liveness.
(Though yes, I realize that this means potentially skipping some of the
optimizations, so I'm not proposing it as a requirement for the DOD sweep
implementation)

Thaw can only properly take care of duplicates if the duplicates are
correctly indicated in the serialization stream. Identical end-values are
*not* sufficient to note multiple references to the same PMC.

> Albeit I'm not convinced, that we can't have a seen hash.

It takes an insane amount of memory and requires header allocation. We
can't allocate headers, and the memory requirements are extreme. Been
there, done that, it was a bad idea. Consider this arbitrarily and
unconditionally ruled out if you're unwilling to believe the stats that
were previously posted about this.

                                        Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Juergen Boemmels  
View profile  
 More options Oct 21 2003, 11:48 am
Newsgroups: perl.perl6.internals
From: boemm...@physik.uni-kl.de (Juergen Boemmels)
Date: 21 Oct 2003 17:11:57 +0200
Local: Tues, Oct 21 2003 11:11 am
Subject: Re: Object freezing

Dan Sugalski <d...@sidhe.org> writes:

[...]

> > > The chill and warm runtime methods take a PMC or a frozen representation
> > > of a PMC (respectively) and provide a human readable version of that PMC.

> > I dunno, why chill() is superior to dump() or pretty_print(), but the
> > name doesn't really matter.

> The important thing is that it's not a vtable method. It's a function that
> belongs in the freeze/thaw API as it's just an alternate encoding or
> decoding. (Arguably it ought not be a separate API entry at all and just
> another encoding scheme, but that requires transcoding serialization
> forms, and I'd rather not get into that)

This is really just a naming problem. Dan wants to call the
vtable-function freeze and have different encodings for all kinds of
dumping/pretty_printing/marking. Leo calls the function traverse and
controlls it by callbacks.

My personal opinion on this naming problem is: traverse describes more
generaly what the function does. Marking live objects by freezing them
in an encoding that does return nothing just sounds plain wrong.

Freeze should be just a user of the general traverse function. (And
this does mean it is also no vtable function)

STRING *freeze(PMC *pmc, whatever *encoding)
{
   return (STRING *)pmc->vtable->traverse(pmc, freeze_callbacks, encoding);

}

or even the freeze_encodings are callback_sets: freeze_xml,
freeze_yaml, freeze_binary, whatever.

>>>   1) Freezing at the destruction level may *not* use any additional memory
>>>      for object traversal

>> What is "Freezing at the destruction level"? Is this anyhow related to
>> destruction ordering?

> No. There are some valid cases where an object, after having been declared
> dead by the DOD, wants to serialize itself. Persistent object stores
> apparently do this, and it makes a certain amount of sense--when the
> object goes out of scope the current state is flushed to disk.

This is a question of what is allowed at destruction time. You don't
want to allow memory allocation, but allow freezing. That gets hard,
because you need at least allocate the STRING where you want to put
your frozen stream.

> It puts a number of unpleasant constraints on the core freeze routines.
> User code can violate them and take the consequences, but we can't.

We can call (hopefully) arbitary user code in destruction routines. So
this argument does not count

>>> Note that I do *not* want to have multiple object traversal systems in
>>> parrot! We have one for DOD, and proposals have ranged upwards from there.
>>> No. That is *not* happening--the chance for error is significant, the
>>> side-effects of the error annoying and tough to track down for complex
>>> cases (akin to the trouble with tracking down GC issues), and just not
>>> necessary. (Perhaps desirable for speed/space reasons, but desirable
>>> isn't necessary)

Freeze is just another traversal method. Just calling it freeze
instead of traverse does not change this fact. You can limit the power
of encodings, but this does not change the fact that you need to walk
all children

>> DOD's mark() routine has different requirements then a general
>> traverse() for freeze(), chill(), clone(), and destruction ordering.
>> Using just mark() will have these side effects that you want to avoid.

> The only thing that mark does that the general traversal doesn't, in the
> abstract, is flip the object's live flag. Everything else is an
> optimization of code which we can, if we need, discard.

mark() may be implemented in form of a general traverse. Let the
profiler decide if a special purpose mark() or a general traverse is
better.

>> A general traverse() can be depth first of breadth first, mark() isn't
>> required do have any specific ordering as long as it sets live bits
>> everywhere.

> I'm pretty sure that with a singly linked list we can get a generally
> properly-ordered flattened tree without having to do an insane number of
> passes across the dead object store. I may be incorrect in this, but I
> don't think so, and for our purposes the live bit can be safely ignored if
> we end up setting it, though potentially with another pass over the dead
> store, which may end up prohibitively expensive. We'll see.

I'm pretty sure that a singly linked list is not enough. I had done
some experiments with this. One pass my be enough, but you need to
keep track of the tree-traversal and of the partial ordered
list. These to things don't play well together. Maybe this can be cut
down to two lists, or one list and one bit per pmc.

>> mark() is called permanently in a running interpreter, that does non
>> trivial things. There are shortcuts for scalars, DOD is highly optimized
>> not to destroy cache coherency. Using mark() also implies to back out
>> my small PMC patches. All the advantages of smaller scalars are gone
>> then.

> All of this stuff for freezing is going to end up killing the small PMC
> patch anyway, unfortunately, since we're going to have to be able to
> traverse PMCs in the destruction phase, which means we have to have the
> means of traversal at hand as we can't guarantee that we can allocate more
> PMCs or resize the PMCs ext data.

Destruction ordering just enforces that small PMCs can't have
destructors. If they have destructors they must be big, big enough to
construct the ordered list of objects without allocating any memory.

If you think about it: The call to the destructors is done after
free_unused_pobjects completed. The memory of the objects without
destructors is already freed. If you are still in an out of memory
situation when the destructors are run, then its also very likely that
you are in a not much better situation afterwards.

>> While freeze() and friends have to pull in each PMC into the cache, just
>> setting the live bit on a PMC hasn't. Further: Lukes proposal for
>> speeding up timely destruction puts objects either in front or at the
>> end of the next_for_GC chain. This IMHO implies that mark() is unusable
>> as your general and solely iterator.

> We may end up playing macro games with the code. I can live with multiple
> instantiations of the code, but I want only a single chunk of source to be
> maintained.

You know we already have two versions of pobject_lives lying around.

[...]

boe


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Oct 21 2003, 11:48 am
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Tue, 21 Oct 2003 11:39:38 -0400 (EDT)
Local: Tues, Oct 21 2003 11:39 am
Subject: Re: Object freezing

It's more than just a naming issue (or if it is, then traverse is the
wrong name). The traversal must be done externally, since we can't be
recursive.

Mark puts a PMC on the list of PMCs to be frozen. Freeze dumps the
PMC being frozen (and *only* that PMC) to the stream. The freeze routine
for a PMC must mark (generally indirectly by calling the "add this pmc to
the stream" api function) any PMCs that it needs to be in the stream.

The external function that traverses this list of PMCs to be dumped is
responsible for making sure there are no duplicates--the easiest way is to
do what the DOD sweep does and note that a PMC has already been put on the
list and thus not mark it.

Mark and freeze are separate, though related by the subsystems that use
them.

> This is a question of what is allowed at destruction time. You don't
> want to allow memory allocation, but allow freezing. That gets hard,
> because you need at least allocate the STRING where you want to put
> your frozen stream.

It's more a question of what we we require the engine to do, vs what user
code is allowed to do. A user program is allowed to write code that can
fail at destroy time, however the infrastructure we provide (including, in
this case, freezing--while I don't like it there's no choice) can't fail
that way. It's the reason the DOD and GC systems don't allocate memory (or
didn't--they shouldn't) when they run. The engine's not allowed to have
failure modes in critical sections.

Basically the engine may fail because of user code, but user code can't
fail because of the engine. It makes some things annoyingly restrictive,
but some problems are inherently annoyingly restrictive.

> > It puts a number of unpleasant constraints on the core freeze routines.
> > User code can violate them and take the consequences, but we can't.

> We can call (hopefully) arbitary user code in destruction routines. So
> this argument does not count

See above. User code can fail, we can't.

There may be a little more infrastructure--I've not dug out the algorithms
books and gone hunting. The common algorithms tend to cheat by just
dodging the whole problem. :)

> Destruction ordering just enforces that small PMCs can't have
> destructors. If they have destructors they must be big, big enough to
> construct the ordered list of objects without allocating any memory.

Can't have destructors *or* refer to PMCs that may either have a
destructor or (indirectly) refer to a PMC that has a destructor.

If we have 2 PMCs with destructors they may be connected by a chain of 100
PMCs that don't, but we still need to walk that chain.

> If you think about it: The call to the destructors is done after
> free_unused_pobjects completed. The memory of the objects without
> destructors is already freed.

Then we reorder. This can't happen, and it didn't used to happen--if
that's how it works now then there's a bug in the DOD system. *All*
destructors *must* be called before any headers are collected.

> >> While freeze() and friends have to pull in each PMC into the cache, just
> >> setting the live bit on a PMC hasn't. Further: Lukes proposal for
> >> speeding up timely destruction puts objects either in front or at the
> >> end of the next_for_GC chain. This IMHO implies that mark() is unusable
> >> as your general and solely iterator.

> > We may end up playing macro games with the code. I can live with multiple
> > instantiations of the code, but I want only a single chunk of source to be
> > maintained.

> You know we already have two versions of pobject_lives lying around.

Then we need to fix that, too.

                                                Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeff Clites  
View profile  
 More options Oct 21 2003, 12:48 pm
Newsgroups: perl.perl6.internals
From: jcli...@mac.com (Jeff Clites)
Date: Tue, 21 Oct 2003 09:18:36 -0700
Local: Tues, Oct 21 2003 12:18 pm
Subject: Re: Object freezing
On Oct 21, 2003, at 5:53 AM, Dan Sugalski wrote:

I don't believe that is quite true. There are a couple of important
differences between traversal-for-GC and traversal-for-serialization,
which will be a challenge to reconcile in the one-true-traversal:

1) Serialization traversals need to "take note" of logical int and
float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they
can be serialized, but for GC you only need to worry about GC-able
objects. It's difficult to come up with a reasonable callback which can
take either int, float, or PObj arguments.

2) It's reasonable for an object to have a pointer to some sort of
cache object, which is not logically part of the object, and shouldn't
be serialized along with it. This needs to be traversed for GC
purposes, but needs to not be traversed for serialization. (Situations
such as this--physical but not logical membership--are the origin of
the "mutable" keyword in C++.)

3) Traversal for GC needs to do loop detection, but can just stop going
down a particular branch of the object graph once it encounters an
object it's seen before. Serialization traversals would need to have a
way, upon encountering an object seen before, to include in the
serialization stream an indication that the current object has already
been serialized, and enough information to enable deserialization code
to go find it and recreate the loop. The only options I see here are
either for serialization to involve the allocation of unbounded
additional memory, or to expand the PObj structure to include a slot
for a UUID which can be used as a back-reference in a stream, or to
have serialization break loops (so that deserialized structures never
have loops).

I'm not 100% convinced that a single approach can't handle both
applications, but it's looking as though their requirements are
different enough that it may not work well.

Two other questions/concerns/comments/issues:

1) I assume that ultimately a user-space iterator would end up calling
the traversal code, right? If so, you can't reasonably mandate that
only one traversal be in progress at one time. That would be the
canonical way to compare two ordered collections--get an iterator for
each, and compare element-by-element.

2) I don't see it as a huge problem that serialization code could end
up creating additional objects if called from a destroy() method.
(Though yes, it would be a problem for GC infrastructure code to.) I
say that for two reasons: (a) destroy() methods can really do anything
they want, and if that task involves allocating additional memory, that
just makes it a risk to perform that task in a destroy() method--it may
fail due to out-of-memory conditions. I think that Java design experts
tend to argue against doing things like serialization in finalization
methods. It sounds elegant, but it's problematic. One reason for this
is that you tend to want to serialize structures as a whole, not
piece-by-piece as they are garbage-collected. The second reason it is
not always a problem in practice is that (b) a DOD run may be triggered
by an out-of-headers conditions, but that doesn't mean that an
additional chunk of memory for headers can't be allocated. If it can't
be, then this is no more problematic that it would be in other user
code--think of the case where I have some big tree of objects I want to
make some sort of copy of, with the intention of then letting go of the
original when I'm done. I'll be freeing up headers at the end of that
process, but if I run out of memory part-way-through, then I'm just
stuck.

3) I assume that not every object is assumed to be serializable? For
instance, an object representing a filehandle can't really be
serialized in a useful way. So I'm not sure of what sort of "fidelity"
is required of a generic serialization method--that is, how similar a
deserialized structure is guaranteed to be to the original.

JEff


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Leopold Toetsch  
View profile  
 More options Oct 21 2003, 12:48 pm
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Tue, 21 Oct 2003 18:14:38 +0200
Local: Tues, Oct 21 2003 12:14 pm
Subject: Re: Object freezing

Dan Sugalski <d...@sidhe.org> wrote:
> On Tue, 21 Oct 2003, Leopold Toetsch wrote:
>> You can append items to the constant table. You can't declare existing
>> items as constant, because you can't change the underlying object pool,
>> where the object was allocated. This would change the objects address.
> The object's address should be irrelevant for the constant table. PMCs are
> referenced in the opstream by table offset.

Only in the opstream. But not when such PMCs are used then. I.e. when
constant Sub PMC is refered to in the global stash.

>> Ah, that's the difference. How shall the system pretty-print dynamically
>> loaded PMCs then, when only a bytecode-stream is available? IMHO only a
>> vtable in the class can perform that job.
> If the dynamically loaded PMC class doesn't have a backing Parrot class,
> you can't, and get the default, relatively primitive dump.

I was thinking of plain PMCs, that where loaded to provide some special
functionality. Parrot doesn't know anything about these, so will be
unable to pretty print the opstream. Loaded classes OTOH as based on
ParrotClass and should be printable.

>> Constructing the frozen stream definitely needs memory. I don't see the
>> difference, to memory consumed by a seen hash. Can you please elaborate
>> a bit more on this.
> Constructing the frozen stream will need some memory, yes. At the moment
> all it needs is a chunk of random memory and that's it, so we may well
> fail because we're out of memory.

So, with the same argument I can say, (destructor level) freezing will
need *system* memory for the stream plus the hash. So we may well fail.
I don't see any difference. The hash hasn't to be a "fat" PerlHash.

If we don't want a hash one bit inside the objects arena flags should be
able to serve the same functionality - this PMC already got serialized.

Anyway - how does/would freezing at destructor level look like from HLL
POV?  Shortly before, there ought to be a full DOD run (or all possible
garbage would be frozen). At this time, the amount of still active and
then to be serialized PMCs is known (an upper boundary is always known).
So it should be possible to work around such constraints.

> ... We may, however, have general pool
> memory handy. We can't guarantee that we have *any* headers, however,
> since we can legitimately be called from within the destruct phase of a
> DOD run, which may have been triggered by an out-of-headers condition.

I really doubt, that thawing a program (or some data of it), that died
in middle of some non trivial operation, because it ran out of headers,
will be of any use.

>> A scalar can't contain or reference other PMCs, so it can't be a
>> potential source of freeze loops. If I now spit out (PMC: Int, ID=xy,
>> value=5) twice or (PMC: ID=other) doesn't really matter. thaw() can take
>> care of duplicates, if needed. Other PMCs have the next_for_GC pointer.
> Thaw can only properly take care of duplicates if the duplicates are
> correctly indicated in the serialization stream. Identical end-values are
> *not* sufficient to note multiple references to the same PMC.

Sorry I thought of PMC IDs, which are the address of the frozen PMCs.

>> Albeit I'm not convinced, that we can't have a seen hash.
> It takes an insane amount of memory and requires header allocation.

A PerlHash takes more memory, and yes. But we just need a hash of PMC
addresses, or a bit inside the objects arena.

We have several different traverse-like functions:

* mark (DOD): called frequently, should get all possible speed
* freeze (destruction): no speed issues, can't take Parrot resources
* freeze (user): rarely used, can take resources
* destruction ordering: only active objects to be visited
* clone: can take resources thaw(freeze()), or separate vtable
* dump/pretty-print: no vtable?
* thaw: special class method, is different anyway

The first 2 critical items have diametral usage patterns. This does not
really imply, that they should be implemented based on the same scheme.

> ... We
> can't allocate headers, and the memory requirements are extreme. Been
> there, done that, it was a bad idea. Consider this arbitrarily and
> unconditionally ruled out if you're unwilling to believe the stats that
> were previously posted about this.

You are speaking of Storable.pm? I'm not aware of any stats regarding
that. But I'm not thinking of using a full fledged hash for such a
special case.

>                                    Dan

leo

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Leopold Toetsch  
View profile  
 More options Oct 21 2003, 12:48 pm
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Tue, 21 Oct 2003 17:20:30 +0200
Local: Tues, Oct 21 2003 11:20 am
Subject: Re: Object freezing

Melvin Smith <melv...@us.ibm.com> wrote:
>> Albeit I'm not convinced, that we can't have a seen hash.
> A seen hash most likely would:
> 1) Kill GC performance especially in pathological cases. The GC
>    should be quiet and invisible.
> 2) Cause memory usage to double upon a mark run.

GC isn't involved. A mark() run sets the live bit in the PMCs arena. No
hash is needed for both cases. I have stated several times, that I
don't like to mix mark() and the other traverse functions.

> -Melvin

leo

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeff Clites  
View profile  
 More options Oct 21 2003, 12:48 pm
Newsgroups: perl.perl6.internals
From: jcli...@mac.com (Jeff Clites)
Date: Tue, 21 Oct 2003 09:40:17 -0700
Local: Tues, Oct 21 2003 12:40 pm
Subject: Re: Object freezing
On Oct 21, 2003, at 6:12 AM, Dan Sugalski wrote:

Yeah, you can't put extra things in the "<?xml..." at the start of a
document, and you can't create a tag of your own whose name starts with
"XML" or "xml".

>> So are we talking about a header or a wrapper?  If it is really a
>> header, it's not XML and then it's prettyy useless from an XML point
>> of view.

> We're talking about the first thing in a file (or stream, or
> whatever). I
> was under the impression that XML files should be entirely composed of
> valid XML, hence the need for the stream type marker being valid XML.

No, XML _documents_ must be XML, but that doesn't mean that document ==
file. (For another example where this comes up, consider an XML
document transmitted over HTTP. There are headers and other textual
things in the stream along with the xml, and it's the HTTP protocol
which determines where the document begins and ends, not xml's.) You
can certainly have more than one XML document in a single file, but
something needs to decide where an xml document begins and ends, and
hand only that data to the xml parser.

> YAML doesn't care as much, so far as I understand, and for our own
> internal
> binary format we cna do whatever we want. If that's not true, then we
> can
> go for a more compact header.

Yes, if you want the whole serialized steam to count as a well-formed
xml document, then you can't but arbitrary binary data in the middle.
See my previous post for why.

> Once again, modulo my limited and inevitably incorrect YAML knowledge.
> So
> if the header says it's XML the whole thing is valid XML, while if it
> doesn't the rest of the stream doesn't have to be. (Just enough of the
> header so that an XML processing program can examine the stream and
> decide
> that the valid XML chunk at the beginning says that the rest of the
> stream's not XML)

Most XML parsers aren't expecting to handle this. That is, there's no
such thing as a valid half-of-an-xml document, from the perspective of
the xml spec, and in many cases you'd have trouble getting a parser to
stop before hitting something problematic and blowing up. In other
words, you can't rely on an xml parser to process something which
starts out looking like xml, but isn't.

> Basically we want some nice, fixed (mostly) thing at the head of the
> stream that doesn't vary regardless of the way the stream is encoded,
> and
> XML seemed to be the most restrictive of the forms I know people will
> clamor for. (I know, it means the stream can't be valid Lisp-style
> sexprs,
> but XML's more widespread :)

Yeah, if you're just needing to tag the stream with a label to indicate
the type plus a version number, then xml's on the one hand overkill and
on the other hand not necessarily a big help to xml proponents.

JEff


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Oct 21 2003, 1:00 pm
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Tue, 21 Oct 2003 12:53:05 -0400 (EDT)
Local: Tues, Oct 21 2003 12:53 pm
Subject: Re: Object freezing

> Yeah, if you're just needing to tag the stream with a label to indicate
> the type plus a version number, then xml's on the one hand overkill and
> on the other hand not necessarily a big help to xml proponents.

So, in a nutshell, throwing an XML format type tag at the beginning buys
us nothing regardless of whether it's an XML stream or not?

In that case, nuts to that. It's already terribly obvious I'm going to
mess it up if I try, so we'll just skip it and move on to the next
headache. :)

(FWIW, with respect to binary data in the output stream--if an encoded
format doesn't allow binary data then the encoder is responsible for
changing it to a non-binary format. So for XML and YAML (and any other
text encoding format, I expect) that'll likely be a base64 encoding or
something)

                                        Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Oct 21 2003, 1:48 pm
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Tue, 21 Oct 2003 13:20:34 -0400 (EDT)
Local: Tues, Oct 21 2003 1:20 pm
Subject: Re: Object freezing

On Tue, 21 Oct 2003, Jeff Clites wrote:
> I don't believe that is quite true. There are a couple of important
> differences between traversal-for-GC and traversal-for-serialization,
> which will be a challenge to reconcile in the one-true-traversal:

> 1) Serialization traversals need to "take note" of logical int and
> float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they
> can be serialized, but for GC you only need to worry about GC-able
> objects. It's difficult to come up with a reasonable callback which can
> take either int, float, or PObj arguments.

That's not an issue for us. A PMC is responsible for serializing itself,
so if its got a string, float, or int component then it must take
respnsibility for dumping those components to the serialization stream.
Basically PMCs *must* dump themselves out completely, but the engine
provides support to defer dumping of PMCs so that we don't get into
recursive dumping and blow stack, as well as to make sure that we properly
maintain multiple references to the same PMC.

> 2) It's reasonable for an object to have a pointer to some sort of
> cache object, which is not logically part of the object, and shouldn't
> be serialized along with it. This needs to be traversed for GC
> purposes, but needs to not be traversed for serialization. (Situations
> such as this--physical but not logical membership--are the origin of
> the "mutable" keyword in C++.)

That's what custom mark routines are for, though it does argue that we
should have a separate mark for freezing.

> 3) Traversal for GC needs to do loop detection, but can just stop going
> down a particular branch of the object graph once it encounters an
> object it's seen before. Serialization traversals would need to have a
> way, upon encountering an object seen before, to include in the
> serialization stream an indication that the current object has already
> been serialized, and enough information to enable deserialization code
> to go find it and recreate the loop. The only options I see here are
> either for serialization to involve the allocation of unbounded
> additional memory, or to expand the PObj structure to include a slot
> for a UUID which can be used as a back-reference in a stream, or to
> have serialization break loops (so that deserialized structures never
> have loops).

The loop breaking needs for freezing are the same as for DOD sweeps,
though with freezing we're at an advantage as we know where the tree
starts.

In all cases (I made sure this was in the example, but it might not have
been clear) we only include a marker for child PMCs in the parent PMC's
serialized data, and serialize the child PMCs later on in the stream. So
if PMC1 has a pointer to PMC2, the stream has PMC1 dumped to it but in the
place of PMC2's data is just a marker saying "refer to PMC2 here" and then
after the end of PMC1's data in the stream we dump out PMC2's data.

> 1) I assume that ultimately a user-space iterator would end up calling
> the traversal code, right? If so, you can't reasonably mandate that
> only one traversal be in progress at one time. That would be the
> canonical way to compare two ordered collections--get an iterator for
> each, and compare element-by-element.

While it could, I think it's infeasable to use the serialization iterator
for normal user-space iteration, if only because the limits that have to
be on the serialization iterator for use in restricted circumstances are a
bit onerous for general use.

I'm not entirely sure that parrot's going to provide this form of
iteration as it stands anyway--it's not necessary for the core langauge
support and while it'd be really useful there's a limit to the number of
Big Problems I'm up to solving. (Having said that there may, probably
will, be enough introspective capabilites to do this without engine
support)

> 2) I don't see it as a huge problem that serialization code could end
> up creating additional objects if called from a destroy() method.

User code may, parrot may not. The reasons are twofold--while parrot will
let you shoot yourself in the foot, it provides the gun, not the foot.
It should also be possible for carefully written destroy methods to
serialize but not eat any headers or memory. (I can see this being the
case in some embedded applications or systems) If we make it so freezing
is not a guaranteed possibility at destroy time then this can't happen and
it lessens the utility of the system some.

We can, if we choose, loosen the restriction later if sufficient reason is
presented. Can't really tighten it, though, so for now...

> 3) I assume that not every object is assumed to be serializable? For
> instance, an object representing a filehandle can't really be
> serialized in a useful way. So I'm not sure of what sort of "fidelity"
> is required of a generic serialization method--that is, how similar a
> deserialized structure is guaranteed to be to the original.

No fidelity is required at the moment, as we've not put any requirements
at all on what goes in the output stream. It could, I suppose, consist of
a near-infinite stream of fnords or something. That's the next bridge to
burn, but I don't think I'm done being cooked over the current one :)

                                        Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Messages 1 - 25 of 48   Newer >
« Back to Discussions « Newer topic     Older topic »