For the last few weeks I've been happily implementing a small VM for a
special runtime environment. As a regular reader of the perl.perl6.*
mailing lists, I know about Parrot, and I wondered how easy it would be
to throw away my own VM solution and use Parrot instead.
There are currently only two things that bug me a little bit:
1. I wonder how to load bytecode from memory into Parrot when
embedding it. I've read embed.pod and couldn't find a function that lets me
create a packfile, or something I can run, from a memory buffer holding the
bytecode.
2. As I currently use refcounting and like the timely destruction
semantics, I wondered whether Parrot can give me that.
(I have a database in the background and don't want too much
garbage to pile up in it.)
I've read that Parrot does a GC run over specially marked objects that
'need' timely destruction. (Correct me if I'm wrong.)
So, what would happen if I have many marked objects and call subroutines
often?
Or are there any other plans? What do other languages do that
have refcounting and want to port to Parrot?
cya,
Robin
--
el...@x-paste.de / ro...@nethype.de
Robin Redeker
> For the last few weeks I've been happily implementing a small VM for a
> special runtime environment. As a regular reader of the perl.perl6.*
> mailing lists, I know about Parrot, and I wondered how easy it would be
> to throw away my own VM solution and use Parrot instead.
> There are currently only two things that bug me a little bit:
> 1. I wonder how to load bytecode from memory into Parrot when
> embedding it. I've read embed.pod and couldn't find a function that lets me
> create a packfile, or something I can run, from a memory buffer holding the
> bytecode.
From src/embed.c:Parrot_readbc():

    pf = PackFile_new(interpreter, is_mapped);
    PackFile_unpack(interpreter, pf, (opcode_t *)program_code, program_size);

albeit it isn't part of the official embedding API (yet).
> 2. As I currently use refcounting and like the timely destruction
> semantics, I wondered whether Parrot can give me that.
Yes.
> I've read that Parrot does a GC run over specially marked objects that
> 'need' timely destruction. (Correct me if I'm wrong.)
You can flag a PMC for timely destruction:

    needs_destroy $P0
On subroutine return (or scope exit) you can trigger a lazy DOD run,
which does nothing if there aren't any objects that need timely
destruction:

    sweep 1    # lazy DOD
This will eventually be replaced by:

    enter_scope 0 | 1    # NEEDS_TIMELY_DESTRUCTION
    exit_scope
or some such. (Currently the register frame that the "sweep 1" sits in
keeps these objects alive, so the lazy DOD run can only happen after the
register frame is discarded.)
> So, what would happen if I have many marked objects and call subroutines
> often?
We'll try to make the lazy DOD run as fast as possible. Objects that
need timely destruction are counted. If all such objects have been seen
during DOD, the DOD run is aborted.
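The counting trick can be sketched in plain C (toy structures and a flat arena walk, not Parrot's actual DOD code): the sweep keeps a running count of flagged objects it has seen and aborts as soon as all of them are accounted for.

```c
#include <stddef.h>

/* Toy object: 'needs_destroy' stands in for the needs_destroy flag. */
typedef struct {
    int needs_destroy;
    int seen;
} Obj;

/* Lazy DOD sketch: visit objects in the arena, but stop as soon as
 * every object flagged for timely destruction has been seen.
 * Returns the number of objects visited. */
size_t lazy_dod(Obj *objs, size_t n, size_t flagged_total)
{
    size_t seen_flagged = 0, visited = 0;
    for (size_t i = 0; i < n; i++) {
        visited++;
        objs[i].seen = 1;
        if (objs[i].needs_destroy && ++seen_flagged == flagged_total)
            break;  /* all flagged objects accounted for: abort the run */
    }
    return visited;
}
```

With two flagged objects near the front of a 100-object arena, only a handful of objects get visited instead of all 100.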
> Or are there any other plans? What do other languages do that
> have refcounting and want to port to Parrot?
You might have a look at Ponie (Perl 5 on Parrot). And, while Python
doesn't guarantee timely destruction, people nevertheless rely on it,
because CPython provides timely destruction due to refcounting.
That means: most of our major target HLLs need timely destruction and
we'll provide it.
> cya,
> Robin
leo
OK, thanks, I'll have a look at that.
Any idea when there will be a more complete official embedding API?
Any hints on spotting functions that 'maybe' will go into the API?
> > I've read that Parrot does a GC run over specially marked objects that
> > 'need' timely destruction. (Correct me if I'm wrong.)
>
> You can flag a PMC for timely destruction:
>
> needs_destroy $P0
>
> On subroutine return (or scope exit) you can trigger a lazy DOD run,
> which does nothing if there aren't any objects that need timely
> destruction:
>
> sweep 1 # lazy DOD
>
> This will eventually be replaced by:
>
> enter_scope 0 | 1 # NEEDS_TIMELY_DESTRUCTION
> exit_scope
Does this mean I have to know whether a subroutine handles objects
that need timely destruction?
>
> or some such. (Currently the register frame, where the "sweep 1" is in,
> keeps these objects alive, so the lazy DOD run can only be run after the
> register frame is discarded)
>
> > So, what would happen if I have many marked objects and call subroutines
> > often?
>
> We'll try to make the lazy DOD run as fast as possible. Objects that
> need timely destruction are counted. If all such objects have been seen
> during DOD, the DOD run is aborted.
Well, what happens if some objects are not seen at the beginning, but
very late in the run? Does that mean I may have to wait 100ms or so (or
longer; I have no idea how long it might take) if I have around
10^3 to 10^5 objects that need timely destruction?
And does that mean the time spent in GC is very nondeterministic?
>
> > Or are there any other plans? What do other languages do that
> > have refcounting and want to port to Parrot?
>
> You might have a look at Ponie (Perl 5 on Parrot). And, while Python
> doesn't guarantee timely destruction, people nevertheless rely on it,
> because CPython provides timely destruction due to refcounting.
Thanks, I'll have a look.
>
> That means: most of our major target HLLs need timely destruction and
> we'll provide it.
cu,
robin
>> albeit it isn't part of the official embedding API (yet).
> OK, thanks, I'll have a look at that.
> Any idea when there will be a more complete official embedding API?
> Any hints on spotting functions that 'maybe' will go into the API?
APIs aren't finished yet and are extended more or less on demand.
>> This will eventually be replaced by:
>>
>> enter_scope 0 | 1 # NEEDS_TIMELY_DESTRUCTION
>> exit_scope
> Does this mean I have to know whether a subroutine handles objects
> that need timely destruction?
Normally you don't know in the general case. So the default for
languages with timely destruction semantics would be to emit
"enter_scope 1" unless the compiler (or an optimization hint) can deduce
that the scope doesn't need timely destruction.
> Well, what happens if some objects are not seen at the beginning, but
> very late in the run? Does that mean I may have to wait 100ms or so (or
> longer; I have no idea how long it might take) if I have around
> 10^3 to 10^5 objects that need timely destruction?
> And does that mean the time spent in GC is very nondeterministic?
It really depends, but the worst case can be a full GC run through all
objects with O(n) cost. But the worst case can usually be avoided
by good coding practice.
We'll have a generational garbage collector eventually, where one
generation is one scope that needs timely destruction. We can keep
counts of active objects per generation. The "all seen" condition is then usually
achieved by scavenging just the youngest generation on scope exit, *if*
the counts of the old generations are still valid.
But there is of course the possibility of inter-generational stores. If
your code sticks a freshly created object into a rather old one, the
liveness of the object is determined by the old object. If the
overwritten object needs timely destruction too, it's not known whether the
last reference to it is gone or not.
Note: "old" and "young" are terms of reachability in the object graph,
mostly related to object creation time too (for subroutines) but not for
more global-like objects.
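Inter-generational stores are commonly handled with a write barrier and a remembered set; here is a minimal sketch in C (two generations, one outgoing reference per object, all names hypothetical rather than Parrot's gc_gms code):

```c
#include <stddef.h>

#define MAX_REMEMBERED 64

typedef struct Obj {
    int gen;            /* 0 = young, 1 = old */
    int marked;
    struct Obj *child;  /* single outgoing reference, for brevity */
} Obj;

/* Remembered set: old objects into which a young pointer was stored. */
static Obj *remembered[MAX_REMEMBERED];
static size_t n_remembered;

/* Write barrier: every pointer store checks for old->young references
 * and records the old parent in the remembered set. */
void store_child(Obj *parent, Obj *child)
{
    parent->child = child;
    if (child && parent->gen > child->gen)
        remembered[n_remembered++] = parent;
}

/* Minor collection: mark young objects reachable from the young roots
 * and from remembered old objects; unmarked young objects are garbage. */
void minor_collect(Obj **young_roots, size_t n_roots)
{
    for (size_t i = 0; i < n_roots; i++)
        for (Obj *o = young_roots[i]; o && o->gen == 0 && !o->marked; o = o->child)
            o->marked = 1;
    for (size_t i = 0; i < n_remembered; i++)
        for (Obj *o = remembered[i]->child; o && o->gen == 0 && !o->marked; o = o->child)
            o->marked = 1;
}
```

A minor collection then only walks the young roots plus the remembered old objects instead of the whole heap; the price is the barrier check on every pointer store.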
> cu,
> robin
leo
Ah, OK :) Who decides what is demanded?
> > Well, what happens if some objects are not seen at the beginning, but
> > very late in the run? Does that mean I may have to wait 100ms or so (or
> > longer; I have no idea how long it might take) if I have around
> > 10^3 to 10^5 objects that need timely destruction?
>
> > And does that mean the time spent in GC is very nondeterministic?
>
> It really depends, but the worst case can be a full GC run through all
> objects with O(n) cost. But the worst case can usually be avoided
> by good coding practice.
Hm, OK, I guess I'll try to port my language and environment
to Parrot and see what I get.
Just out of curiosity: what was the design decision behind the GC
solution? Was refcounting that bad? Refcounting does give a more global
speed hit, but it's more deterministic, and you won't run into
(potentially) long halts during GC. (Java programs often suffer from this,
and it results in bad latency.)
You said that most languages will have refcount semantics; it just sounds
funny to me to implement a GC then.
>
> We'll have a generational garbage collector eventually, where one
> generation is one scope that needs timely destruction. We can keep
> counts of active objects per generation. The "all seen" condition is then usually
> achieved by scavenging just the youngest generation on scope exit, *if*
> the counts of the old generations are still valid.
Hm, that doesn't sound bad if it works. Are there prototypes? Has
this been tried out?
>
> But there is of course the possibility of inter-generational stores. If
> your code sticks a freshly created object into a rather old one, the
> liveness of the object is determined by the old object. If the
> overwritten object needs timely destruction too, it's not known whether the
> last reference to it is gone or not.
Well, what's the solution here then? Sticking fresh objects into old ones
isn't that rare, I guess (and in my case it won't be rare either).
>
> Note: "old" and "young" are terms of reachability in the object graph,
> mostly related to object creation time too (for subroutines) but not for
> more global-like objects.
>
How do you combine generational garbage collection and timely destruction
here? A policy that doesn't check 'old' objects (often) won't
work for 'old' objects that need timely destruction.
So, are timely-destruction objects in the youngest generation all the time?
How do you check whether a 'young' object that needs timely destruction
isn't referenced by old ones?
Don't you have to go through all (even old) objects to determine whether
'this' object that needs timely destruction is referenced?
I'm not an expert in garbage collectors; these are just some questions
that popped into my mind. Maybe you can point me to a paper or an
old discussion about this?
I'll answer this one, since I'm the one responsible for it.
Refcounting has three big issues:
1) It's expensive
2) It's error-prone
3) It misses circular garbage
#2 was the big one, so I suppose the list is in the wrong order, but I'm
feeling too lazy to edit. If you watch the changelogs for the P
languages (Perl, Python, and Ruby) you'll find that almost every
release patches a refcount bug, sometimes more than one.
Throw in the problems that extensions have and you'll find a *lot* of
refcount problems.
The expense is non-trivial as well. Yeah, it's all little tiny bits
of time, but that adds up. It's all overhead, and useless overhead
for the most part.
The circular garbage thing's also an issue. Yes, there are
interesting hacks around it (Python has one -- clever, but definitely
a hack) that essentially involve writing a separate mark-and-sweep
garbage collector.
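The problem is easy to reproduce with a toy refcounting scheme in C (hypothetical structures, nothing to do with Parrot's internals): once two objects point at each other, dropping all external references never brings either count to zero, so neither is freed.

```c
#include <stdlib.h>

typedef struct Node {
    int refcount;
    struct Node *other;   /* one outgoing reference, for brevity */
} Node;

static int live_nodes;    /* how many nodes are still allocated */

Node *node_new(void)
{
    Node *n = calloc(1, sizeof *n);
    n->refcount = 1;      /* the caller holds the first reference */
    live_nodes++;
    return n;
}

void node_release(Node *n)
{
    if (n && --n->refcount == 0) {
        node_release(n->other);   /* drop our outgoing reference */
        free(n);
        live_nodes--;
    }
}
```

A cycle detector like CPython's has to find such mutually-referencing groups with what amounts to a separate mark-and-sweep pass.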
The thing is, there aren't any languages we care about that require
true instant destruction -- the ones that care (or rather, just Perl;
Python and Ruby don't care) only guarantee block-boundary collection,
and in most cases (heck, in a near-overwhelming number of cases)
timely destruction is utterly irrelevant to the code being run. Yeah,
you *want* timely destruction, but you neither need nor notice it in
most program runs, since there's nothing that will notice.
Having been too deep into the guts of perl, and having written more
extensions in C than I care to admit to, I wanted refcounting *dead*.
It's just spread across far too much code, tracking down errors is a
massive pain, and, well, yech. Yes, non-refcounting GC systems are a
bit more complex, but the complexity is well-contained and
manageable. (I wrote the first cut of the GC system in an afternoon
sitting in my local public library) There's also the added bonus that
you can swap in all sorts of different GC schemes without disturbing
99% of the code base.
There are a couple of features built into the codebase to reduce GC
pauses. (Though none of these are, strictly speaking, required by the
semantics parrot expresses)
Firstly, we don't allocate base structures (PMC and buffer structs)
from random memory -- they all come out of arenas, which leaves us
with a small and relatively restricted set of places to scan for
objects. There are a number of operations that the GC system needs to
do full sweeps for, and keeping things tight together makes that a
lot easier.
We also have a split GC system (or, at least we do now -- it might
not stay). Looking for dead objects is a separate operation from
looking for dead memory. This is generally a win for us, since
*generally* we're chewing through the memory hanging off of objects
much faster than we are the objects themselves. This isn't
characteristic of all languages, certainly, but for Perl and its
cohorts it is.
The PMCs themselves are designed to reduce the number of objects that
have to actually exist (though cutting out the keyed version of all
the binary ops reduced the effectiveness of it a bit) since your
aggregates don't have to actually instantiate the things inside them
if they don't want to. A 10 million element integer array can be
represented with exactly one PMC.
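The aggregate trick can be sketched in C (hypothetical types, not the real PMC layout): the array stores plain integers and only creates a boxed object when an element is actually needed as an object.

```c
#include <stdlib.h>

/* A boxed "object"; in Parrot terms this would be a PMC. */
typedef struct { long value; } Box;

static int boxes_created;   /* counts how many boxed objects exist */

/* The aggregate holds raw longs, not boxed objects. */
typedef struct {
    long  *items;
    size_t len;
} IntArray;

IntArray *intarray_new(size_t len)
{
    IntArray *a = malloc(sizeof *a);
    a->items = calloc(len, sizeof *a->items);
    a->len = len;
    return a;
}

/* Fast path: no object is ever created. */
long intarray_get(const IntArray *a, size_t i) { return a->items[i]; }

/* Slow path: box the element only when someone needs an object. */
Box *intarray_get_boxed(const IntArray *a, size_t i)
{
    Box *b = malloc(sizeof *b);
    b->value = a->items[i];
    boxes_created++;
    return b;
}
```

So a large integer array costs one aggregate plus raw storage; boxed objects only appear for the elements a program actually treats as objects, and the GC never has to trace the rest.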
Finally we're really, *really* picky about where pointers to objects
can be put, and I think this is the place where we get a win over
other GC schemes, and one of the spots where Java traditionally falls
down. If pointers to objects can be anywhere, it complicates the GC
system a lot -- you have to assume that any random blob of memory
*may* hold pointers to objects. It probably doesn't, but... how can
the GC know? Generally it can't, since it just doesn't have
sufficient knowledge of the contents, so it has to be careful, and
careful costs. For us, the only spot where we have to be conservative
is the stack; the rest is all exact for GC purposes, which is
nice, and there are a few common special cases (like the 'this is an
array of PMC pointers' one).
All this does make it more expensive, on a per-incident basis, to
guarantee timely destruction. Scope exit, if there are extant objects
that need timely destruction, has a lot more one-shot overhead than a
refcounting scheme does because you do need a full sweep but... we
make it so that full sweeps are relatively cheap (with the known
pointer location stuff) and have to go through a relatively small
number of objects (since most of your objects will end up in
aggregates, and we're trying to make it so that aggregates don't have
to actually contain objects too often).
For regular GC sweeps we can cut down on the pauses with generational
collectors and other such interesting things, which can be plugged in
mostly seamlessly. Right now we're usually in stop-the-world mode but
heck we compile parrot with *no* C compiler optimizations too, given
that things are all in development as it is.
Also, with all this stuff, people are going to find timely
destruction is less useful than they might want, what with threads
and continuations, which'll screw *everything* up, as they are wont
to do. I know I've been making heavy use of continuations myself, and
this is for a compiler for a language that doesn't even have
subroutines in it. Continuations screw everything up, which is always
fun, for unusual values of 'fun'.
I really need to go profile perl 5 some time to get some real stats,
but I think it's likely that most programs (well, most programs I run
at least) have less than 0.1% of the variables with destructors, and
maybe one or two variables *total* that have a need for timely
destruction. (And most of the time they get cleaned up by global
destruction, which makes 'em not actually timely cleaned up)
>You said that most languages will have refcount semantics; it just sounds
>funny to me to implement a GC then.
Actually most languages won't have refcount semantics. Perl 5's the
only one that really guarantees that sort of thing now, though I
think it's in there for perl 6. I doubt the python, ruby, Lisp, or
Tcl compilers will emit the cleanup-at-block-boundary sweep code.
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk
At 5:40 PM +0200 4/27/05, Robin Redeker wrote:
>Just for the curious me: What was the design decision behind the GC
>solution? Was refcounting that bad? . . .
I'll answer this one, since I'm the one responsible for it.
Refcounting has three big issues:
1) It's expensive
2) It's error-prone
3) It misses circular garbage
. . .
The circular garbage thing's also an issue. Yes, there are
interesting hacks around it (Python has one -- clever, but definitely
a hack) that essentially involve writing a separate mark-and-sweep
garbage collector.
The circular garbage issue is just as important as the others, IMHO.
Because Perl 5 can't deal with circular garbage, many reasonable data
structures are awkward to implement, and Perl 5 lexical closures are
close to useless because they leak memory.
. . .
Having been too deep into the guts of perl, and having written more
extensions in C than I care to admit to, I wanted refcounting *dead*.
Yay, Dan!
>You said that most languages will have refcount semantics; it just sounds
>funny to me to implement a GC then.
Actually most languages won't have refcount semantics. Perl 5's the
only one that really guarantees that sort of thing now, though I
think it's in there for perl 6. I doubt the python, ruby, Lisp, or
Tcl compilers will emit the cleanup-at-block-boundary sweep code.
--
Dan
For the record, Common Lisp does not even provide a finalization
mechanism; if you want something destroyed, you must do it explicitly,
while you still have a reference to the thing.
-- Bob Rogers
http://rgrjr.dyndns.org/
>> APIs aren't finished yet and are extended more or less on demand.
> Ah, OK :) Who decides what is demanded?
A patch on p6l is a good indication for a demand :)
> Hm, OK, I guess I'll try to port my language and environment
> to Parrot and see what I get.
Fine.
> Just out of curiosity: what was the design decision behind the GC
> solution? Was refcounting that bad? Refcounting does give a more global
> speed hit, but it's more deterministic, and you won't run into
> (potentially) long halts during GC. (Java programs often suffer from this,
> and it results in bad latency.)
Yep. Well, there is one more issue besides speed: code complexity and
correctness. Writing correct refcounting code is tedious, and all
extensions suffer from that problem too. Long halts can be avoided by
either an incremental GC or by doing less work with a generational GC.
None of the GC philosophies is faster per se; it really depends on the
implementation.
> You said that most languages will have refcount semantics; it just sounds
> funny to me to implement a GC then.
They have timely destruction semantics. I'm pretty sure that we can make a
generational GC run as fast as a refcounting one.
> Hm, that doesn't sound bad if it works. Are there prototypes? Has
> this been tried out?
A very early and not yet functional generational GC is in src/gc_gms.c.
It's basically lacking the interpreter support for scope enter/exit.
> ... Sticking fresh objects into old ones
> isn't that rare, I guess (and in my case it won't be rare either).
Combined with:
>> Note: "old" and "young" are terms of reachability in the object graph,
It depends. Having everything in globals will hurt. But it's a bit too
early to say more about that.
>> mostly related to object creation time too (for subroutines) but not for
>> more global-like objects.
>>
> How do you combine generational garbage collection and timely destruction
> here? A policy that doesn't check 'old' objects (often) won't
> work for 'old' objects that need timely destruction.
When objects are ordered according to their scopes, you basically only
have to clean up the top scope (or generation of objects).
> So, are timely-destruction objects in the youngest generation all the time?
No. But objects that need timely destruction can be put in front of
their generation so that they are scanned first.
> How do you check whether a 'young' object that needs timely destruction
> isn't referenced by old ones?
> Don't you have to go through all (even old) objects to determine whether
> 'this' object that needs timely destruction is referenced?
Worst case yes.
> I'm not an expert in garbage collectors; these are just some questions
> that popped into my mind. Maybe you can point me to a paper or an
> old discussion about this?
I've no bookmarks, sorry. But google has them :)
> robin
leo
Could you point out what I get?
I use TD to handle resources: filehandles, database handles, GUI
windows and all that. Refcounting does this with a little overhead, but
in a fast and deterministic O(1) way.
And how do you want to implement GLib objects with Parrot?
They are refcounted.
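The O(1) determinism Robin is after is essentially this pattern (a hypothetical handle type; GLib's g_object_ref/g_object_unref work along the same lines): the finalizer fires at the exact moment the last reference is dropped, with no heap scan involved.

```c
#include <stdlib.h>

static int finalized;   /* set the moment the handle is destroyed */

typedef struct {
    int refcount;
    /* real code would hold e.g. a file descriptor or DB handle here */
} Handle;

Handle *handle_new(void)
{
    Handle *h = calloc(1, sizeof *h);
    h->refcount = 1;
    return h;
}

Handle *handle_ref(Handle *h) { h->refcount++; return h; }

/* O(1): the resource is released immediately when the last
 * reference goes away -- no sweep, no pause. */
void handle_unref(Handle *h)
{
    if (--h->refcount == 0) {
        finalized = 1;   /* close the file, window, DB handle, ... */
        free(h);
    }
}
```

This immediacy is what a sweep-based GC gives up: the close happens at some later DOD run instead of at the release itself.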
If it's deterministic behavior that you want, you probably ought to be
doing things explicitly. What you get as a Perl programmer is, well,
the ability to use circular data structures. Big deal, right? But if
you're an extension writer, you get relief from days of refcount
debugging and nice fast throughput (though the former is what
convinces me). Hacking on Parrot is such a joy, because I can just
forget that I need to free memory.
> And how do you want to implement GLib objects with Parrot? They are
> refcounted.
How do you do it in Perl 5? GLib doesn't use Perl's refcounts; it uses
its own, right? (I don't actually know.) But if the behavior of your
program depends on where things are destroyed, and caching a reference
somewhere can keep your window from closing, I'm under the impression
that you should be doing it explicitly. Perl 6 in particular provides
so many hooks that pure destruction-based program behavior is probably
already too obfuscated.
At least, that's my experience writing the ODE module. I initially had
the destroy_body calls inside the destructor. But I found that bodies
would magically disappear under my nose, and others would never go away.
I ended up switching to an explicit destroy call, and now my life is
much less painful.
But that's my opinion. :-)
Luke
When I programmed in C++ with proper use of the STL and other such
abstractions, almost the only time I needed destructors was for
block-exit actions. Perl 6 has block-exit hooks, so you don't need to
use destructors to fudge those anymore. And there's a lot less explicit
memory management in languages with GC (that's the point), so
destructors become even less useful.
I agree with Dan completely here. People make such a big fuss over
timely destruction when they don't realize that they don't really need
it. (But they want it). I think, more importantly, they don't
understand what they're getting in return for giving it up.
Luke
Well, I and the people I know don't think that collection of circular
data structures gives you much.
> But if you're an extension writer, you get relief from days of refcount
> debugging and nice fast throughput (though the former is what
> convinces me). Hacking on Parrot is such a joy, because I can just
> forget that I need to free memory.
Memory is the one thing you can forget about. A pure GC costs you automatic
resource management. Right: a GC is for memory management.
>
> > And how do you want to implement GLib objects with Parrot? They are
> > refcounted.
>
[..]
> I ended up switching to an explicit destroy call, and now my life is
> much less painful.
Well, I don't want the explicit destructor-calling hell you have in Java.
Some things I just don't want to do explicitly: file destruction, for
example. And what if my class manages a database handle? Do I have to do
something like this?

    $my_cool_handle_obj->free_my_resources();
This would be manual memory management:

    {
        my $s = malloc CoolClass;
        ...
        free $s;
    }
And with explicit resource handling (without timely destruction) it might be:

    {
        my $s = new CoolClass;
        ...
        destroy $s;
    }
Not that big a difference. And this is what we have with
refcounting/timely destruction:

    {
        my $s = new CoolClass;
        ...
    }
It's just that, in my opinion, timely destruction isn't something people
rarely use. It's a do-what-I-mean feature. And wasn't that a feature of
Perl 6?
I don't know whether most/many people think that timely destruction is
something we don't need much.
But it lets you forget memory management AND resource management.
And if Leopold and Dan and others think we can get efficient timely
destruction with this, and it works, and it's fast,
I'm completely fine with it.
>
> But that's my opinion. :-)
>
And this is mine ;)
cu,
Yes, but do we know whether refcounting is really slower than a garbage
collector in the end?
>
> The circular garbage thing's also an issue. Yes, there are
> interesting hacks around it (Python has one -- clever, but definitely
> a hack) that essentially involve writing a separate mark-and-sweep
> garbage collector.
I don't think circular references are used that much. This is maybe
something a programmer still has to think a little about.
And if it means that timely destruction becomes slow only for the
sake of collecting circular references... I don't know if that's a big
feature.
Are circular references such a big issue in Perl 5? I heard that the
built-in GC in Perl 5 only runs at program end and catches the circular
references, which sometimes causes a segfault (as a friend of mine experienced).
>
> The thing is, there aren't any languages we care about that require
> true instant destruction -- the ones that care (or rather, just Perl;
> Python and Ruby don't care) only guarantee block-boundary collection,
> and in most cases (heck, in a near-overwhelming number of cases)
> timely destruction is utterly irrelevant to the code being run. Yeah,
> you *want* timely destruction, but you neither need nor notice it in
> most program runs, since there's nothing that will notice.
In many program runs you won't notice the overhead of refcounting either.
And in scripts that only run up to (at most) a minute, you won't even
notice if the memory isn't managed at all.
And timely destruction is still a feature that's used much more than
collection of circular references would be (IMHO).
>
> Having been too deep into the guts of perl, and having written more
> extensions in C than I care to admit to, I wanted refcounting *dead*.
> It's just spread across far too much code, tracking down errors is a
> massive pain, and, well, yech. Yes, non-refcounting GC systems are a
> bit more complex, but the complexity is well-contained and
> manageable. (I wrote the first cut of the GC system in an afternoon
> sitting in my local public library) There's also the added bonus that
> you can swap in all sorts of different GC schemes without disturbing
> 99% of the code base.
Just because refcounting is error-prone doesn't mean that a garbage
collector is better (and less error-prone).
I agree, the code is more localized. But I guess that memory leaks
(and resource leaks) caused by a bug in a garbage collector aren't that easy
to find and fix either.
And there are many other things that are error-prone:
- a JIT is error-prone
- optimizations are error-prone
- register allocation is error-prone
- parsers/lexers/etc. are error-prone
But I understand your argument and see that it's maybe better to
localize the complex memory management stuff.
>
[.snipped interesting technical stuff about optimizing GC.]
>
I guess we will see what we get, and if the GC turns out to deal
with all the issues (timely destruction etc.) well and is the
perfect solution, that's great!
I can't look into the future, but Java still has
(after so many years) quite some GC issues (my Freenet process
seems to leak like hell, and the GC runs cause full stops of
0.6s, which causes heavy latency).
But I don't say (nor do I know) that Parrot will have these problems.
I was just afraid that GC wouldn't be faster and lower-overhead than a
refcounted solution, and we'd end up with a laggy Parrot ;)
But many people here are convinced that the GC solution will be
faster and easier, and as Parrot is still in development, I guess
there will be profiles and benchmarks in the future.
> I really need to go profile perl 5 some time to get some real stats,
> but I think it's likely that most programs (well, most programs I run
> at least) have less than 0.1% of the variables with destructors, and
> maybe one or two variables *total* that have a need for timely
> destruction. (And most of the time they get cleaned up by global
> destruction, which makes 'em not actually timely cleaned up)
IMHO timely destruction is not that useless. The alternative would be
to free resources and close files explicitly, as one has to do in Java.
And explicit resource freeing doesn't feel very Perl-ish to me.
> >You said that most languages will have refcount semantics; it just sounds
> >funny to me to implement a GC then.
>
> Actually most languages won't have refcount semantics. Perl 5's the
> only one that really guarantees that sort of thing now, though I
> think it's in there for perl 6. I doubt the python, ruby, Lisp, or
> Tcl compilers will emit the cleanup-at-block-boundary sweep code.
Well, Python doesn't guarantee it. But I've heard some people rely on
the feature since the implementation provides it.
But OK, as you have to manage resources manually in Lisp and Ruby
anyway, those compilers won't output the boundary sweep.
It's there for Perl 6, right, and I thought that Parrot is there first
and foremost for Perl 6. And there may be new and other languages that
rely at least on timely destruction.
cya,
Research indicates that in theory, garbage collection can even be faster
than explicit memory management. One paper I remember in particular was
done by IBM using various GCs for the JVM.
-Melvin
Yes, we do. This is a well-researched topic and one that's been gone
over pretty thoroughly for the past twenty years or so. There's a
lot of literature on this -- it's worth a run through citeseer for
some of the papers or, if you've a copy handy in a local library, the
book "Garbage Collection" by Jones and Lins, which is a good summary
of most of the techniques, their costs, drawbacks, and implementation
details.
> > The circular garbage thing's also an issue. Yes, there are
>> interesting hacks around it (Python has one -- clever, but definitely
>> a hack) that essentially involve writing a separate mark-and-sweep
>> garbage collector.
>
>I don't think circular references are used that much. This is maybe
>something a programmer still has to think a little about.
>And if it means that timely destruction becomes slow only for the
>sake of collecting circular references... I don't know if that's a big
>feature.
Circular references are far more common than objects that truly need
timely destruction, yes, and the larger your programs get the more of
an issue it is. Neither are terribly common, though.
>Are circular references such a big issue in Perl 5? I heard that the
>built-in GC in Perl 5 only runs at program end and catches the circular
>references, which sometimes causes a segfault (as a friend of mine
>experienced).
Perl 5, like all other refcounting GC systems, is essentially
incremental and things get collected as the program runs. There's a
final global destruction sweep that clears up anything still alive
when the program exits, but this sweep doesn't guarantee ordering
(and it really can't in the general case when there's circular
garbage) and can in some cases cause segfaults if you've got
finalizers written in C that don't properly handle last-gasp out of
order cleanup.
> > The thing is, there aren't any languages we care about that require
>> true instant destruction -- the ones that care (or rather, just Perl;
>> Python and Ruby don't care) only guarantee block-boundary collection,
>> and in most cases (heck, in a near-overwhelming number of cases)
>> timely destruction is utterly irrelevant to the code being run. Yeah,
>> you *want* timely destruction, but you neither need nor notice it in
>> most program runs, since there's nothing that will notice.
>
>In many program runs you won't notice the overhead of refcounting either.
>And in scripts that only run up to (at most) a minute, you won't even
>notice if the memory isn't managed at all.
We're building a general purpose engine, remember. It needs to handle
programs with 10 objects that run in 10ms as well as ones with 10M
objects that run for 10 weeks.
That argument actually favors non-refcount GC -- there's a minor
speed win to a non-refcount system at the short end of program runs
and a significant one at the large end of program runs.
>And timely destruction is still a feature that's used much more than
>collection of circular references would be (IMHO)
In this case, YHO would turn out to be incorrect. Don't get me wrong,
it's a sensible thing to think, it's just that it doesn't hold up on
closer inspection.
> > Having been too deep into the guts of perl, and having written more
>> extensions in C than I care to admit to, I wanted refcounting *dead*.
>> It's just spread across far too much code, tracking down errors is a
>> massive pain, and, well, yech. Yes, non-refcounting GC systems are a
>> bit more complex, but the complexity is well-contained and
>> manageable. (I wrote the first cut of the GC system in an afternoon
>> sitting in my local public library) There's also the added bonus that
>> you can swap in all sorts of different GC schemes without disturbing
>> 99% of the code base.
>
>Just because refcounting is error-prone it doesn't mean that a garbage
>collector is better (and less error-prone).
>I agree, the code is more localized. But i guess that memory leaks
>(and resource leaks) that are caused by a bug in a garbage collector
>aren't that easy to find and fix either.
Actually they are, significantly. Bugs in a centralized GC system
show up reasonably quickly and usually very fatally -- they're pretty
darn catastrophic in most cases, either causing programs to consume
all memory and die, or collecting too soon and causing things to die.
Indeed, if you take a look through the list archives, most of the
subtle GC issues are in those places where we bypass the GC for some
reason, usually in tracking external resources.
> > I really need to go profile perl 5 some time to get some real stats,
>> but I think it's likely that most programs (well, most programs I run
>> at least) have less than 0.1% of the variables with destructors, and
>> maybe one or two variables *total* that have a need for timely
>> destruction. (And most of the time they get cleaned up by global
>> destruction, which makes 'em not actually timely cleaned up)
>
>IMHO timely destruction is not that useless. The alternative would be
>to free resources and close files explicitly, like one has to do in Java.
>
>And doing resource freeing explicitly doesn't feel very Perl*-ish to me.
Oh, I'm not saying that automatic resource freeing's a bad idea -- it
isn't. Neither am I saying that timely destruction's entirely useless
-- it isn't. What I *am* saying is that a true need for timely
destruction is rare when other facilities are provided, and as such
it doesn't warrant being optimized for.
> > >You said that most languages will have refcount semantics; it just sounds
>> >funny to me to implement a GC then.
>>
>> Actually most languages won't have refcount semantics. Perl 5's the
>> only one that really guarantees that sort of thing now, though I
>> think it's in there for perl 6. I doubt the python, ruby, Lisp, or
>> Tcl compilers will emit the cleanup-at-block-boundary sweep code.
>
>Well, python doesn't guarantee it. But i heard some people rely on
>that feature as the implementation does provide it.
Guido explicitly told me it's OK to break this sort of thing if I wanted. :)
>But ok, as you have to manage resources manually in Lisp and Ruby
>anyway, the compilers won't output the boundary-sweep.
>
>It's there for Perl6, right, and i thought that parrot is first and foremost
>there for Perl6.
Well.... no. Not for a number of years, but that's a very long story.
> And there may well be new and other languages that
>rely at least on timely destruction.
At the moment there aren't, and certainly not in the group of
languages that we primarily care about. Parrot has, for example,
features that make languages like C significantly difficult to
implement on top of parrot, but we don't really care because C isn't
one of our target languages.
Sure, most people do. Heck, I do. And... I don't need timely
destruction for them. Not in the "at scope exit this will be cleaned
up" sense, at least.
"Timely" may be getting in the way here. Timely is as soon as
possible but with very few exceptions it isn't necessary. (There are
a couple of perl idioms that really demand it at the block level, but
that's about it) A number of languages use it as an alternative to
providing block exit actions, though those aren't really a great
place for forcing finalizers, since you can't be sure that a thing is
actually in a state to be finalized.
You've really got three classes of things that the GC system needs to
know about. There are the things that need cleaning eventually
(basically the no-finalizer objects), things that need cleaning up
reasonably quickly (most things with finalizers, including most file
and DB handles) where reasonably quickly's in the 100-200ms range,
and things that need cleaning as soon as they die.
We don't provide any good support for instant destruction. People are
just going to have to live with that, but honestly I don't think
anyone's going to notice. (Though compilers can force a call to a
PMC's finalizer if they want) For block-level cleanup there's
conditional garbage collection and tagging PMCs as needing GC
attention, which provides the support that most everyone needs. How
expensive a sweep is depends entirely on how many PMCs you have, but
the tracing part of the collector's darned fast, especially if you're
using the aggregate PMCs that we provide, since they're special-cased
in the collector.
>Refcounting does this with a little overhead, but
>in a fast and deterministic O(1) way.
Deterministic, yes. Fast, no. Refcounting, in general, is about twice
as expensive over the life of a program as other GC schemes.
>And how do you want to implement glib objects with parrot?
>They are refcounted.
They'll be wrapped, so it's no big deal. When the PMC wrapping them
dies, its finalizer decrements the refcount of the wrapped glib thing
and leaves it to die on its own.
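As a sketch of that wrapping pattern in CPython terms (both classes here are hypothetical stand-ins: `ExternalObject` for the refcounted glib thing, `Wrapper` for the PMC, with `__del__` playing the finalizer's role):

```python
class ExternalObject:
    """Stand-in for a refcounted C library object (e.g. a glib thing)."""
    def __init__(self):
        self.refcount = 1          # the library's own reference
    def ref(self):
        self.refcount += 1
    def unref(self):
        self.refcount -= 1         # the library frees it at zero

class Wrapper:
    """Stand-in for the GC-managed PMC wrapping the external object."""
    def __init__(self, ext):
        self.ext = ext
        ext.ref()                  # wrapping takes a reference
    def __del__(self):             # the "finalizer": runs when collected
        self.ext.unref()           # hand the object back to the library

ext = ExternalObject()
w = Wrapper(ext)                   # refcount is now 2
del w                              # finalizer drops it back to 1; the
                                   # library side decides when it dies
```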
Or, with the "block hooks" that I keep claiming make timely destruction
almost never needed, it is:
    {
        my $s = new CoolClass;
        # ... do stuff that may throw ...
        LEAVE { destroy $s }
    }
This destroys properly and even propagates the exception.
Luke
Actually it destroys improperly, since it destroys unconditionally,
which is likely wrong. The right thing is to have the constructor for
CoolClass tag its constructed object as needing expedited destruction
and have the language compiler emit conditional sweep ops as part of
its block finishing code.
Explicit destruction like that will clean up the object even if there
are outstanding references, which is likely the wrong thing to do.
That's not necessary to prevent the resource from leaking. It's only
necessary if you want to unconditionally destroy the object when
control is leaving that block for the first time. You probably don't
want to, since that means any outstanding references will now refer
to a dead object.
If you don't have the destroy, and don't tag the object as needing
expedited cleanup, then the finalizer *will* still be called. You
just don't have any control over when its called.
> > Not that big difference. And this is what we have with
> > refcounting/timely destruction:
>>
>> {
>>     my $s = new CoolClass;
>>     ...
>> }
>>
>
>The latter example will destruct nicely if something throws.
Regardless of GC method, yes. The only question is when.
It's *really* important to note that I explained how parrot does GC
-- that wasn't opening a discussion on redesigning the feature.
Parrot doesn't have, and isn't going to have, a refcounting GC
system. That's just not an option.
Parrot's got a tracing GC system triggered on demand by failure to
allocate resources, explicitly by user code, or at regular intervals.
Switching to a different tracing system than what's in now is doable.
Switching away from a tracing system *isn't*. That'd require changing
the entire source base, and just isn't feasible.
On Thu, Apr 28, 2005 at 05:00:53PM +0200, Robin Redeker wrote:
> > Robin Redeker writes:
> And with explicit resource handling (without timely destruction) it may be:
>
> {
> my $s = new CoolClass;
> ...
> destroy $s;
> }
This should actually be, to prevent the resource from leaking:
    {
        my $s = new CoolClass;
        eval {
            ... do stuff that may throw ...
        };
        destroy $s;
        die $@ if $@;   # re-throw, so the exception still propagates
    }
>
> Not that big difference. And this is what we have with
> refcounting/timely destruction:
>
> {
>     my $s = new CoolClass;
>     ...
> }
>
The latter example will destruct nicely if something throws.
--
I'm one of the maintainers of the perl5 core. perl5 is very mature, with
relatively few new features being added, and much emphasis on fixing
existing bugs. A quick look at my last 60 patches shows that about 1/3 of
them were related to ref count and memory leak bugs. Ref counting is near
impossible to get 100% right.
One of the advantages of GC over RC is that you are automating the
process, and once the automation has the bugs wrinkled out of it, the whole
rest of your code reaps the advantages. With RC, every single line of code
in the core (and extensions) has to be carefully thought about to make
sure there aren't leaks or premature frees, and every single change risks
introducing new errors.
(On the other hand, I *enjoy* hunting down refcnt bugs, so
Down With GC !! :-)
--
That he said that that that that is is is debatable, is debatable.
Circular references are useful any time you need to be able to iterate
over a collection, and also have to identify which collection a given object
is in.
This may even be implicit from other requirements, such as:
* enforcing that an object may be in at most one collection of objects at
a time
* instructing an object to detach from its current collection (especially
when forcibly destroying the object)
In fact I often want BOTH timely destruction AND circular datastructures;
for example, I want to hold an object that represents a row in a database,
and I don't want to have to care which transaction, table, database or
connection it belongs to.
But at the other end, the object representing a transaction has to be able
to reverse changes to all its component row objects. And when the
transaction goes out of scope it needs to be either committed or rolled
back.
-Martin
On Wed, Apr 27, 2005 at 12:33:30PM -0600, Luke Palmer wrote:
> I agree with Dan completely here. People make such a big fuss over
> timely destruction when they don't realize that they don't really need
> it. (But they want it). I think, more importantly, they don't
> understand what they're getting in return for giving it up.
Could you point out what i get?
I use TD to handle resources: filehandles, database handles, gui
windows and all that.
My idea of "timely" is RIGHT NOW for filehandles, when the program ends
for DB handles, and whenever for the rest. So I don't see TD as much of
a win, even if it were completely for free. But it's not; the rest of
the world would have to forego entire classes of data structures in
order to save the few lines of extra code you would need to manage a few
of yours explicitly. This seems like a bad tradeoff.
Refcounting does this with a little overhead, but in a fast and
deterministic O(1) way.
This is the first half of an apples-to-oranges comparison, and so is
misleading even if partly true. Refcounting may be proportional
(approximately) to the amount of reference manipulation, but GC is
proportional (though even more approximately, and with a different
constant) to the amount of memory allocated [1]. Because of this, a
program that runs for a long time without allocating much additional
memory incurs zero GC cost during that time; this is a clear (and even
deterministic) win over refcounting.
From: Dan Sugalski <d...@sidhe.org>
Date: Thu, 28 Apr 2005 13:10:00 -0400
. . .
>I don't think circular references are used that much. This is maybe
>something a programmer still has to think a little bit about.
>And if it means, that timely destruction maybe becomes slow only for the
>sake of collecting circular references... don't know if thats a big
>feature.
Circular references are far more common than objects that truly need
timely destruction, yes, and the larger your programs get the more of
an issue it is. Neither are terribly common, though.
I'm astounded. Do neither of you ever design data structures with
symmetrical parent<->child pointers? No trees with parents? No
doubly-linked lists? In my (probably skewed) experience, circular
references are used frequently in languages like C or Lisp that don't
penalize them.
-- Bob Rogers
http://rgrjr.dyndns.org/
[1] Work done by a copying GC when it runs is proportional to the
amount of non-garbage copied, but that doesn't apply to Parrot, and
we were talking about GC frequency in any case.
That's good ... but can we have a shorthand; stealing an old C keyword, I'd
suggest:
    {
        auto $a = new CoolClass;
        # ... do stuff that may throw ...
    }
RR> I don't think circular references are used that much. This is
RR> maybe something a programmer still has to think a little bit
RR> about. And if it means, that timely destruction maybe becomes
RR> slow only for the sake of collecting circular references... don't
RR> know if that's a big feature.
ever do any callback stuff? an object needs to be called back from a
service (say a i/o handler). it must pass itself to the service to be
stored there. the service returns a handle which needs to be stored in
the object so it can be used to manage it (start/stop/abort/etc.). there
is a quick circular ref.
now, weakrefs can fix that but i don't like that solution. GC fixes it
with no problems at all. you can just drop the object and it and the
service handle (whatever that is) will be reclaimed soon enough.
and there are many useful data structures that need backrefs. not
everything can be represented in clean tree. i won't go into these but
there is plenty of literature on them.
by far the biggest problem of refcounting (from the coder's perspective,
not its speed or internal debugging stuff) is circular refs. i currently
require explicit destroys to make sure i break them. i could (maybe i
will one day) switch to weakrefs. but i would much prefer to not know or
care about circular refs at all and allow stuff be collected when they
can.
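the cycle and the weakref workaround look like this in cpython terms (a sketch; `Service` and `Client` are made-up names):

```python
import weakref

class Service:
    """stand-in for an i/o handler that calls registered objects back."""
    def __init__(self):
        self.callbacks = {}
        self.next_handle = 0
    def register(self, obj):
        handle = self.next_handle
        self.next_handle += 1
        # holding the object *weakly* is the fix: with a strong ref here,
        # service -> object -> handle/service would be a circular ref.
        self.callbacks[handle] = weakref.ref(obj)
        return handle

class Client:
    def __init__(self, service):
        self.service = service
        self.handle = service.register(self)  # kept for start/stop/abort

svc = Service()
c = Client(svc)
probe = weakref.ref(c)
del c                         # with the weakref, plain refcounting frees
collected = probe() is None   # the client as soon as it's dropped
```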
but there is another reason i need explicit destroys: the service must
be shut down for a given handle. you can't have a select loop running
with lots of dead sockets and callbacks to nowhere. even weakrefs and
refcounting can't fix this and neither can GC. waiting later to close
down external resources like DB handles can work but select loops need
finer control.
in any case, i am totally for dan's decision to go with GC. it removes a
major weakness in perl's memory management. refcounting just doesn't
win in speed nor in internal complexity nor in safety of coding.
uri
--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    {
        my $s does LEAVE { destroy $s } = new CoolClass;
        # ... do stuff that may throw ...
    }
Or something like that?
Yes, it's
    my $s will leave { destroy $s } = new CoolClass;
and then it only destroys if $s has been initialized. Also handy
are transactional versions that work on success or failure of the block:
    my $s will keep { destroy $s } = new CoolClass;
    my $s will undo { destroy $s } = new CoolClass;
There is no requirement for timely destruction in Perl 6.
Larry
> Isn't there something like:
> {
> my $s does LEAVE { destroy $s } = new CoolClass;
> # ... do stuff that may throw ...
> }
> Or something like that?
Not currently. There used to be a C<destroy> opcode, but I've deleted
it, because I thought it was too dangerous. But it could be useful *if*
the compiler or parrot hacker really knows that there is no
reference to that object around.
Above would roughly translate to
  .sub main
    leave = newsub .Closure, _LEAVE
    pushaction leave
    ...
    s = newclass "CoolClass"
    ...
  .end

  .sub _LEAVE
    s = find_lex "$s"
    destroy s
  .end
(variable declarations omitted)
leo
Oh, I realize that, along with a number of other useful uses of
circular references. I can't speak for Robin, but when I said
circular refs weren't that common I was talking in the overall number
of things case. The large majority of objects are dead-stupid things
that have no finalizers and no references to anything, the second
largest (and definitely smaller case) is objects that refer to other
objects in a non-circular fashion, then circular objects, then
objects that need timely destruction.
Anyway, that usage pattern argues for efficiency handling simple PMCs
first, reference/aggregate PMCs second, and ones with timely
destructors last, which is how parrot's DOD/GC system's set up.
Actually it's proportional to the number of live objects.
A refcounting scheme has to touch *all* objects at least twice, while
a tracing scheme generally has to touch only the objects that are
live at trace time. For the most part, refcount O(n) time is
proportional to the total number of objects created, while tracing
O(n) time is proportional to the number of live objects.
It's definitely possible to work up degenerate examples for both
refcount and tracing systems that show them in a horribly bad light
relative to the other, but in the general case the tracing schemes
are significantly less expensive.
> From: Dan Sugalski <d...@sidhe.org>
> Date: Thu, 28 Apr 2005 13:10:00 -0400
>
> . . .
>
> >I don't think circular references are used that much. This is maybe
> >something a programmer still has to think a little bit about.
> >And if it means, that timely destruction maybe becomes slow only for the
> >sake of collecting circular references... don't know if thats a big
> >feature.
>
> Circular references are far more common than objects that truly need
> timely destruction, yes, and the larger your programs get the more of
> an issue it is. Neither are terribly common, though.
>
>I'm astounded. Do neither of you ever design data structures with
>symmetrical parent<->child pointers? No trees with parents? No
>doubly-linked lists? In my (probably skewed) experience, circular
>references are used frequently in languages like C or Lisp that don't
>penalize them.
I responded to Uri on this, but note that I said "neither are
terribly common", and they aren't. Relative to the total number of
GC-able things, objects in circular structures are a very small
minority. Which, of course, doesn't help much as an app designer when
you have to deal with these things, but it is important to know when
doing the design of the back end, since relative usage of features
needs to be kept in mind when making design tradeoffs. One of those
annoying engineering things. (Just once I'd love to have my cake
*and* eat it too, dammit! :)
We really need to put it back in -- I knew it was dangerous, but it
was necessary. We should probably make it 'safe' by forcing the
destroyed PMC to be an Undef after destruction, in case something was
still referring to it.
At 10:55 PM -0400 4/28/05, Bob Rogers wrote:
> From: Robin Redeker <ro...@nethype.de>
> Date: Thu, 28 Apr 2005 00:12:50 +0200
> Refcounting does this with a little overhead, but in a fast and
> deterministic O(1) way.
>
>This is the first half of an apples-to-oranges comparison, and so is
>misleading even if partly true. Refcounting may be proportional
>(approximately) to the amount of reference manipulation, but GC is
>proportional (though even more approximately, and with a different
>constant) to the amount of memory allocated [1].
Actually it's proportional to the number of live objects.
When running a sweep, yes, but the frequency of sweeping is in turn
proportional to the rate at which new memory is allocated, depending of
course on GC settings. So the overall cost is proportional to the
product, which (and this was my point) can be effectively zero for some
programs some of the time [1].
A refcounting scheme has to touch *all* objects at least twice, while
a tracing scheme generally has to touch only the objects that are
live at trace time.
That is true for a copying GC, but a mark/sweep GC also needs to visit
both the quick and the dead during the sweep phase in order to free
them.
For the most part, refcount O(n) time is
proportional to the total number of objects created, while tracing
O(n) time is proportional to the number of live objects.
But you have to increment/decrement/test the refcount each time you pass
a refcounted object to a sub, don't you? So that makes the cost of
refcounting proportional to runtime, doesn't it? (I don't know how
Perl5 does it in detail.)
It's definitely possible to work up degenerate examples for both
refcount and tracing systems that show them in a horribly bad light
relative to the other, but in the general case the tracing schemes
are significantly less expensive.
Agreed.
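Bob's point that mark/sweep "visits both the quick and the dead" is easy to see in a toy collector (a sketch, not Parrot's actual GC): the mark phase touches only objects reachable from the roots, while the sweep phase walks every allocated object.

```python
class Obj:
    def __init__(self, *refs):
        self.refs = list(refs)   # outgoing references
        self.marked = False

def mark(roots):
    """Mark phase: cost proportional to the number of LIVE objects."""
    touched, stack = 0, list(roots)
    while stack:
        o = stack.pop()
        if not o.marked:
            o.marked = True
            touched += 1
            stack.extend(o.refs)
    return touched

def sweep(heap):
    """Sweep phase: visits every allocated object to free the unmarked."""
    survivors = [o for o in heap if o.marked]
    for o in survivors:
        o.marked = False         # reset for the next collection
    return survivors

# Six objects on the heap; only three are reachable from the root.
leaf = Obj()
mid = Obj(leaf)
root = Obj(mid)
heap = [root, mid, leaf, Obj(), Obj(), Obj()]

live_touched = mark([root])      # 3: mark never sees the garbage
heap = sweep(heap)               # but sweep had to walk all 6
```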
>I'm astounded. Do neither of you ever design data structures with
>symmetrical parent<->child pointers? No trees with parents? No
>doubly-linked lists? In my (probably skewed) experience, circular
>references are used frequently in languages like C or Lisp that don't
>penalize them.
I responded to Uri on this, but note that I said "neither are
terribly common", and they aren't. Relative to the total number of
GC-able things, objects in circular structures are a very small
minority.
I can think of many programs I've written or otherwise hacked on where
this is not the case. In some cases, the majority of objects are
directly circular (i.e. part of a cycle as opposed to being referenced
from an object in a cycle). But I suppose that just means that we've
worked on very different apps. Before Perl5, I used to use parent
pointers at the drop of a hat. But, 'nuff said, I guess.
Which, of course, doesn't help much as an app designer when
you have to deal with these things, but it is important to know when
doing the design of the back end, since relative usage of features
needs to be kept in mind when making design tradeoffs. One of those
annoying engineering things. (Just once I'd love to have my cake
*and* eat it too, dammit! :)
--
Dan
In engineer heaven, perhaps. "RePentium, and ye shall be saveall'ed!"
;-}
-- Bob
[1] This was commonly true of programs written for the early Lisp
Machines; programmers tried really hard to avoid allocation of
short-term garbage, since real memory was expensive and GC was
painfully slow.
> ... For the most part, refcount O(n) time is
> proportional to the total number of objects created, while tracing
> O(n) time is proportional to the number of live objects.
Not quite. Refcount is O(work) or O(ptr-assign), which can be quite
different. Python refcounts all, Perl5 doesn't do refcounts on the
stack. Mark&sweep is O(live + mem) the O(live) part is the mark phase,
while the sweep goes through all allocated memory. Mark&Copy would be
O(live). But all these numbers don't account for cache misses due to GC
and are therefore rather useless.
leo
>>Not currently. There used to be a C<destroy> opcode, but I've deleted
>>it, because I thought it's too dangerous.
> We really need to put it back in -- I knew it was dangerous, but it
> was necessary.
Yeah.
> ... We should probably make it 'safe' by forcing the
> destroyed PMC to be an Undef after destruction, in case something was
> still referring to it.
That sounds sane. Or maybe be: convert to an Undef and put a Null PMC
vtable into it which would catch any further access to that PMC.
BTW shouldn't we really separate C<destroy> and C<finalize>? The latter
would be overridable by user code, the former frees allocated memory.
leo
Actually I think you'll find that isn't the case, though it's easy to
overlook, and is probably mostly my sloppy terminology. Programs,
when they run, tend to create a lot of what I've been calling
objects, but for parrot they'd be PMCs, and not necessarily actually
real objects as such.
Languages parrot's going for -- perl, python, ruby, php, tcl, and
their ilk -- do chew through a lot of temps. For the GC's purposes, and
mine, they count. When you add those in, the plain GCable thing count
tends to skyrocket. Yeah, the *important* data structures in a
program may be partly, or mostly, in circular structures, but the
hundred thousand temps that get created to hold intermediate values
are all uncircular things. This is definitely not the case for
languages like C++ where you're a lot closer to the metal, but we're
a bit further away from it here. (It'd be interesting, since parrot
*can* keep stats, to run some complex programs and see what the GC
numbers look like. I ought to turn it on for some of the
longer-running reports I have and see what there is to see, though
the results may be as much a condemnation of my compiler as anything
else. :)
> BTW shouldn't we really separate C<destroy> and C<finalize>? The latter
> would be overridable by user code, the former frees allocate memory.
I think that's wise and I thought that was the plan. Certainly for
*Struct PMCs I don't care when Parrot releases memory, but I want a
chance to make sure that I can release unmanaged memory at some point.
For filehandles I want to make sure that I can close them at some point,
not necessarily have them collected.
It's finalization that matters here, not destruction.
-- c
Yes. Or no, depending on your terminology. We're calling them
finalizers since that's what they really are, as there's no memory to
destroy. There's a vtable method that's called by the GC system when
an object is no longer reachable from the root set.
> And if so, what
>would the purpose of them be?
It's there so that any sort of cleanup that needs doing --
closing DB handles or files, destroying windows, updating stats --
can be done. You don't have to free memory unless you've managed to
allocate it in a way that parrot's not tracking it.
Nope, I don't think so. There's really only one action -- "You are
dead. Go clean up after yourself" -- that PMCs should be getting.
There's no need to clean up memory since we do that for you
automatically, and if you have to release memory back to a
third-party library it's part of the cleaning up after yourself bit.
I can't really think of a reason to have two cleanup actions. Maybe
I'm missing something here.
Just a small question:
On Thu, Apr 28, 2005 at 04:37:21PM -0400, Dan Sugalski wrote:
> If you don't have the destroy, and don't tag the object as needing
> expedited cleanup, then the finalizer *will* still be called. You
> just don't have any control over when its called.
>
Will there be destructors on imcc or language level? And if so, what
would the purpose of them be?
cya,
robin
Actually, not when, but some (indefinite) time after that has happened,
right?
> > And if so, what
> >would the purpose of them be?
First, let me define "destructor" as something that runs very soon after
an object is no longer reachable (perl5 sense), and a "finalizer" as
something that gets called before the memory of an object gets reused
(java sense).
The first quoted sentence obviously defined "destructors", as they get
called when an object is no longer reachable (I understand "when" as "as
soon as" here, which is not neccesarily what you meant). I suspect that
reality will actually match the java sense: when a GC runs (at some
indefinite time, as there is no need to call the GC as long as memory is
still available, or at least that's the only definite GC
triggering-event) it detects dead objects and finalizes them, to be
reused later (next GC run for example).
> It's there so that any sort of cleanup that needs doing --
> closing DB handles or files, destroying windows, updating stats --
> can be done.
Yes, that's actually the point - many java programs do it that way, and
this leads to a number of problems: for example, programs that work fine
on a 1GB machine suddenly stop working on a 4GB machine (more memory
means the GC runs less often, so finalizers are delayed longer), because
they reach a filehandle resource limit. Often these filehandles control
resources outside the knowledge of the virtual machine, for example,
database connections.
It's a pretty common bug in java to close database handles in the
finalizer, as databases often have some limits on the number of database
handles, of which the program doesn't have any knowledge, so one program
can keep unrelated other programs from connecting to the database when
the database runs out of resources that are tied to the java program.
The examples you gave are considered bugs in other languages that have
tracing GC's exactly because of this "action at a distance". In java,
for example, there is no guarantee that a finalizer will ever be called
for an object, and I don't think parrot will guarantee it either (simply
because defining such a guarantee is difficult).
> You don't have to free memory unless you've managed to
> allocate it in a way that parrot's not tracking it.
Exactly: a GC (of any kind) will manage memory for the program, and
nothing else (one can extend this by running a memory-GC when filehandles
run out, and hope that finalizing unused objects will also free up some
filehandles, but that only helps for resources the gc explicitly knows
about and is pretty inefficient, as the GC is optimized for memory).
I was asking the question about a useful example of what can be done in
a finalizer because I, frankly, still don't know of a useful example.
The "obvious" examples like closing filehandles or database handles are
actually bugs, because you have no definite guarantee on when they will
be freed.
For these cases, the workaround is easy: explicit resource management,
such as "auto $a = ..." or the using() directive of C♯.
This is a valid solution, but it doesn't rely on finalizers, but
on manual resource management - the "close/dispose/etc." function gets
called explicitly.
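Python's `with` statement is the same idea as C♯'s using(): explicit, deterministic cleanup on block exit, finalizers not involved. A sketch (`DatabaseHandle` is hypothetical):

```python
class DatabaseHandle:
    """Hypothetical resource that must be released explicitly."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True
    # The context-manager protocol does the explicit management:
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc, tb):
        self.close()        # runs on normal exit AND when something throws
        return False        # don't swallow the exception

h = DatabaseHandle()
try:
    with h:
        raise RuntimeError("something threw")
except RuntimeError:
    pass
# h.closed is now True: cleanup happened at block exit, exception or not,
# without relying on the GC ever running a finalizer.
```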
That's why I wonder whether valid examples for finalizers exist - so far
I can only imagine finalizers for objects that have non-GC'ed memory
attached, such as C library objects. I can't imagine a use for
finalizers within the language that uses the GC itself, because all such
uses seem to result in the same class of spurious, memory-dependent
bugs as in the examples above, because the GC is used to manage
non-memory resources.
In summary: the GC only manages the memory "resource", and the only
resource that can be managed with finalizers is likewise "memory";
everything else would need to be managed explicitly, as the GC
has no knowledge about it or cannot be taught about it, for example,
database connection limits (actually, in the latter case, the GC would
need an even from the database to notify it of the need to GC, which is
obviously obscure).
Since finalizers in a tracing-GC language are only useful for memory,
they are not useful at all, as memory is already being taken care of (at
least I have never heard of a contradicting example).
I think the Java community realized this a long time ago, and I think
that m$ tried to rectify it by introducing the IDisposable interface and
the using() statement in C#.
However, you don't need finalizers for that style of management.
(I'm not arguing for timely destruction; I am just wondering about the
theoretical usefulness of a finalizer method, and cannot find any.)
Most of this thought is from the Portland Pattern Repository's wiki:
http://c2.com/cgi/wiki?FinalizeInsteadOfProperDestructor
I have been enjoying the recent discussion of GC vs refcounting. Thanks.
While you're rehashing/justifying sensible design decisions made years
ago ;-) I was wondering why you decided to roll-your-own GC rather than
use an established one e.g. Hans Boehm's.
I ask this not because I have any criticism of your decision, but rather
because I have been using this GC in a number of other projects (including
a small programming language) and would be interested in why it was
rejected for Parrot (I assume it *was* considered, even fleetingly).
Is there something important I should know about it?
Cheers,
Sam Phillips
<lurk>
Just so you don't talk past each other, I'll point out that this is
backwards from how we usually talk about these.
Luke
>>BTW shouldn't we really separate C<destroy> and C<finalize>? The latter
>>would be overridable by user code, the former frees allocated memory.
> I can't really think of a reason to have two cleanup actions. Maybe
> I'm missing something here.
I just have the gut feeling that folks want to run user code in
finalizers for some cleanup action.
leo
> While you're rehashing/justifying sensible design decisions made years
> ago ;-) I was wondering why you decided to roll-your-own GC rather than
> use an established one e.g. Hans Boehm's.
Mainly two reasons. First: no one has tried to implement e.g. the Boehm
GC with Parrot. Second: our own GC systems are aware of all data
structures; they know where pointers to other objects are, and the GC
system can use information from the running interpreter, like hints for
scope entries and exits.
This should lead to a more performant GC system, but if someone wants
to plug in the Boehm GC, why not.
> Cheers,
> Sam Phillips
leo
I did look at the Boehm collector when we started this. It's a good
piece of software and works really well. The reason I didn't go with
it was basically one of speed.
The Boehm collector is a conservative collector -- that is, it
doesn't make any assumptions about what is and isn't a pointer to a
collectable thing, and errs on the side of caution. For the uses
people put it to, that's a good thing indeed. That caution brings two
issues along with it, though. The first is that it's relatively slow,
since it can't make any assumptions about what is and isn't a
pointer, and therefore has to treat everything as if it might be a
pointer and test it accordingly. The second issue is that since it
can't assume things really are pointers or not you can have random
memory mis-identified as a pointer to something and therefore have an
object left as live when it really isn't. Neither of these problems
is a bug per se in the Boehm collector -- it's one of those nature of
the beast things.
Parrot, though, because of what it is, doesn't need a conservative
collector. We can know exactly where pointers to collectable things
live, which means that we can both be faster (no pointer tests, no
spending time on memory that can't be pointers) and exact. That means
collection runs don't leave objects alive when they're not because an
IEEE float or bitvector was mis-identified as pointing to something
real, and it means that collection runs are faster since we check
fewer places for pointers, the pointer checks are faster (no checking
to see if it's a real pointer), and there are on average fewer
objects alive at any one time. The only place this falls down is when
we check the system stack, but that's generally only a few dozen words,
so the extra overhead is OK, and it certainly beats the alternative
(which is crashing).
Basically if you're in a position to build an exact collector you'll
get a nice speed win over using a conservative one. If you can reduce
the uncertainty you get a speed boost. A lot of programs aren't in a
position to do that, which is fine. Parrot, because of what it is,
*is* in a position to do so, so we did.