Garbage Collector

Martin Wartens

unread,

Apr 11, 2002, 1:09:59 PM4/11/02

to

Hi,
I'm looking for a garbage collector package written in c++. I'd like it to
have the following features:
=> It should let me to do all the things with pointers to garbage collected
objects that I can do with raw pointers
=> It should be able to deal with circular references. (That's what makes a
garbage collector different from a reference counted smart pointer).
=> It should support inheritance and multiple inheritance
=> It should have cast operators
=> It should be STL compliant
=> It should support exceptions
=> It should allow me to choose for which objects I want to use garbage
collection.
=> It should be thread safe (and support multiple platforms)
=> It should not be tangled with other code
=> It should be free
I already had a short look at the following packages, but they don't seem to
be what I want:
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
The Boehm garbage collector that comes with gcc. Apparently it can only be
used to switch on garbage collection globally for the whole program, because
it completely replaces C++ new (?)
http://www.codeproject.com/cpp/garbage_collect2.asp
"A garbage collection framework for c++" by William E. Kempf. It is said
there that it supports multiple inheritance, but there doesn't seem to be
any casting operators. Also, the code is only for Microsoft VC++.
http://giggle.sourceforge.net/
"Gene's Intrepid Garbage-Gathering Library Engine" by Gene Michael Stover.
The project is still in the alpha state. You can choose different methods of
collection. It doesn't support inheritance right now.

I would greatly appreciate any comments on the topic.
Thanks,
Martin Wartens

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

James Kanze

unread,

Apr 12, 2002, 1:00:11 PM4/12/02

to

"Martin Wartens" <mwar...@hotmail.com> writes:

This is an artifact of the way it is integrated into the language, I
think. Globally, there is nothing inherent in the Boehm collector which
prevents maintaining separate pools for garbage collected objects and
non garbage collected ones.

One of the problems, I think, is that by default, all of the allocations
in the standard library will not use garbage collection. And frankly,
if std::string doesn't use garbage collection, why bother. As far as I
know, in g++, the Boehm collector is just thrown in; there has been no
attempt to really integrate it into the rest (yet ?).

The Boehm collector does meet all of your other demands. The real
question is how to best integrate it into the language.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

Hans-J. Boehm

unread,

Apr 12, 2002, 9:18:25 PM4/12/02

to

"Martin Wartens" <mwar...@hotmail.com> wrote in message news:<a946np$js8$1...@news.cs.tu-berlin.de>...

> => It should allow me to choose for which objects I want to use garbage
> collection.

In general, this has some fairly tricky consequences, since you need
to deal with pointers from uncollected to collected objects and
vice-versa. You either need a "smart pointer" mechanism to track
pointers from uncollected to collected objects, or the collector needs
to scan the uncollectable objects anyway. Both have their
disadvantages, cycles between collected and uncollected objects don't
work, etc.

You may not be able to avoid this completely. But I think you're
generally better off sticking to a single memory management scheme as
much as possible, with a few carefully considered exceptions. It
often seems to be desirable to exempt a few large pointerfree objects
from garbage collection, for example. So long as they're pointerfree
and directly referenced from roots, that has minimal impact.

>...

> http://www.hpl.hp.com/personal/Hans_Boehm/gc/
> The Boehm garbage collector that comes with gcc. Apparently it can only be
> used to switch on garbage collection globally for the whole program, because
> it completely replaces C++ new (?)

In general, it provide alternative implementations of malloc/free or
new/delete. There are several possible ways to use it from C++ code.
Completely replacing new/delete is often desirable because it ensures
that the collector can see all pointers to collectable objects.

There is some small amount of discussion of alternative collector
interfaces in

http://www.hpl.hp.com/personal/Hans_Boehm/gc/gcinterface.html

(The latest version also includes a more portable STL allocator. In
theory it should be usable with any fully standard-conforming
compiler. In practice it seems to work with at least g++ and VC++.)

A few other GC alternatives specifically for C++ are described in or
pointed to by

http://www.cs.rpi.edu/~schupp/entries/SOFTWARE/fgc/fgc_home/tgc.html.

Hans Boehm
(Hans_Boehm<at>hp<dot>com)

Hans-J. Boehm

unread,

Apr 12, 2002, 10:18:38 PM4/12/02

to

James Kanze <ka...@gabi-soft.de> wrote in message news:<86elhly...@alex.gabi-soft.de>...

> As far as I
> know, in g++, the Boehm collector is just thrown in; there has been no
> attempt to really integrate it into the rest (yet ?).

It's part of the gcc distribution primarily because it's part of the
runtime needed by the Java (gcj) front end. It's reasonably well
integrated for that purpose :-) . As far as the C++ compiler is
concerned, I think the above statement is exactly right.

Note that this is distinct from the garbage collector used by gcc
itself, i.e. at compile time. It's an entirely different piece of
code and requires explicit root registration. I am told that the
primary reason for that is that not all of gcc's host platforms (a
very large number) are supported by our garbage collector.

Hans

(Hans_Boehm<at>hp<dot>com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Martin Wartens

unread,

Apr 15, 2002, 6:52:50 PM4/15/02

to

First of all, thank you for your replies.

There is one more feature I forgot to mention, because it seemed so natural
to me as a C++-programmer:
=> The garbage collector should call the destructor of the object when
removing it.
There are two useful things that a destructor can do:
1) release resources: "file.close()"
2) delete uncollectable objects: "delete my_std_string;"

I tried the C++-"Class inheritance based interface" for the Boehm collector
(http://www.hpl.hp.com/personal/Hans_Boehm/gc/gcinterface.html). Classes
that want garbage collection must inherit from a base class "gc". If you
want the collector to call the destructor of the object, you have to inherit
from "gc_cleanup". For some reason, the "gc_cleanup" variant doesn't work
with cyclic references and is therefore practically useless. Also the
interface should be template-based rather than inheritance-based. [This is
most likely because the interface was written in 1994 when there was no real
compiler support for templates.]

I also had a look at the "Template Garbage Collector"
(http://www.cs.rpi.edu/~schupp/entries/SOFTWARE/fgc/fgc_home/tgc.html), but
I couldn't figure out how it has to be used. There are no examples and the
project seems to be dead.

No I am going with the simple garbage collector by William E. Kempf
(http://www.codeproject.com/cpp/garbage_collect2.asp). I needed to add some
casting operators and adapt the code for Linux. It is quite slow, but it
works.

Martin Wartens

James Kanze

unread,

Apr 16, 2002, 2:53:09 PM4/16/02

to

"Martin Wartens" <mwar...@hotmail.com> writes:

|> There is one more feature I forgot to mention, because it seemed so natural
|> to me as a C++-programmer:
|> => The garbage collector should call the destructor of the object when
|> removing it.

I disagree. If the object needs a destructor, then it probably needs it
to be called at a deterministic time. The garbage collector cannot do
this, so such objects shouldn't be garbage collected.

--
James Kanze mailto:ka...@gabi-soft.de

Conseils en informatique oriente objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

Hans-J. Boehm

unread,

Apr 17, 2002, 4:07:01 AM4/17/02

to

"Martin Wartens" <mwar...@hotmail.com> wrote in message news:<a9f0g4$s25$1...@news.cs.tu-berlin.de>...

>
> I tried the C++-"Class inheritance based interface" for the Boehm collector
> (http://www.hpl.hp.com/personal/Hans_Boehm/gc/gcinterface.html). Classes
> that want garbage collection must inherit from a base class "gc". If you
> want the collector to call the destructor of the object, you have to inherit
> from "gc_cleanup". For some reason, the "gc_cleanup" variant doesn't work
> with cyclic references and is therefore practically useless.

[ The following discussion uses "destruction" and "finalization"
interchangeably. This is dangerous, since garbage-collector-triggered
finalization is and must be asynchronous, making it different from
conventional destructor calls, whiich occur at well-defined program
points.]

There is a good reason for the treatment of cycles. It should work
the way it does. If object A points to Object B, and both need to be
finalizable, i.e.need to have their destructors invoked, I claim you
clearly want object A to be finalized first. Otherwise A's destructor
may refer to B and see a destroyed object. Thus the collector
generally ensures that objects are finalized in "topological order",
i.e. objects are finalized before the objects they reference.

So what happens if A points to B which points to A? A priori, there
is no safe order in which to invoke destructors. If A's destructor
actually refers to B, and B's destructor refers to A, something is
wrong, since at least one will see a partially destroyed object. The
safest thing the collector can do is perhaps print a warning and not
run the destructors. That's essentially what it does.

If objects contain pointers that will not be followed during
destructor execution, it's possible to let the collector know that.
There are several ways to do this, but the easiest is probably to
split the object:

Consider an object P that holds a system resource R which should be
deallocated before the object is reclaimed. (It's useful to think of
R as a file descriptor, if you're willing to assume that we either
have a reasonable number of them, or that finalization is only a
back-up mechanism for closing them.) The destructor for P does
nothing but deallocate the object, and follows none of the many
pointers in P. Instead of having P inherit from gc_cleanup, P should
point to another object P'. P' contains either R or a copy of it, but
no pointers. Only P' inherits from gc_cleanup. In this way P still
has access to R (either by following the pointer to P', or by
accessing a local copy). It doesn't matter whether P is involved in a
cycle, since it need not be finalized. P' must be finalized, but can't
be a member of a cycle, since it contains no pointers.

This is essentially the solution that was used and reasonably well
tested in languages such as Modula-3 and Cedar. I believe it is
closely related to what's usually done in Smalltalk implementations.

Java does not impose this sort of finalization ordering. Hence it's
actually fairly easy to turn off in our collector. But it's not
recommended except in Java runtime systems.

I still believe the Java rule was a mistake. But it's not a critical
mistake, since:

1) It can be worked around with static references to objects, which
are removed by finalizers, thus keeping objects accessible until it's
safe to finalize them, and

2) Garbage collected code rarely uses finalizers, thus it's relatively
easy to tolerate some inconveniences. (I don't think it's possible to
live completely without finalization. They are occasionally
unavoidable, but only occasionally.)

The exisiting (synchronous) destructor facility in C++ does impose
some ordering constraints. They look a bit different, but the idea is
clearly also to make it possible to destroy objects before the ones
they refer to.

> Also the
> interface should be template-based rather than inheritance-based. [This is
> most likely because the interface was written in 1994 when there was no real
> compiler support for templates.]
>

Did you look at gc_allocator.h in the latest version? (This doesn't
address the finalization issue, though you can still use the C
interface. And it would be fairly easy to support directly, I think.)

Hans

(Hans_Boehm<at>hp<dot>com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Martin Wartens

unread,

Apr 17, 2002, 7:40:08 PM4/17/02

to

Thanks for your patience in explaining all those garbage collector basics to
me.

I understand the problem with following pointers in the destructors of
collectable objects, but I don't understand why this should be a trouble of
the garbage collector. Why don't you just set up a rule "if your class
inherits from gc, you are not allowed to follow pointers to collected
objects in your destructor" and let it be the programmer's problem?
If you want to have the problem handled by the garbage collector, you could
use a templated approach and overwrite all pointer-operators (look, for
example, at Kempf's garbage collector for this). This way you could let the
garbage collector supervise all pointer, new and delete operations. I
believe that you could solve most of the cyclic reference problems by this
approach.

What about uncollectable objects inside collectable objects?, e.g.
class CollectMe: public gc
{
...
std::string mystring; //uncollectable object
}
Unless I re-typedef "std::string" to use gc_allocator, mystring will never
be freed, because the destructor of CollectMe is not called (this was
already pointed out by James Kanze earlier in this thread). If I want to
use external classes that can't be parametrized by an allocator, I am lost.

I still don't think that the gc_cleanup class is very useful: If memory
management is so complicated that I want to plug-in a garbage collector, it
is difficult to make sure that there are no cyclic references. And when I
use the gc class that can handle cyclic references, I have to be aware of
two things:
1) The destructors of my classes are never called, what is a quite odd
thing for a C++-Programmer
2) I must not use uncollectable objects inside my collected objects,
even if they are as simple as std::string
This isn't very satisfactory to me.

The request to use garbage collection as a "plug-in" for some classes only
seems perfectly sensible to me in the C++-context. When memory management is
difficult, use garbage collection, otherwise use explicit deallocation and
save some time.

What do you think of a templated approach like the one used by Kempf and
others? You loose a bit of elegance, because you have to use specialized
pointers ("GC_Ptr<CollectMe>") and new/delete operators. But you are able to
track all the pointer operations this way and facilitate the work of the
garbage collector (true?).

Martin Wartens

Pierre Baillargeon

unread,

Apr 18, 2002, 7:39:13 PM4/18/02

to

"Martin Wartens" <mwar...@hotmail.com> wrote in message news:<a9k9qt$plc$1...@news.cs.tu-berlin.de>...

>
> I understand the problem with following pointers in the destructors of
> collectable objects, but I don't understand why this should be a trouble of
> the garbage collector. Why don't you just set up a rule "if your class
> inherits from gc, you are not allowed to follow pointers to collected
> objects in your destructor" and let it be the programmer's problem?

Another way would be to keep track of the allocation order in the
gc_cleanup base class and destroy the objects in reverse order when
they form cycle. Since the first object existed prior to the other, it
is reasonable to assume that it can live without the other.
Documenting this behavior does the rest of the job for odd designs.

To keep track of order a simple counter can be used. It does not even
need to be thread- safe since objects created at "the same time" in
different threads don't have well-defined ordering anyway.

For total safety, a special gc_cleanup_cycle base class could be
provided which has a virtual function takinga pointer to another
gc_cleanup_cycle object and returns true the provided object must be
destroyed first, false if second. Then a standard sort algorithm can
be run on cycles to determine destruction order.

In light of Mr Boehm extensive knowledge about GC, I suppose all this
is trite and obvious.

Hans Boehm

unread,

Apr 19, 2002, 9:26:39 AM4/19/02

to

A correction to my last posting:

I ws apparently looking at an older version of William Kempf's collector.
If I read the new one correctly, it should be thread-safe, though at the
(substantial) expense of a Windows CriticalSection entry per pointer
assignment.

Hans Boehm

(Hans_Boehm<at>hp<dot>com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Hans Boehm

unread,

Apr 19, 2002, 9:28:45 AM4/19/02

to

"Martin Wartens" <mwar...@hotmail.com> wrote in message

news:a9k9qt$plc$1...@news.cs.tu-berlin.de...
> ...

> Why don't you just set up a rule "if your class
> inherits from gc, you are not allowed to follow pointers to collected
> objects in your destructor" and let it be the programmer's problem?

You could, and that's essentially the Java approach. The problem is that
this makes it very easy to accidentally dereference destroyed objects in
destructors. That will typically result in dereferencing of dangling
pointers and in memory overwrite errors, which the collector was supposed to
avoid. (In pure Java, you don't get dangling references, since finalized
objects are still allocated. Once those finalizers are used to manually
deallocate C/C++ objects, it's an issue there, too.)

> If you want to have the problem handled by the garbage collector, you
could
> use a templated approach and overwrite all pointer-operators (look, for
> example, at Kempf's garbage collector for this). This way you could let
the
> garbage collector supervise all pointer, new and delete operations. I
> believe that you could solve most of the cyclic reference problems by this
> approach.

I'm reasonably familiar with "smart pointers". But I'm not sure how they
help in dealing with cycles here unless you effectively start introducing
pointers that are invisible to the collector. I personally would avoid
doing that, since again it largely defeats the purpose of the collector.
(Smart pointers by themselves have some of this problem, since they always
seem to leave a few rough spots around the edges. E.g. Kempf's collector
doesn't track references to objects, eventhough there may not be a pointer
to the same object.)

> What about uncollectable objects inside collectable objects ...
You will need to manage those with some form of finalization, e.g.
gc_cleanup. (Sometimes it's also possible to replace the underlying
malloc/new to make those "uncollectable" objects collectable instead.)

> I still don't think that the gc_cleanup class is very useful: If memory
> management is so complicated that I want to plug-in a garbage collector,
it
> is difficult to make sure that there are no cyclic references.

I don't think it's very difficult, but it does require a different
programming style. The simplest discipline is probably to not have any
publically visible classes inherit from gc_cleanup. Introduce helper
classes that hold only the necessary data inherit form gc_cleanup. If you
need to deallocate an uncollectable string, introduce a wrapper class for
the string. Only the wrapper class inherits gc_cleanup. Since the wrapper
class does not point to collectable objects, there is no issue of cycles.
(If the cycle involves both collectable and uncollectable objects, I claim
you are unavoidably sunk no matter what you do.)

> The request to use garbage collection as a "plug-in" for some classes only
> seems perfectly sensible to me in the C++-context. When memory management
is
> difficult, use garbage collection, otherwise use explicit deallocation and
> save some time.

I think this is much less attractive in practice than it first seems, though
it's not always avoidable. As I mentioned earlier, you either have to scan
uncollectable objects as well, or you have to register traceable pointers in
uncollectable objects, as with the Kempf collector. I suspect you can do
this more cheaply than Kempf does, but neither alternative seems that
attractive. Cycles between collectable and uncollectable objects are
unavoidably leaked.

In my experience, the performance tradeoff is also more complicated than you
imply. (Please don't use the Kempf collector to measure this; it does some
things in very expensive ways. For example, it looks to me like the mark
phase takes time that can be quadratic in the size of the heap.) For very
small objects (8-16bytes), garbage collectors can often outperform
new/delete. For very large objects new/delete can be a substantial win.

>
> What do you think of a templated approach like the one used by Kempf and
> others? You loose a bit of elegance, because you have to use specialized
> pointers ("GC_Ptr<CollectMe>") and new/delete operators. But you are able
to
> track all the pointer operations this way and facilitate the work of the
> garbage collector (true?).

It's tricky to do this right, and I've generally found that it leaves rough
spots around the edges. It's also often hard to do this efficiently (e.g.
creating a gc_ptr in the Kempf collector seems to involve adding an element
to a potentially huge STL set) and in a thread-safe manner. (The Kempf
collector claims to be thread-safe for win32, but it seems to me that
pointer assignments occurring concurrently with a collection may cause
objects to be reclaimed prematurely.) At this point, it probably still
makes sense if you need 100% portable C++ code. I used it with reference
counting in the SGI STL rope implementation for that reason.

Hans-J. Boehm

unread,

Apr 19, 2002, 10:09:30 PM4/19/02

to

pier...@hotmail.com (Pierre Baillargeon) wrote in message news:<6df0c6a8.02041...@posting.google.com>...

> Another way would be to keep track of the allocation order in the
> gc_cleanup base class and destroy the objects in reverse order when
> they form cycle. Since the first object existed prior to the other, it
> is reasonable to assume that it can live without the other.
> Documenting this behavior does the rest of the job for odd designs.

That's an interesting idea. I'm not sure how usable it would be in
practice, since creation order seems to be less significant here than
with stack
allocated objects. Consider

local_ptr = link_in_cycle(get_new_objA(), get_new_objB());

where get_new_objX() returns a newly allocated collectable object.
IIRC, it's not defined which object is allocated first.

In general, I've encountered lots of people who believe that
topologically ordered finalization is a bad idea. I've also
encountered a reasonable number who have tried it in a language like
Modula-3 or Cedar. But I'm not sure I've met anyone who is in the
intersection of those groups. Thus I'm not sure this is a real issue.

>
> To keep track of order a simple counter can be used. It does not even
> need to be thread- safe since objects created at "the same time" in
> different threads don't have well-defined ordering anyway.

I think this is a minor issue, but it's not safe to do this with an
unsynchronized global counter. Thread A may be going along
incrementing the counter on a frequent basis. Thread B may read the
counter and be suspended. Some time later, after many increments from
thread A, it will write back one plus the old counter value. Thus
thread A will see the counter jumping back.

In reality, this can be avoided, since allocation will either need a
lock or access to thread local storage in any case. Thus the overhead
of protecting the counter is negligible. I mention it only because
it's an error I've seen repeatedly in other places.

>
> For total safety, a special gc_cleanup_cycle base class could be
> provided which has a virtual function takinga pointer to another
> gc_cleanup_cycle object and returns true the provided object must be
> destroyed first, false if second. Then a standard sort algorithm can
> be run on cycles to determine destruction order.

I think this is kind of hard to provide, since the order is a global
property that in general violates abstraction boundaries.

Hans

Hans_Boehm<at>hp<dot>com

Gene Michael Stover

unread,

Apr 21, 2002, 11:04:24 AM4/21/02

to

An informal article I wrote for some friends.
Doesn't say anything that hasn't been said in this thread
already.

http://gms.freeshell.org/essays/garbage-collection-cpp/

gene

Gene Michael Stover

unread,

Apr 21, 2002, 11:14:55 AM4/21/02

to

James Kanze wrote:
> "Martin Wartens" <mwar...@hotmail.com> writes:
>
> |> There is one more feature I forgot to mention, because it seemed so natural
> |> to me as a C++-programmer:
> |> => The garbage collector should call the destructor of the object when
> |> removing it.
>
> I disagree. If the object needs a destructor, then it probably needs it
> to be called at a deterministic time. The garbage collector cannot do
> this, so such objects shouldn't be garbage collected.
>

Hi all, Sorry to but-in late.

Very true about a destructor that _needs_ to be called being
called at deterministic time.

In the general case, no garbage collector can guarrantee
that all destructors will be called in the order you want
them called. That's 'cause of cyclic references. Either a
garbage collector can guarantee that they are called in a
proper order (unused objects first) but that some won't be
called, or it can guarrantee that all destructors are called
but not always in a deterministic order. (I wrote an
article about this for some friends recently. I'll post it
on my web page & put the URL here.)

The destructor problem isn't really a problem, though. (I
wrote an article about it just yesterday. Intended for
print publication, but I don't want to wait months for that
if there's interest right now, so I'll think about
publishing it on my web page. (Is there any interest?))

Anyway, if a particular destructor must be called (like if
you are using the ``object creation is resource allocation''
idiom), then you can usually put the object on the stack
(where garbage collection isn't involved at all), or you can
delete it explicitly, in which case the destructor will be
called, converting the object to raw memory, & the garbage
collector can collect the raw memory in its own time.

This also relates to Marten's requirement that he be able to
determine which objects are under GC's control. If you can
code an explicit delete, then the object isn't quite under
GC control, but the memory it occupies is.
For what it's worth, I finished a new release of Giggle
(haven't put it on Source Forge yet). It's pretty complete
& allowed me to rum some performance & ease-of-use tests.
My personal conclusion: Use Boehm. Boehm rocks. But I used
a simple C++ wrapper around it (which is partly what I
article I wrote yesterday is about).

Thanks for reading.
TTFN.
A very interesting thread of discussion.
gene

Pierre Baillargeon

unread,

Apr 21, 2002, 11:18:56 AM4/21/02

to

hans_...@hp.com (Hans-J. Boehm) wrote in message news:<1ce90e9f.0204...@posting.google.com>...

>
> That's an interesting idea. I'm not sure how usable it would be in
> practice, since creation order seems to be less significant here than
> with stack
> allocated objects. Consider
>
> local_ptr = link_in_cycle(get_new_objA(), get_new_objB());

You are right that allocation order may be completely indifferent to
destruction order. Thinking back, I have example of this in my own
code where I allocate a pool of objects which may be linked in
arbitrary order. So this is a no-go.

> > For total safety, a special gc_cleanup_cycle base class could be
> > provided which has a virtual function takinga pointer to another
> > gc_cleanup_cycle object and returns true the provided object must be
> > destroyed first, false if second. Then a standard sort algorithm can
> > be run on cycles to determine destruction order.
>
> I think this is kind of hard to provide, since the order is a global
> property that in general violates abstraction boundaries.

I find it hard to believe that, if an object of type A has a pointer
to another of type B, it cannot know if it needs it in its destructor.
Contrary to your claim, I believe the order is a very local property
that is not even constant: sometimes an object of type A can be
destroyed before an object of type B, sometimes after. For example, a
stack of filters that can be put in arbitrary order and in which the
last points to the top for some reason.

Was it clear that the virtual function is implemented in each type
derived from gc_cleanup?

Larry Evans

unread,

Apr 21, 2002, 5:30:13 PM4/21/02

to

"Hans-J. Boehm" wrote:

> pier...@hotmail.com (Pierre Baillargeon) wrote in message news:<6df0c6a8.02041...@posting.google.com>...
> > Another way would be to keep track of the allocation order in the
> > gc_cleanup base class and destroy the objects in reverse order when
> > they form cycle. Since the first object existed prior to the other, it
> > is reasonable to assume that it can live without the other.
> > Documenting this behavior does the rest of the job for odd designs.
>
> That's an interesting idea. I'm not sure how usable it would be in
> practice, since creation order seems to be less significant here than
> with stack
> allocated objects. Consider
>
> local_ptr = link_in_cycle(get_new_objA(), get_new_objB());
>
> where get_new_objX() returns a newly allocated collectable object.
> IIRC, it's not defined which object is allocated first.
>
> In general, I've encountered lots of people who believe that
> topologically ordered finalization is a bad idea. I've also
> encountered a reasonable number who have tried it in a language like
> Modula-3 or Cedar. But I'm not sure I've met anyone who is in the
> intersection of those groups. Thus I'm not sure this is a real issue.
>

[snip]

> > For total safety, a special gc_cleanup_cycle base class could be
> > provided which has a virtual function takinga pointer to another
> > gc_cleanup_cycle object and returns true the provided object must be
> > destroyed first, false if second. Then a standard sort algorithm can
> > be run on cycles to determine destruction order.
> I think this is kind of hard to provide, since the order is a global
> property that in general violates abstraction boundaries.
>

What about using non-intrusive smart-pointers, as in boost's shared_ptr.
Then modify the gc method with Christopher's "global mark-scan" or
Lins's "local mark-scan"(p.62 of Jones' _Garbage Collection_ book).
The order wouldn't matter as long as the destructors check whether the
non-intrusive count( which contains the actual pointer to the object,
as done by boost's shared_ptr) has been nullified (i.e. the pointer set to 0).

OTOH, if the a destructor really needs the actual pointed-to object, then
this could fail, but I'm wondering if that's very common.

Allan W

unread,

Apr 23, 2002, 8:56:15 PM4/23/02

to

"Martin Wartens" <mwar...@hotmail.com> wrote

> I tried the C++-"Class inheritance based interface" for the Boehm collector
> (http://www.hpl.hp.com/personal/Hans_Boehm/gc/gcinterface.html). Classes
> that want garbage collection must inherit from a base class "gc". If you
> want the collector to call the destructor of the object, you have to inherit
> from "gc_cleanup". For some reason, the "gc_cleanup" variant doesn't work
> with cyclic references and is therefore practically useless.

I have no experience with the Boehm collecter, but something about that
phrase jumped out at me. Do you really have cyclic dependancies so often
that this restriction makes GC "practically useless?"

I've been programming for a quarter-century. Obviously not all of that
used object-oriented techniques. Other than "callback" mechanisms, I
can think of ONE project where I gave a dependant class a pointer to
it's owner -- and if I had it to do over again, I'd do it differently.

Perhaps you're trying to push GC into solving some problem it wasn't
really designed for?

Hans-J. Boehm

unread,

Apr 24, 2002, 5:26:14 AM4/24/02

to

Larry Evans <jcamp...@prodigy.net> wrote in message

> What about using non-intrusive smart-pointers, as in boost's shared_ptr.
> Then modify the gc method with Christopher's "global mark-scan" or
> Lins's "local mark-scan"(p.62 of Jones' _Garbage Collection_ book).
> The order wouldn't matter as long as the destructors check whether the
> non-intrusive count( which contains the actual pointer to the object,
> as done by boost's shared_ptr) has been nullified (i.e. the pointer set to 0).

I'm not sure how that solves any of the problems here. It turns out
to be relatively easy to implement any of the standard proposals for
finalization ordering, including topological ordering (ours,
Modula-3), none (Java), or some priority-based scheme (which you
probably don't really want). The trick seems to be agreeing on the
interface, not the implementation.

Boost style reference counted pointers do raise another issue, in that
destructors can (in the acyclic case, at least) be invoked
synchronously, i.e. exactly at the point at which an objecty becomes
inaccessible. That has sometimes been advocated as an advantage, but
I believe it's actually a bug,
in that it prevents required locking in destructors. See
http://www.hpl.hp.com/personal/Hans_Boehm/gc/det_destr.html for
details. (As far as I can determine from the source I found, Boost
shared_ptrs are not thread-safe? Thus the deadlock scenario described
there can't occur, and the issues become more obscure, but I don't
think they completely disappear.)

>
> OTOH, if the a destructor really needs the actual pointed-to object, then
> this could fail, but I'm wondering if that's very common.

My experience is that it's perhaps less common than the case in which
you just deallocate a non-memory system resource, but it definitely
does happen. If it never did, the Java ordering model would be
preferable. But this case is also trivially handled in the
topologically-ordered case by just wrapping the system resource.

Hans
(Hans_Boehm<at>hp<dot>com)

Joshua Lehrer

unread,

Apr 24, 2002, 3:32:16 PM4/24/02

to

Hans_...@hp.com (Hans-J. Boehm) wrote in message

news:<1178a29f.02042...@posting.google.com>...

> details. (As far as I can determine from the source I found, Boost
> shared_ptrs are not thread-safe? Thus the deadlock scenario described

> there can't occur, and the issues become more obscure, but I don't
> think they completely disappear.)

This is not true. The boost::shared_ptr ultimately uses
boost::detail::atomic_count, which is then platform dependent. They
supply atomic counts for win32, linux, and pthreads. We wrote our own
that uses VMS's native atomic_increment and atomic_decrement functions,
and it was merely a matter of changing a typedef to get boost to use our
VMS class rather than any of the ones they supply.

joshua lehrer
factset research systems

Martin Wartens

unread,

Apr 25, 2002, 9:07:39 AM4/25/02

to

> I have no experience with the Boehm collecter, but something about that
> phrase jumped out at me. Do you really have cyclic dependancies so often
> that this restriction makes GC "practically useless?"

When you are writing a library you have no choice than to take care of all
eventualities. I have to manage a large graph-like structure consisting of
many small objects. The inter-object connections are defined by the user. In
this case, it's hard to avoid cyclic references by design.
The other way round, if it is easy to avoid cyclic references by design, it
seems also more likely that the memory management is easy and can be done by
explicit deallocations. [A simple cyclic reference example: a double linked
list.]
Martin Wartens

Martin Wartens

unread,

Apr 25, 2002, 9:11:26 AM4/25/02

to

I used a cheap trick to use the Boehm collector with my application. In the
gc_cpp.h header file, I replaced all occurences of
"GC_register_finalizer_ignore_self" with
"GC_register_finalizer_no_order".
This version of the finalizer is said to "ignore all cycles". Under Windows
(MS Visual Studio), the trick worked immediately. Under Linux I found that I
had to compile the collector with "DREDIRECT_MALLOC=GC_malloc". I am not
quite sure what it does, but otherwise the collector seems to loose the
track of some pointers, especially of pointers in stl containers.
I noticed that some objects are never deleted, even if I force finalization
with "GC_invoke_finalizers();" at the end of the program. Maybe this is due
to "false pointers"?

> For very small objects (8-16bytes), garbage collectors can often
outperform
> new/delete.

This is also true for my application. I am managing a large number (>10000)
of small (<1000 bytes) short-lived objects.

Martin Wartens

Hans-J. Boehm

unread,

Apr 26, 2002, 3:43:18 AM4/26/02

to

"Martin Wartens" <mwar...@hotmail.com> wrote in message news:<aa8s9j$589$1...@news.cs.tu-berlin.de>...

> Under Linux I found that I
> had to compile the collector with "DREDIRECT_MALLOC=GC_malloc". I am not
> quite sure what it does, but otherwise the collector seems to loose the
> track of some pointers, especially of pointers in stl containers.
> I noticed that some objects are never deleted, even if I force finalization
> with "GC_invoke_finalizers();" at the end of the program. Maybe this is due
> to "false pointers"?

This most likely has a different cause. The gcc/SGI default STL
allocator obtains large chunks from malloc, and then divides them up.
If you redirect malloc to GC_malloc, the garbage collector will scan
and collect those large chunks, but it can at best collect one if the
whole chunk is inaccessible. This isn't what you want.

The correct solution is to use a GC-aware allocator, such as one of
those in the latest GC distribution, instead of redirecting malloc.
But that requires modification of the client code. Alternatively,
most (all) versions of the SGI/gcc standard allocator can be convinced
to call malloc directly and not bother with the chunks. In the
version I have here, you need to define __USE_MALLOC. That should
work in combination with malloc redirection. Of course that's hardly
portable.

Under at least some versions of Windows, the collector happens to scan
the malloc heap (mostly because it can be hard not to), so this mostly
works without malloc redirection. But the GC-aware allocator would
still be a far cleaner solution.

Hans Boehm

(Hans_Boehm<at>hp<dot>com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Hans Aberg

unread,

Apr 27, 2002, 3:10:26 PM4/27/02

to

In article <1178a29f.02042...@posting.google.com>,

Hans_...@hp.com (Hans-J. Boehm) wrote:
>The correct solution is to use a GC-aware allocator, such as one of
>those in the latest GC distribution, instead of redirecting malloc.
>But that requires modification of the client code.

Here is an abstract definition of a GC ("object" is short for "runtime object"):

A GC is a object to which other objects may be registered, and further, to
the registered objects, user references may be registered. In a
conservative GC, if a registered object has a user registered reference,
its object is considered alive; if it has no such user registered
reference, it is considered dead, and may at some point be removed by the
GC (depending the nature of the GC).

In a conservative GC, one way to find the user registered references to
the GC is to track from the root system.

I arrived at this definition, in part from a runtime model I am
developing, and in part in order to be able to handle the seemingly
endless arguments about GC: For example, what if some objects are put in a
file and the file is closed, can the GC then see them? The answer is, it
depends on the GC, whether the object still has a user registered
reference or not.

This way one should be able to handle the use of multiple GC's: One needs
to make sure that the same object is not registered to several different
GC's in a way that the cleanup operations collide.

Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove...@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>

Hans-J. Boehm

unread,

Apr 28, 2002, 11:48:36 AM4/28/02

to

remove...@matematik.su.se (Hans Aberg) wrote in message news:<remove.haberg-2...@du133-226.ppp.su-anst.tninet.se>...

> Here is an abstract definition of a GC ("object" is short for "runtime object"):
>
> A GC is a object to which other objects may be registered, and further, to
> the registered objects, user references may be registered. In a
> conservative GC, if a registered object has a user registered reference,
> its object is considered alive; if it has no such user registered
> reference, it is considered dead, and may at some point be removed by the
> GC (depending the nature of the GC).
>
> In a conservative GC, one way to find the user registered references to
> the GC is to track from the root system.
>

> ...

>
> This way one should be able to handle the use of multiple GC's: One needs
> to make sure that the same object is not registered to several different
> GC's in a way that the cleanup operations collide.
>

There are unfortunately some practical issues with turning this
definition into a GC interface. These are avoided if you treat the
collector more like a special allocator that doesn't require
deallocation:

- It is very hard to collect cycles involving objects registered with
different GCs. Doing so seems to require a fairly elaborate protocol
for cooperation among GCs. I have yet to be convinced that supporting
multiple GCs in the same process is worth the complexity and other
cost.

- The performance advantage of a GC for small objects comes from the
fact that allocation, deallocation and collection cooperate closely,
and thus allocation and deallocation potentially become more efficient
than with manual memory management. For example, it becomes easy to
avoid the need for one lock acquisition per deallocation in a
multithreaded environment. Bookkeeping data structures can be shared
between the allocator and collector. Registering objects with the GC
after the fact and layering it on a conventional allocator removes
this advantage.

- (perhaps solvable) For performance reasons, you really want the
garbage collector to at least know which objects have destructors that
must be invoked, and which don't. Destructor invocation from the
collector is not cheap. And normally the vast majority of destructors
in a garbage collected system will do nothing.

Hans

(Hans_Boehm<at>hp<dot>com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Hans-J. Boehm

unread,

Apr 28, 2002, 11:51:21 AM4/28/02

to

usene...@lehrerfamily.com (Joshua Lehrer) wrote in message news:<31c49f0d.02042...@posting.google.com>...
[I had questioned the thread-safety of Boost shared_ptr.]

> This is not true. The boost::shared_ptr ultimately uses
> boost::detail::atomic_count, which is then platform dependent.

Thanks for the correction. I apparently confused this with the
version used for the timing measurements
(http://www.boost.org/libs/smart_ptr/smarttests.htm). That still looks
to me like it is purely single-threaded, and thus excludes any
synchronization overhead.

Hans
(Nans_Boehm<at>hp<dot>com)

Hans Aberg

unread,

Apr 29, 2002, 9:34:18 AM4/29/02

to

In article <1178a29f.02042...@posting.google.com>,
Hans_...@hp.com (Hans-J. Boehm) wrote:

>> This way one should be able to handle the use of multiple GC's: One needs
>> to make sure that the same object is not registered to several different
>> GC's in a way that the cleanup operations collide.
>>
>There are unfortunately some practical issues with turning this
>definition into a GC interface. These are avoided if you treat the
>collector more like a special allocator that doesn't require
>deallocation:
>
>- It is very hard to collect cycles involving objects registered with
>different GCs. Doing so seems to require a fairly elaborate protocol
>for cooperation among GCs. I have yet to be convinced that supporting
>multiple GCs in the same process is worth the complexity and other
>cost.

The thing is that under C++, one may want to make use of different GC's in
order to optimize for different types of memory management -- just as one
may want make use of different allocator/deallocator pairs. If say a
library is optimized for GC A, and your program for GC B, how should that
problem be resolved. (Practical example: Your program is calling the
Haskell interpreter Hugs kernel, which implements its own GC. Your program
is using a different memory management technique.)

Are you sure that these problems are not as the result of how you perceive
multiple GC's to be implemented? -- I gather that such GC's must use
separate memory block, for instance.

Problem would then be to efficiently search the root system and trace for
objects belonging to a particular GC. Then perhaps on might need different
stacks for keeping tracks of the different root systems. -- It might work
like a super-program calling different sub-programs, each with its own
(single) GC.

Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove...@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Martin Wartens

unread,

Apr 29, 2002, 5:15:33 PM4/29/02

to

> This most likely has a different cause. The gcc/SGI default STL
> allocator obtains large chunks from malloc, and then divides them up.
> If you redirect malloc to GC_malloc, the garbage collector will scan
> and collect those large chunks, but it can at best collect one if the
> whole chunk is inaccessible. This isn't what you want.

You are right. I found that this version uses lots of memory.

> The correct solution is to use a GC-aware allocator, such as one of
> those in the latest GC distribution, instead of redirecting malloc.
> But that requires modification of the client code.

It took some time until figured out that I had to use "traceable_alloc"
(gc6.0) or "traceable_allocator<T>" (gc6.1a4). I believe that
traceable_alloc is for pointers to collectable objects or for uncollectable
objects that contain pointers to collectable objects:
list<CollectMe*, traceable_alloc> or
list<ContainsCollectMePtr, traceable_alloc>
And "gc_alloc" or "gc_allocator<T>" are for containers of collectable
objects:
list<CollectMe, gc_alloc>
Did I get that right?

> Alternatively,
> most (all) versions of the SGI/gcc standard allocator can be convinced
> to call malloc directly and not bother with the chunks. In the
> version I have here, you need to define __USE_MALLOC. That should
> work in combination with malloc redirection. Of course that's hardly
> portable.

Instead of defining __USE_MALLOC it is also possible to specify
std::malloc_alloc as a container's allocator. This works.

> Under at least some versions of Windows, the collector happens to scan
> the malloc heap (mostly because it can be hard not to), so this mostly
> works without malloc redirection.

Extra info: I am using Windows XP.

Martin Wartens

Hans-J. Boehm

unread,

Apr 30, 2002, 2:54:00 AM4/30/02

to

remove...@matematik.su.se (Hans Aberg) wrote in message news:<remove.haberg-2...@du134-226.ppp.su-anst.tninet.se>...
> In article <1178a29f.02042...@posting.google.com>,
> Hans_...@hp.com (Hans-J. Boehm) wrote: ...

> >- It is very hard to collect cycles involving objects registered with
> >different GCs. Doing so seems to require a fairly elaborate protocol
> >for cooperation among GCs. I have yet to be convinced that supporting
> >multiple GCs in the same process is worth the complexity and other
> >cost.
>
> The thing is that under C++, one may want to make use of different GC's in
> order to optimize for different types of memory management -- just as one
> may want make use of different allocator/deallocator pairs. If say a
> library is optimized for GC A, and your program for GC B, how should that
> problem be resolved. (Practical example: Your program is calling the
> Haskell interpreter Hugs kernel, which implements its own GC. Your program
> is using a different memory management technique.)
>
> Are you sure that these problems are not as the result of how you perceive
> multiple GC's to be implemented? -- I gather that such GC's must use
> separate memory block, for instance.
>

Let me make this a little more concrete: we have object A managed by
GCa and object B managed by GCb. A points to B and B points to A, and
neither are referenced from any other place. Thus both should be
collected.

Let's assume both of the collectors are "smart pointer" based, to keep
the discussion as simple as possible. The naive way to handle this
would be to ensure that

1) Pointers to objects collected by GCa but residing outside the GCa
heap have some "smart pointer" type SPa. Similarly we use SPb for
GCb.

2) SPa pointers are viewed as roots by GCa. SPb pointers are viewed
as roots by GCb.

3) When an object is destroyed (finalized really) by a collector, the
embedded smart pointers to other heaps are destroyed, and thus no
longer treated as roots.

In this scheme, clearly neither A nor B will be collected, since they
are each referenced by a "root" in the other heap.

I don't know of a way to fix this other than by having GCa and GCb
cooperate. Somehow both heaps must be traced before we can really
identify garbage in either. Thus we would need a standardized
"interGC" protocol, which the Haskell collector probably wouldn't
immediately obey.

Along these lines, one could not treat these cross-pointers as roots,
but instead ask the other collector to trace through its heap and
notify us of, for example, GCa objects that are reachable through the
GCb heap by starting at a certain GCb object. IIRC, CMM
(http://www.di.unipi.it/~attardi/cmm.html) takes this approach. If
you still let them collect somewhat independently, it's difficult to
ensure that the total collection time for the entire heap is still
roughly linear in the heap size. Without that property, you risk
making the garbage collection process much slower than if you had
stuck to any of the component GCs individually. (I don't remember
exactly how CMM handles this. Perhaps a CMM expert could add
something.)

Once you need enough cooperation between collectors, and they are
sufficiently tightly coupled, it seems easier to view it as just one
tunable collector. Even supporting multiple "pluggable" collectors is
much easier than allowing them to be active at the same time.

Things become much more manageable if the pointers between heaps all
go in the same direction, so that cycles can't occur. But in general
that's a hard property to maintain.

Hans
(Hans_Boehm<at>hp<dot>com)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Hans-J. Boehm

unread,

Apr 30, 2002, 7:15:38 PM4/30/02

to

"Martin Wartens" <mwar...@hotmail.com> wrote in message news:<aajp4r$r4$1...@news.cs.tu-berlin.de>...
[I wrote:]

> > The correct solution is to use a GC-aware allocator, such as one of
> > those in the latest GC distribution, instead of redirecting malloc.
> > But that requires modification of the client code.
>
> It took some time until figured out that I had to use "traceable_alloc"
> (gc6.0) or "traceable_allocator<T>" (gc6.1a4). I believe that
> traceable_alloc is for pointers to collectable objects or for uncollectable
> objects that contain pointers to collectable objects:
> list<CollectMe*, traceable_alloc> or
> list<ContainsCollectMePtr, traceable_alloc>
> And "gc_alloc" or "gc_allocator<T>" are for containers of collectable
> objects:
> list<CollectMe, gc_alloc>
> Did I get that right?

Yes. The "traceable" allocators allocate memory that must be
explicitly deallocated, but that is traced by the collector.
Effectively anything inside a traceable object is viewed as a root by
the collector. The "gc" allocators allocate collectable memory. It
is of course also scanned by the collector, but only if that block
itself is still reachable.

>
> > Alternatively,
> > most (all) versions of the SGI/gcc standard allocator can be convinced
> > to call malloc directly and not bother with the chunks. In the
> > version I have here, you need to define __USE_MALLOC. That should
> > work in combination with malloc redirection. Of course that's hardly
> > portable.
>
> Instead of defining __USE_MALLOC it is also possible to specify
> std::malloc_alloc as a container's allocator. This works.

Right. That's a bit more intrusive and slightly cleaner.

>
> > Under at least some versions of Windows, the collector happens to scan
> > the malloc heap (mostly because it can be hard not to), so this mostly
> > works without malloc redirection.
>
> Extra info: I am using Windows XP.

You really shouldn't rely on the malloc heap getting traced, no matter
what the Windows version. I looked at the collector code again, and
it explicitly tries to avoid tracing the system malloc heap unless
REDIRECT_MALLOC is defined. It is quite possible that the test for
whether a memory section is part of the system malloc heap is
imperfect, but I certainly wouldn't rely on that.

Having the collector naively scan the system malloc heap is a bad
idea, if it can be avoided. The problem is that it also contains
objects that have been free'd or deleted, but not yet overwritten. It
may be better to call GC_malloc_atomic_uncollectable rather than the
system malloc, if you want untraced uncollectable objects, so that the
population of the system malloc heap is minimized.

Hans

Larry Evans

unread,

May 1, 2002, 3:45:30 PM5/1/02

to

Martin Wartens wrote:

> > I have no experience with the Boehm collecter, but something about that
> > phrase jumped out at me. Do you really have cyclic dependancies so often
> > that this restriction makes GC "practically useless?"
> When you are writing a library you have no choice than to take care of all
> eventualities. I have to manage a large graph-like structure consisting of
> many small objects. The inter-object connections are defined by the user. In
> this case, it's hard to avoid cyclic references by design.

So, for example, the user specifies something like:
vertex "a" is connected to vertices "{b,c,d}".
Then I'd think you'd have to keep a 1-to-1 map between the vertex names
and their address in order to connect the vertex names "a" to "{b,c,e}".
Couldn't this map be a map of boost vertex name to shared_ptr<vertex>'s, and
each vertex contains a list<weak_ptr<vertex> >. Then, when the map
goes out-of-scope, the weak_ptrs are destructed and no more cycles!

Hans Aberg

unread,

May 1, 2002, 3:49:39 PM5/1/02

to

In article <1178a29f.02042...@posting.google.com>,
Hans_...@hp.com (Hans-J. Boehm) wrote:
>> Are you sure that these problems are not as the result of how you perceive
>> multiple GC's to be implemented? -- I gather that such GC's must use
>> separate memory block, for instance.
>>
>Let me make this a little more concrete: we have object A managed by
>GCa and object B managed by GCb. A points to B and B points to A, and
>neither are referenced from any other place. Thus both should be
>collected.

I see your point. But are you not leaping at conclusions on how the GC's
would collect such objects? -- Take distributed programming for example,
where two different programs manage A and B respectively, using their own
GC's, and where the references are URL's. Then a fully acceptable
behavior, it seems, is that the GC's of your example does not bother
keeping track of the life objects where the references cross the Internet:
In some cases it might be great if outdated URL's could be somehow kept
track of automatically, but that is not currently the typical case.

So I think one has to add as an additional feature of the GC whether it
should keep track of pointers that pass over object belonging to other
GC's.

>Let's assume both of the collectors are "smart pointer" based, to keep
>the discussion as simple as possible. The naive way to handle this
>would be to ensure that
>
>1) Pointers to objects collected by GCa but residing outside the GCa
>heap have some "smart pointer" type SPa. Similarly we use SPb for
>GCb.
>
>2) SPa pointers are viewed as roots by GCa. SPb pointers are viewed
>as roots by GCb.
>
>3) When an object is destroyed (finalized really) by a collector, the
>embedded smart pointers to other heaps are destroyed, and thus no
>longer treated as roots.
>
>In this scheme, clearly neither A nor B will be collected, since they
>are each referenced by a "root" in the other heap.
>
>I don't know of a way to fix this other than by having GCa and GCb
>cooperate. Somehow both heaps must be traced before we can really
>identify garbage in either. Thus we would need a standardized
>"interGC" protocol, which the Haskell collector probably wouldn't
>immediately obey.

Right. This is a picture that I also arrive at: Either the GC's has some
requests that they can communicate with each other, or they are both
registered to a super-GC that can coordinate such things. The former
picture is probably to be preferred in say distributed programming but
probably less efficient than the latter.

>Along these lines, one could not treat these cross-pointers as roots,
>but instead ask the other collector to trace through its heap and
>notify us of, for example, GCa objects that are reachable through the
>GCb heap by starting at a certain GCb object. IIRC, CMM
>(http://www.di.unipi.it/~attardi/cmm.html) takes this approach.

I would have to think more about this, to see if I can come up with some inputs.

>Once you need enough cooperation between collectors, and they are
>sufficiently tightly coupled, it seems easier to view it as just one
>tunable collector. Even supporting multiple "pluggable" collectors is
>much easier than allowing them to be active at the same time.

Right. That is one of the pictures I mentioned above, the one with a
super-GC. My hunch is that this is the most efficient one, but one could
not expect it to work always in a distributed environment.

>Things become much more manageable if the pointers between heaps all
>go in the same direction, so that cycles can't occur. But in general
>that's a hard property to maintain.

There are efficient algorithms for keeping track of cycles in a graph: In
the context of parser-generators (computing FIRST and FOLLOW), one can use
using Tarjan's SCC (strongly connected component) algorithm; see:
http://compilers.iecc.com/comparch/article/01-04-079
http://www1.ics.uci.edu/~eppstein/161/960220.html

So if one resolves the other problems, my hunch is that this should be
possible to solve in an efficient was as well.

Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove...@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Martin Wartens

unread,

May 2, 2002, 10:49:17 AM5/2/02

to

> So, for example, the user specifies something like:
> vertex "a" is connected to vertices "{b,c,d}".
> Then I'd think you'd have to keep a 1-to-1 map between the vertex names
> and their address in order to connect the vertex names "a" to "{b,c,e}".
> Couldn't this map be a map of boost vertex name to shared_ptr<vertex>'s,
and
> each vertex contains a list<weak_ptr<vertex> >. Then, when the map
> goes out-of-scope, the weak_ptrs are destructed and no more cycles!

To be a little bit more specific: The graph changes during program
execution. Vertices can be deleted and pointers can be reassigned. Your
suggestion seems to apply to a static graph only. Additionally, I don't
currently have some global information about all vertices that live in my
system.
Of course I could try to do some global vertex-bookkeeping, e.g. build and
update an adjacency matrix and use it to detect cycles.
Weak pointers can be used to break cycles, provided that you have some way
to detect possible cycles. One way is to start with a well-formed graph and
update it: The strong pointers form an acyclic graph with distinguished root
that contains all the objects. All other arcs are weak pointers. You then
have to walk through your graph on all assignment, new and delete operations
to keep the graph consistent (possibly convert weak pointers to strong
pointers). Both sorts of pointers are equipped with a counter to help the
operations. The algorithm to do the graph updating is quite complicated and
most likely pretty expensive.
The boost shared_ptr / weak_ptr system doesn't do any graph walking and
transformation of weak pointers to shared pointers
[http://www.boost.org/libs/smart_ptr/smart_ptr.htm]. That's okay when you
know what you are doing. You can use weak pointers to break cycles if you
never expect a weak pointer to become a strong pointer.

Martin Wartens