Reconciling Garbage Collection with Deterministic Finalization

29 views
Skip to first unread message

Andrei Alexandrescu (See Website For Email)

unread,
Mar 17, 2006, 11:13:57 PM3/17/06
to
The recent thread "Can GC be beneficial" was quite beneficial :o) - to
me at least. I've reached a number of conclusions that allow me to
better place the conciliation between garbage collection and
deterministic finalization in the language design space - in C++ and in
general.

The following discussion focuses on C++-centric considerations, with
occasional escapes into "the right thing to do if we could break away
with the past.

Basic Tenets, Constraints, and Desiderata
=========================================

Garbage collection is desirable because:

(1) It automates a routine and error-prone task

(2) Reduces client code

(3) Improves type safety

(3) Can improve performance, particularly in multithreaded environments

On the other hand, C++ idioms based on constructors and destructors,
including, but not limited to, scoped resource management, have shown to
be highly useful. The large applicability of such idioms might actually
be the single most important reason for which C++ programmers shy away
from migrating to a garbage-collected C++ environment.

It follows that a set of principled methods that reconcile C++-style
programming based on object lifetime, with garbage collection, would be
highly desirable for fully exploiting garbage collection's advantages
within C++. This article discusses the challenges and to suggests
possible designs to address the challenges.

The constraints include compatibility with existing C++ code and styles
of coding, a preference for type safety at least when it doesn't
adversely incur a performance hit, and the functioning of today's
garbage collection algorithms.

A Causal Design
===============

Claim #1: The lifetime management of objects of a class is a decision of
the class implementer, not of the class user.

In support of this claim we come with the following examples:

a) A class such as complex<double> is oblivious to destruction
timeliness because it does not allocate scarce resource that need timely
release;

b) A class such as string doesn't need to worry about destruction
timeliness within a GC (Garbage Collected) environment;

c) A class such as temporary_file does need to worry about destruction
timeliness because it allocates scarce resources that transcend both the
lifetime of the object (a file handle) and the lifetime of the program
(the file on disk that presumably temporary_file needs to delete after
usage).

In all of these examples, the context in which the objects are used is
largely irrelevant (barring ill-designed types that employ logical
coupling to do entirely different actions depending on their state).
There is, therefore, a strong argument that the implementer of a class
decides entirely what the destruction regime of the class shall be. This
claim will guide design considerations below.

We'll therefore assume a C++ extension that allows a class definition to
include its destruction regime:

// garbage collected
class [collected] Widget {...};
// deterministically destroyed
class [deterministic] Midget {...};

These two possible choices could be naturally complemented by the other
allowed storage classes of a class:

// garbage collected or on stack
class [collected, auto] Widget {...};
// deterministically destroyed, stack, or static storage
class [deterministic, auto, static] Midget {...};

It is illegal, however, that a class specifies both collected and
deterministic regime:

// illegal
class [collected, deterministic] Wrong {...};

Claim #2: Collected types cannot define a destruction-time action.

This proposal makes this claim in wake of negative experience with
Java's finalizers.

Claim #3: Collected types can transitively only embed fields of
collected types (or pointers thereof of any depth), and can only derive
from such types.

If a collected type would have a field of a non-collected type, that
type could not be destroyed (as per Claim #2).

If a collected type would have a field of pointer to a non-collected
type, one of two things happens:

a) A dangling pointer access might occur;

b) The resource is kept alive indeterminately and as such cannot be
destroyed (as per claim #2).

If a collected type would have a field of pointer to pointer to (notice
the double indirection) deterministic type, inevitably that pointer's
destination would have to be somehow accessible to the garbage-collected
object. This implies that at the some place in the points-to chain, a
"jump" must exist from the collected realm to the uncollected realm (be
it automatic, static, or deterministic) that would trigger either
post-destruction access, or would prevent the destructor to be called.

Design fork #1: Weak pointers could be supported. A collected type could
hold fields of type weak pointer to non-collected types. The weak
pointers are tracked and are zeroed automatically during destruction of
the resource that they point to. Further dereference attempts accesses
from the collected realm become hard errors.

Claim #4: Deterministic types must track all pointers to their
respective objects (via a precise mechanism such as reference counting
or reference linking).

If deterministic types did allow untracked pointer copying, then
post-destruction access via dangling pointers might occur. The recent
discussion in the thread "Can GC be beneficial" has shown that it is
undesirable to define post-destruction access, and it's best to leave it
as a hard run-time error.

Design branch #2: For type safety reasons, disallow type-erasing
conversions from/to pointers to deterministic types:

class [deterministic] Widget { ... };
Widget * p = new Widget;
void * p1 = p; // error
p = static_cast<Widget *>(p1); // error, too

Or: For compatibility reasons, allow type-erasing conversion and incur
the risk of dangling pointer access.

Design branch #3: For purpose of having a type that stands in as a
pointer to any deterministic type (a sort of "deterministic void*"), all
deterministic classes could be thought as (or required to) inherit a
class std::deterministic.

Design branch #3.1: std::deterministic may or may not define virtuals,
and as such confines or not all deterministic classes to have virtuals
(and be suitable for dynamic_cast among other things).

Claim #5: When an object of deterministic type is constructed in
automatic or static storage, its destructor will automatically issue a
hard error if there are any outstanding pointers to it (e.g., the
reference count is greater than one).

If that didn't happen, dangling accesses to expired stack variables
might occur:

class [deterministic] Widget { ... };
Widget * p;
int Fun() {
Widget w;
p = &w;
// hard runtime error upon exiting this scope
}

Discussion of the basic design
==============================

The desiderata set up and the constraints of the current C++ language
created a causal chain that narrowly guided the possible design of an
integrated garbage collection + deterministic destruction in C++:

* The class author decides whether the class is deterministic or garbage
collected

* As a natural extension, the class author can decide whether objects of
that type are allowed to sit on the stack or in static storage. (The
regime of automatic and static storage will be discussed below.)

* Depending on whether a type is deterministic versus collected, the
compiler generates different code for copying pointers to the object.
Basically the compiler automates usage of smart pointers, a
widely-followed semiautomatic discipline in C++.

* The heap is conceptually segregated into two realms. You can hold
unrestricted pointers to objects in the garbage-collected realm, but the
garbage-collected realm cannot hold pointers outside of itself.

* The operations allowed on pointers to deterministic objects are
restricted.

Regime of Automatic Storage
===========================

Claim 6: Pointers to either deterministic or collected objects that are
actually stack allocated should not escape the scope in which their
pointee object exists.

This obvious claim prompts a search in look for an efficient solution to
a class of problems. Here is an example:

class [auto, collected] Widget { ... };
void Midgetize(Widget & obj) {
obj.Midgetize();
}
void Foo() {
Widget giantWidget;
Midgetize(giantWidget);
}

To make the example above work, Foo is forced to heap-allocate the
Widget object even though the Midgetize function works on it
transitorily and stack allocation would suffice.

To address this problem a pointer/reference modifier, "auto", can be
defined. Its semantics allow only "downward copying": an
pointer/reference to auto can only be copied to lesser scope, never to
object of larger scope. Examples:

void foo() {
Widget w;
Widget *auto p1 = &w1; // fine, p1 has lesser scope
{
Widget *auto p2 = &w; // fine
p2 = p1; // fine
p1 = p2; // error! Escaping assignment!
}
}

Then the example above can be made modularly typesafe and efficient like
this:

class [auto, collected] Widget { ... };
void Midgetize(Widget &auto obj) {
obj.Midgetize();
}
void Foo() {
Widget giantWidget;
Midgetize(giantWidget); // fine
}

Claim #6: "auto"-modified pointers cannot be initialized or assigned
from heap-allocated deterministic objects.

If "auto"-modified pointers manipulated the reference count, their
efficiency advantage would be lost. If they didn't, a type-unsafe
situation can easily occur.

Does operator delete still exist?
=================================

For collected objects, delete is inoperant, as is for static or
automatic objects. On a heap-allocated deterministic object, delete can
simply check if the reference count is 1, and if so, reassign zero to
the pointer. If the reference count is greater than one, issue a hard
error.

Note that this makes delete entirely secure. There is no way to have a
working program that issues a dangling access after delete has been
invoked.

Regime of Static Storage
========================

Static storage has the peculiarity that it can easily engender
post-destruction access. This is because the order of module
initialization is not defined, and therefore cross-module dependencies
among objects of static duration are problematic.

This article delays discussion of the regime of static storage.
Hopefully with help from the community, a workable solution to the
cross-module initialization would ensue.

Templates
=========

Claim #7: The collection regime of any type must be accessible during
compilation to templated code.

Here's a simple question: is vector<T> deterministic or collected?

If it were collected, it couldn't hold deterministic types (because at
the end of the day vector<T> must embed a T*). If it were deterministic,
collected types couldn't hold vectors of pointers to collected types,
which would be a major and gratuitous restriction.

So the right answer is: vector<T> has the same regime as T.

template <class T, class A>
class [T::collection_regime] vector { // or some other syntax
...
};

The New World: How Does it Look Like?
=====================================

After this design almost happening as a natural consequence of an
initial set of constraints, the natural question arises: how would
programs look like in a C++ with these amenities?

Below are some considerations:

* Pointer arithmetic, unions, and casts must be reconsidered (a source
of unsafety not thoroughly discussed)

* Most types would be [collected]. Only a minority of types, those that
manage non-memory resources, would live in the deterministic realm.

* Efficiency of the system will not degrade compared to today's C++. The
reduced need for reference-counted resources would allow free and fast
pointer copying for many objects; the minority that need care in
lifetime management will stay tracked by the compiler, the way they
likely were manipulated (by hand) anyway.

* Given that the compiler can apply advanced analysis to eliminate
reference count manipulation in many cases, it is likely that the
quality of built-in reference counting would be superior to
manually-implemented reference counting, and on a par with advanced
manual careful manipulation of a mix of raw and smart pointers.

----------------------

Whew! Please send any comments you have to this group. Thanks!


Andrei

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

igazt...@gmail.com

unread,
Mar 18, 2006, 11:33:55 AM3/18/06
to
Hi,

After reading "Can GC be beneficial" thread (that was _huge_ ;-)) and
reading your proposal, I would like to make some comments, from the
point of view of a person that doesn't use garbage collection, and
works for embedded systems and desktop applications (I can't talk about
supercomputers or other multi-core/multi-processor architectures).

> Basic Tenets, Constraints, and Desiderata
> =========================================
>
> Garbage collection is desirable because:
>
> (1) It automates a routine and error-prone task

Ok. Although you can achieve this with smart pointers, I agree with
you.

> (2) Reduces client code

Well, when comparing to raw pointer management, yes. When comparing to
smart pointers, not a big difference.

> (3) Improves type safety

I can't talk about this. In the "Can GC be beneficial" thread this was
thoroughly discussed but I'm particularly happy with C++ type system.

> (3) Can improve performance, particularly in multithreaded environments

I don't eat this one. "Can" is a very vague statement. In some cases,
in general, or just in multithreaded? I really think that if you have
the control of the memory management, you can be faster than the
garbage collector. Of course, you can be slower, but in general this is
not the rule. I can't think of a language with garbage collection that
is faster than C++. And I think that's because of the allocation
control. I really think garbage collection is a performance penalty.

Apart from speed, what about size? How much memory do the garbage
collected languages need? It's sad to see the huge memory a simple
garbage collected program in C#/Java needs.

Reading the rest of your design, I must admit that I don't know enough
to make a thorough review of your proposal, but what I see is that with
your proposal C++ becomes another language. And I don't think that
memory management issue is enough to redefine entirely the language:

-> You tie a class with the allocation mechanism:

class [collected] Widget {...};

Currently, in C++, a class is independent from the memory type. That
allows advanced C++ uses like shared memory or memory mapped files. I
can build a class in heap, in shared memory, in the stack or in a file,
or a device if I map that device in memory.

-> You break the uniformity of resource management

You claim that memory must be treated differently than other resources
but what really makes C++ great is that you can treat all resources
with the same mechanism (RAII, for example).

> * The class author decides whether the class is deterministic or garbage collected

But does not the way C++ works. The user is the one that controls
lifetime. You might need to have deterministic destruction for one
class in an application and you maybe you don't care about it in
another application. The determinism/garbage collection is a feature
dependent on your application, not on the class itself. Do you know
beforehand how and when a class will be used?

> The New World: How Does it Look Like?
> =====================================
>
> After this design almost happening as a natural consequence of an
> initial set of constraints, the natural question arises: how would
> programs look like in a C++ with these amenities?
>
> Below are some considerations:
>
> * Pointer arithmetic, unions, and casts must be reconsidered (a source
> of unsafety not thoroughly discussed)

So we will break C++. So many discussions on the standard committee
with every feature that can produce little code breaks and now because
some want to support garbage collection we have to totally redefine
C++?

> * Efficiency of the system will not degrade compared to today's C++.


Do you have any fact to state this? We know the performance of current
C++. I don't know a garbage collected language can beat C++ in
performance. Are programs with C++ garbage collection libraries (Boehm,
etc...) faster than programs with manual management?

Most important than being faster, is the fact that we know WHEN is
going to be faster. Or we have to disable garbage collecting explicitly
in a performance critical area, because the garbage collector has can
stop the world to search unreferenced objects?

> The
> reduced need for reference-counted resources would allow free and fast
> pointer copying for many objects; the minority that need care in
> lifetime management will stay tracked by the compiler, the way they
> likely were manipulated (by hand) anyway.

If you need for many reference-counted resources, maybe that's because
your design is not correct. Normally you know when you need your data,
and when it's not needed. The difference is that I can make
reference-counted just some classes but not almost all classes. The
garbage has much more work to do that an atomic increment for each
shared_ptr copy.

> * Given that the compiler can apply advanced analysis to eliminate
> reference count manipulation in many cases, it is likely that the
> quality of built-in reference counting would be superior to
> manually-implemented reference counting, and on a par with advanced
> manual careful manipulation of a mix of raw and smart pointers.

The same analysis it can apply with manual reference counting through
smart pointers. This is a compile-time optimization, so I don't see why
the compiler can't optimize it with smart pointers if member operations
are inlined.

My opinion is that your proposal makes C++ another language. If we can
apply garbage collection to C++ just like we have added move semantics,
with no changes to user code, I wouldn't have objections to have it.
But your proposal converts C++ in a language that is unknown and
strange to me.

All this machinery for memory management? With STL containers,
std::strings, and shared_ptr/unique_ptr, manual memory management is
easy. _Very_ easy. Security problems with C/C++ are mostly related to
static buffer overflows and not with memory management.

I would prefer talking about more serious C++ problems like the lack of
some essential libraries (multi-threading, multi-processing,
networking, XML, web-services) and features, like serialization,
reflection and others, than trying once more to get garbage collection
in C++ without breaking C++. All the proposals I see, including the
"Transparent Garbage Collection for C++" need so many changes and added
complexities that if we consider garbage collection essential, we
should think about declaring C++ deprecated and start another language.

Regards,

Ion

Edward Diener No Spam

unread,
Mar 18, 2006, 11:40:14 AM3/18/06
to
Andrei Alexandrescu (See Website For Email) wrote:
> The recent thread "Can GC be beneficial" was quite beneficial :o) - to
> me at least. I've reached a number of conclusions that allow me to
> better place the conciliation between garbage collection and
> deterministic finalization in the language design space - in C++ and in
> general.
>
> The following discussion focuses on C++-centric considerations, with
> occasional escapes into "the right thing to do if we could break away
> with the past. snipped...

Some comments to your proposal:

1) KISS. The proposal is about GC in C++ and deterministic destruction
of dynamically allocated objects in such a system. Therefore eliminate
the stack and static considerations, and the syntax. I assume your
proposal means that a dynamically allocated deterministic object gets
automatically reference counted, ala smart pointer, and gets deleted
once the final pointer goes out of scope, since if not the whole bit
about deterministic objects is a waste of time. Gee, I wonder who first
suggested that.

2) Manual delete on a collected type pointer should still possible to
force that type to be garbage collected immediately. This is necessary
if very large amounts of memory are being allocated and the programmer
wants to make sure that the memory is reclaimed. Needless to say one
should also be able to force the GC to run at will.

3) Manual delete on a deterministic type should never be allowed.

4) A dynamically allocated template object which takes more than one
type by default is automatically deterministic if any one of its types
is deterministic.

Larry

unread,
Mar 18, 2006, 11:40:35 AM3/18/06
to
On 03/17/2006 10:13 PM, Andrei Alexandrescu (See Website For Email)
wrote:
[snip]

> Discussion of the basic design
> ==============================
[snip]

> * Depending on whether a type is deterministic versus collected, the
> compiler generates different code for copying pointers to the object.
> Basically the compiler automates usage of smart pointers, a
> widely-followed semiautomatic discipline in C++.

Wouldn't this lead to the infamous "smart_ptr cycle" problem. The
problem could be solved by doing a local mark-sweep ( [mart90]
on http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbibM.html );
however, that has the quadratic time cost mentioned on p. 3 of:

http://www.research.ibm.com/people/d/dfb/papers/Bacon01Concurrent.pdf

One partial solution proposed in the above .pdf was keeping a
collection of "to-be-locally scanned" pointers and periodically
doing a local mark sweep on this collection. However, I think
that would lead to the same quadratic problem if the collection
size were limited, and if not, it would lead to the delay in
collection which is present with current mark-sweep collectors.

Andrei Alexandrescu (See Website For Email)

unread,
Mar 19, 2006, 4:44:59 PM3/19/06
to
Larry wrote:
> On 03/17/2006 10:13 PM, Andrei Alexandrescu (See Website For Email)
> wrote:
> [snip]
>
>>Discussion of the basic design
>>==============================
>
> [snip]
>
>>* Depending on whether a type is deterministic versus collected, the
>>compiler generates different code for copying pointers to the object.
>>Basically the compiler automates usage of smart pointers, a
>>widely-followed semiautomatic discipline in C++.
>
>
> Wouldn't this lead to the infamous "smart_ptr cycle" problem. The
> problem could be solved by doing a local mark-sweep ( [mart90]
> on http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbibM.html );
> however, that has the quadratic time cost mentioned on p. 3 of:
>
> http://www.research.ibm.com/people/d/dfb/papers/Bacon01Concurrent.pdf

It would suffer of the cycle problem. My proposal embeds the assumption
(not stated, I should have) that deterministic types should not exhibit
cyclic dependencies.

I believe this is not a big problem in practice. If it is, weak pointers
must be introduced to manually break cycles.


Andrei

Andrei Alexandrescu (See Website For Email)

unread,
Mar 19, 2006, 4:44:26 PM3/19/06
to
Edward Diener No Spam wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>
>>The recent thread "Can GC be beneficial" was quite beneficial :o) - to
>>me at least. I've reached a number of conclusions that allow me to
>>better place the conciliation between garbage collection and
>>deterministic finalization in the language design space - in C++ and in
>>general.
>>
>>The following discussion focuses on C++-centric considerations, with
>>occasional escapes into "the right thing to do if we could break away
>>with the past. snipped...
>
>
> Some comments to your proposal:
>
> 1) KISS. The proposal is about GC in C++ and deterministic destruction
> of dynamically allocated objects in such a system. Therefore eliminate
> the stack and static considerations, and the syntax. I assume your
> proposal means that a dynamically allocated deterministic object gets
> automatically reference counted, ala smart pointer, and gets deleted
> once the final pointer goes out of scope, since if not the whole bit
> about deterministic objects is a waste of time. Gee, I wonder who first
> suggested that.

Heh. So yes, your assumption is right. My post merely showed how it's a
necessary fallout of the design constraints.

I do want to keep *auto because it shows how in many cases you can pass
pointers and references down without incurring an unnecessary cost.

> 2) Manual delete on a collected type pointer should still possible to
> force that type to be garbage collected immediately. This is necessary
> if very large amounts of memory are being allocated and the programmer
> wants to make sure that the memory is reclaimed. Needless to say one
> should also be able to force the GC to run at will.
>
> 3) Manual delete on a deterministic type should never be allowed.

Didn't you get them exactly backwards? :o) The way today's GCs are
built, it's not helpful if user code says, "See this one object?
Obliterate it". Besides, it's typ[e-unsafe.

On the other hand, manual delete on a deterministic object is an
operation that can be entirely and cheaply checked.

> 4) A dynamically allocated template object which takes more than one
> type by default is automatically deterministic if any one of its types
> is deterministic.

Yes, I was thinking of something like that about deducing the type of
template class instantiations. Good point.


Andrei

Andrei Alexandrescu (See Website For Email)

unread,
Mar 19, 2006, 4:41:58 PM3/19/06
to
igazt...@gmail.com wrote:
>>(3) Can improve performance, particularly in multithreaded environments
>
>
> I don't eat this one. "Can" is a very vague statement. In some cases,
> in general, or just in multithreaded? I really think that if you have
> the control of the memory management, you can be faster than the
> garbage collector. Of course, you can be slower, but in general this is
> not the rule. I can't think of a language with garbage collection that
> is faster than C++. And I think that's because of the allocation
> control. I really think garbage collection is a performance penalty.
>
> Apart from speed, what about size? How much memory do the garbage
> collected languages need? It's sad to see the huge memory a simple
> garbage collected program in C#/Java needs.

That's a valid point. GC will eat more memory and can also degrade
performance in some systems. However, as I discussed in Maged's and my
article "Lock-free data structures", in a MT environment the collector
can do a better job than the programmer because it has knowledge of all
threads' memory usage.

> -> You tie a class with the allocation mechanism:
>
> class [collected] Widget {...};
>
> Currently, in C++, a class is independent from the memory type. That
> allows advanced C++ uses like shared memory or memory mapped files. I
> can build a class in heap, in shared memory, in the stack or in a file,
> or a device if I map that device in memory.

You won't lose that; as long as a class is deterministic, you can put it
anywhere you please.

> -> You break the uniformity of resource management
>
> You claim that memory must be treated differently than other resources
> but what really makes C++ great is that you can treat all resources
> with the same mechanism (RAII, for example).

I think this is a misconception. Memory is a resource separate from all
other resources, and freeing it manually is riskier than disposing other
resources. This has been discussed at length.

>>* The class author decides whether the class is deterministic or garbage collected
>
>
> But does not the way C++ works. The user is the one that controls
> lifetime. You might need to have deterministic destruction for one
> class in an application and you maybe you don't care about it in
> another application. The determinism/garbage collection is a feature
> dependent on your application, not on the class itself. Do you know
> beforehand how and when a class will be used?

I've shown with examples that it is sensible that the decision belongs
to the class. The fact that the user controls lifetime is not good just
because "this is the way it is". Resources deciding their regime might
be as well a better way to go.

>>* Pointer arithmetic, unions, and casts must be reconsidered (a source
>>of unsafety not thoroughly discussed)
>
>
> So we will break C++. So many discussions on the standard committee
> with every feature that can produce little code breaks and now because
> some want to support garbage collection we have to totally redefine
> C++?

My preference is in favor of type safety where it doesn't impede
performance. However, for compatibility reasons, unsafe operations can
stay there. It's not like existing C++ must be broken.

>>* Efficiency of the system will not degrade compared to today's C++.
>
> Do you have any fact to state this? We know the performance of current
> C++. I don't know a garbage collected language can beat C++ in
> performance. Are programs with C++ garbage collection libraries (Boehm,
> etc...) faster than programs with manual management?

I do not have facts to back up my claim.

> Most important than being faster, is the fact that we know WHEN is
> going to be faster. Or we have to disable garbage collecting explicitly
> in a performance critical area, because the garbage collector has can
> stop the world to search unreferenced objects?

There are GC's that don't stop the world.

>>The
>>reduced need for reference-counted resources would allow free and fast
>>pointer copying for many objects; the minority that need care in
>>lifetime management will stay tracked by the compiler, the way they
>>likely were manipulated (by hand) anyway.
>
>
> If you need for many reference-counted resources, maybe that's because
> your design is not correct. Normally you know when you need your data,
> and when it's not needed. The difference is that I can make
> reference-counted just some classes but not almost all classes. The
> garbage has much more work to do that an atomic increment for each
> shared_ptr copy.

If that need is scoped, you can always use stack-allocated objects and
*auto pointers. And I think the view that the GC has "much more work" to
do than the increment is entirely fallacious. If you'll copy smart
pointers in a tight loop, you can easily enjoy performance worse than
garbage collection.

>>* Given that the compiler can apply advanced analysis to eliminate
>>reference count manipulation in many cases, it is likely that the
>>quality of built-in reference counting would be superior to
>>manually-implemented reference counting, and on a par with advanced
>>manual careful manipulation of a mix of raw and smart pointers.
>
>
> The same analysis it can apply with manual reference counting through
> smart pointers. This is a compile-time optimization, so I don't see why
> the compiler can't optimize it with smart pointers if member operations
> are inlined.

In theory, that could happen. However, the semantics of user-level code
aren't as clear to the compiler as the operations it generates itself.
It's like reverse engineering versus engineering. I won't discuss this
further though; sufficient advance in compiler technology could imbue
the compiler with what's needed to optimize client code the same way.

> My opinion is that your proposal makes C++ another language. If we can
> apply garbage collection to C++ just like we have added move semantics,
> with no changes to user code, I wouldn't have objections to have it.
> But your proposal converts C++ in a language that is unknown and
> strange to me.

My suggestion is not complete. It does not discuss what the regime of
types defined without [] should be.

> All this machinery for memory management? With STL containers,
> std::strings, and shared_ptr/unique_ptr, manual memory management is
> easy. _Very_ easy.

For some applications, I agree. I wouldn't generalize that for all or
even most applications.


Andrei

Paul Mensonides

unread,
Mar 19, 2006, 4:51:19 PM3/19/06
to
Andrei Alexandrescu (See Website For Email) wrote:
> The recent thread "Can GC be beneficial" was quite beneficial :o) - to
> me at least. I've reached a number of conclusions that allow me to
> better place the conciliation between garbage collection and
> deterministic finalization in the language design space - in C++ and
> in general.

Despite trying to stay out of this debate, I can't avoid it here. The proposal
that you have is trying to do too many things at once. This is about garbage
collection, not type safety--regardless of how garbage collection affects type
safety. IMO, we don't need any specifiers like [collected] or [deterministic].
Those things are redundant abomination in my opinion. All we need is a 'new
(gc) T' variant that will fail to compile if the type has a non-trivial
destructor (and a way to determine if a type has a non-trivial destructor in
generic code). We don't need any addition semantics on delete except to say
that it is undefined behavior to delete gc-allocated memory--which can be
asserted by a checked implementation. We don't need anything on a type that
says that it cannot be allocated on a non-gc-heap either. The only limitations
that are necessary is that if an object is to be allocated on a gc-controlled
heap, it must have a trivial destructor (and this must be statically enforced by
the expression 'new (gc) T').

Regards,
Paul Mensonides

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 5:46:01 AM3/20/06
to
Paul Mensonides wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>
>>The recent thread "Can GC be beneficial" was quite beneficial :o) - to
>>me at least. I've reached a number of conclusions that allow me to
>>better place the conciliation between garbage collection and
>>deterministic finalization in the language design space - in C++ and
>>in general.
>
>
> Despite trying to stay out of this debate, I can't avoid it here. The proposal
> that you have is trying to do too many things at once. This is about garbage
> collection, not type safety--regardless of how garbage collection affects type
> safety. IMO, we don't need any specifiers like [collected] or [deterministic].
> Those things are redundant abomination in my opinion. All we need is a 'new
> (gc) T' variant that will fail to compile if the type has a non-trivial
> destructor (and a way to determine if a type has a non-trivial destructor in
> generic code).

I understand. So you say that classes that define a destructor are
automatically [deterministic] and those that don't, and that don't have
members that do, are automatically [collected].

There are two issues I see. One, as you say, there must be some way for
template classes, say std::vector or std::list to specify whether the
destructor they define is to be "taken into account" or not. Second, my
proposal naturally allowed the programmer to also control the other
storages of objects, thus keeping a number of idioms efficient. If we
give up on that, efficiency of C++ programs can be hurt.

> We don't need any addition semantics on delete except to say
> that it is undefined behavior to delete gc-allocated memory--which can be
> asserted by a checked implementation.

Ehm. Fine, if you like "undefined behavior". I stay away from it like
the pest.

> We don't need anything on a type that
> says that it cannot be allocated on a non-gc-heap either. The only limitations
> that are necessary is that if an object is to be allocated on a gc-controlled
> heap, it must have a trivial destructor (and this must be statically enforced by
> the expression 'new (gc) T').

I'm not sure I understand. If you allocate a normally GC object in the
non-GC realm, you'll need to needlessly manipulate a reference counter
for that object, or define two kind of pointer modifiers (*gc and
*deterministic) that specify what heap a pointer is on. Right?


Andrei

Nemanja Trifunovic

unread,
Mar 20, 2006, 5:42:59 AM3/20/06
to

> Garbage collection is desirable because:
>
> (1) It automates a routine and error-prone task
>
> (2) Reduces client code
>
> (3) Improves type safety
>
> (3) Can improve performance, particularly in multithreaded environments


The single most important feature of some GC systems (compacting GC) is
avoiding memory fragmentation. Unfortuntelly, I am not sure it is
evenpossible to have a compacting GC working with C++.

On the other hand, using a GC also has negative implications - most
notably in the cases where it is important to keep the memory footprint
low; also, for some classes of applications (real-time) non
deterministic GC is simply a no-no.

> Claim #1: The lifetime management of objects of a class is a decision of
> the class implementer, not of the class user.
>

That's one of the reasons I stopped using .NET :) On the contrary, the
decision needs to be made by the class user, since only the class user
knows the context in which the objects are going to be used. Of course,
in many cases, the class implementer would design their class for a
specific scenario, but there are types that are made for generic use.
Take string as an example: in "most" cases it would probably needed to
be non-deterministic. But there are few cases where it is simply not
acceptable, and a user of the class needs to be able to make a choice.


Slightly OT: All these discussions are very interesting of course, but
I don't think it is feasible to make C++ a "GC language". Even if it
happens (with C++ 202x maybe), my bet is that people would avoid to use
it. See what happened to Fortran - every single Fortran programmer I
know uses F77 - even the latest version of that language Fortran 2003
is an object oriented language.

peter koch larsen

unread,
Mar 20, 2006, 8:32:47 AM3/20/06
to
Hi Andrei - and others

I've decided not to quote anything from Andreis post. Quoting his
message did tend to make my post more confusing than it already is.
Sorry about that to all. Moderators: i hope you can accept the post as
is.

My proposal for using garbage collection in relation to C++ would be
very close to the solution that is available to us all today when using
the Boehm garbage collector. I propose that no action can take place on
collection so there is no kind of finalization. Unfortunately the
discussion so far has not touched the drawbacks of this approach, so
perhaps I'm missing something here.
I must also mention that I use garbage collection here as meaning
non-deterministic garbage collection such as as the one you see if you
use a mark-and-sweep algorithm. Smart-pointers are not considered as
garbage-collecting mechanism. This is contrary to the way Andrei has
used the term in the thread "Can garbage collection be beneficial", but
it seems to me that he in this thread uses garbage collection in the
same way as I do here.

First let me give you my basic attitude to garbage collection. The
programs, I've written are of a kind where RAII has had to be used
extensively for resources other than memory, and it has been quite easy
to cope with the lifetime of my objects. I thus do not believe that I
would have had any benefit at all from using GC.
That said, I can imagine application domains where GC would come in
real handy, so I do not see any reason to not have GC built in as an
optional extension. I am not sure that GC should be supported on all
platforms. At least it should be a quality of implementation issue.

The goal for GC in C++ should be:

1) Most current C++-programs should be able to run in a
garbage-collected environment. The exception here is that programs that
hide pointers by e.g. xoring them or writing them to a file should not
work. (These usages prevent the mark-sweep to be a complete replacement
for new/delete by the way).

2) All C++ programs enabling garbage collection should be able to use
current C++ libraries without modifications.


Andreis proposal to declare objects as either collectable or
deterministic is not understood by me - so far as I can tell, there is
no need for this destinction if only we can demand some intelligence
from the compiler - and require a slight extension of the standard C++
definition of a destructor:we keep the current definition of the
"trivial" destrutor but divide the non-trivial destructors into two:
lightweight-destructors and full destructors. Lightweight destructors
are destructors that only reclaim memory, whereas full destructors
might also release other resources. If we require the compiler to be
able to detect the type of the destructor, we basically have Andreis
definition of collectible versus deterministic classes. By also making
the type of the destructor available to the programmer via some
meta-programming facility we should have (almost) all pieces in place:
the std::smart_ptr could then simply be a built-in pointer in those
cases where garbage collection is enabled and the pointed-to type does
not contain resources.

This requires lightweight destructors to be inline. If the definition
is not available, the compiler must assume that the destructor is
heavyweight - erring on the safe side. In practical life, this will not
often be a problem and if it is, one can always inherit the class from
a lightweight class.

One problem could be with pointers to base classes. If we could agree
that

struct base
{
virtual ~base() {}
};

defines a class hirearchy with non-resourceholding classes and

struct base
{
virtual ~base() = 0;
};

base::~base() {};
designates a (potentially) resource-holding hirearchy.
(This is the one area, where incompatibility might occur. A keyword
designating a lightweight destructor could be considered if this is
deemed a big problem).

This approach is so close to being non-intrusive as is possible, and I
believe that current compiler technology will make this approach easy
to implement.

I do not believe that these changes require major changes to todays
compilers. Actually todays compilers would be fully compliant if only
they took the conservative approach and designated all destructors as
"heavyweight". Useful changes - apart from being able to accurately
categorize the destructor type of a given class could be to
- give warnings if classes with heavy destructors inherited publicly
from classes with lightweight destructors.
- supply extra information to heavyweight-classes: you could have a
count available for use in all cases where the class is allocated
dynamically.

This pretty much ends the story. A lot of the type-safety proposals in
Andreis post have not been discussed here. In my opinion, this topic
deserves a thread of its own.

Kind regards
Peter

Edward Diener No Spam

unread,
Mar 20, 2006, 8:36:20 AM3/20/06
to

Part of your objection to GC in C++, to which I totally agree, is that
RAII objects need a smart pointer while non-RAII objects can be dealt
with adequately using GC, thus creating two different syntaxes for
dynamically allocated objects in a theoretical GC C++ system. As classes
change, from RAII to non-RAII or vice versa, programmers must then
change their syntax, which is a ridiculous imposition. In that world
simply using smart pointers and ignoring GC is the natural action, so
why have GC in the first place.

The proposal says that if classes can be tagged as RAII ( deterministic
) or non-RAII ( collected ), the same dynamically allocated syntax can
be used, ie. x = new X(...);. In the case of deterministic objects, the
compiler creates a smart pointer under the covers, where the object gets
deleted when its reference counts goes back to 0, and in the case of
collected objects GC works non-deterministically as usual. I assume also
the semantics of current smart pointers for such deterministic objects,
where when the pointer gets assigned to another variable the reference
count is incremented and where the pointerf foes out of scope it is
decremented.

The rest of the proposal regarding stack objects and static objects can,
and should, be thrown out since neither has anything to do with dynamic
allocation.

Edward Diener No Spam

unread,
Mar 20, 2006, 8:35:33 AM3/20/06
to
Andrei Alexandrescu (See Website For Email) wrote:
> Edward Diener No Spam wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>
>>> The recent thread "Can GC be beneficial" was quite beneficial :o) - to
>>> me at least. I've reached a number of conclusions that allow me to
>>> better place the conciliation between garbage collection and
>>> deterministic finalization in the language design space - in C++ and in
>>> general.
>>>
>>> The following discussion focuses on C++-centric considerations, with
>>> occasional escapes into "the right thing to do if we could break away
>>> with the past. snipped...
>>
>> Some comments to your proposal:
>>
>> 1) KISS. The proposal is about GC in C++ and deterministic destruction
>> of dynamically allocated objects in such a system. Therefore eliminate
>> the stack and static considerations, and the syntax. I assume your
>> proposal means that a dynamically allocated deterministic object gets
>> automatically reference counted, ala smart pointer, and gets deleted
>> once the final pointer goes out of scope, since if not the whole bit
>> about deterministic objects is a waste of time. Gee, I wonder who first
>> suggested that.
>
> Heh. So yes, your assumption is right. My post merely showed how it's a
> necessary fallout of the design constraints.

It's unnecessary to consider stack based or static objects in your
design constraints. Neither has anything to do with GC and dynamic
allocation. If you simplify your argument to deal only with dynamically
allocated objects, it becomes much simpler for C++ with GC to adapt the
idea of tagging a class as either deterministic delete * ( deterministic
) or non-deterministic destroy ( collected ), and presenting that as a
possible proposal for GC in C++. Adding unnecessary design constraints
merely obfuscates the basic idea and will make it much harder for the
idea to be adapted.

>
> I do want to keep *auto because it shows how in many cases you can pass
> pointers and references down without incurring an unnecessary cost.
>
>> 2) Manual delete on a collected type pointer should still possible to
>> force that type to be garbage collected immediately. This is necessary
>> if very large amounts of memory are being allocated and the programmer
>> wants to make sure that the memory is reclaimed. Needless to say one
>> should also be able to force the GC to run at will.
>>
>> 3) Manual delete on a deterministic type should never be allowed.
>
> Didn't you get them exactly backwards? :o)

No, a deterministic type object should be automatically deleted when its
last reference goes out of scope. The programmer should not be allowed
to foul up that system by trying to manually delete such an object.

> The way today's GCs are
> built, it's not helpful if user code says, "See this one object?
> Obliterate it". Besides, it's typ[e-unsafe.

You are right. The programmer may not know what other object currently
references the object he wants to delete. OK, I will settle for the
usual of being able to manually force GC.

>
> On the other hand, manual delete on a deterministic object is an
> operation that can be entirely and cheaply checked.

But the programmer does not know if the last reference to the object is
gone or not, so he should never be allowed to delete it manually. This
is the way Boost smart pointers work and it is correct.

I have assumed that you are saying that when the last reference to a
pointer of a dynamically allocated deterministic object goes out of
scope, meaning the reference count drops down to 0, the object
automatically gets delete called on it, which allows its destructor to
run in order to release its resource. Essentially dynamically allocated
deterministic objects automatically become smart pointers under the
covers. This is the entire crux of my argument that this is the way GC
needs to occur in C++ so that a programmer does not have to manually
switch syntax between using a smart pointer and use GC depending on
whether a class is deterministic or collected. Neither Java, C#, or
Python get this right, so that resource classes in those languages are
headaches to various degrees, but C++ does need to get this right if GC
is added to the language. It is not necessary to duplicate a mistake of
other languages when those languages are blind to it.

Walter Bright

unread,
Mar 20, 2006, 8:42:33 AM3/20/06
to

<igazt...@gmail.com> wrote in message
news:1142696088.6...@i40g2000cwc.googlegroups.com...

> I really think that if you have
> the control of the memory management, you can be faster than the
> garbage collector. Of course, you can be slower, but in general this is
> not the rule.

That certainly is the conventional wisdom, but is it true? For example,
check out the benchmarks in www.digitalmars.com/d/cppstrings.html. GC makes
for a substantial performance increase. It's not because a GC malloc is
faster than malloc, it is because with GC memory management *far fewer
allocations are necessary*.

I just got back from SDWest, where I attended Scott Meyers' excellent
presentation on TR1's shared pointer. Shared pointer is supposed to make
explicit memory management in C++ much more reliable and tractable. But if
we look under the hood of it, we find:

1) shared_ptr makes an extra allocation for each object being pointed to (in
addition to the 8 bytes for the shared pointer itself). Yes, that means TWO
allocations per object. So, for shared_pointer to be faster than GC, it's
got to be at least TWICE as fast per allocation as a GC allocation.

2) For every copy of a pointer, shared_ptr needs to increment and decrement
through two indirections. Note that just passing a shared_ptr to a function
will require this.

3) If "multithreaded" is turned on, then those increments/decrements need to
be locked or synchronized.

4) shared_ptr's are exception safe, which means that copying one means they
have to be entered into the exception tables, etc. Although this overhead
can be moved into static data, it still can inhibit more aggressive program
flow optimizations.

Contrast that with conventional GC, where you just have a pointer and can
copy it around willy-nilly without extra overhead. That's a high bar for
shared_ptr to climb over.

> I can't think of a language with garbage collection that
> is faster than C++. And I think that's because of the allocation
> control. I really think garbage collection is a performance penalty.

D is faster (see the benchmark). Most garbage collected languages *are*
slower overall than C++, often considerably so. But it's a serious mistake
to assume that is because of the gc, since those languages have a lot of
other heavy anchors to drag around (like being interpreted rather than
compiled, or simply lacking needed expressiveness).

> Apart from speed, what about size? How much memory do the garbage
> collected languages need?

For the heap, about 2 to 3 times what C++ needs. For the code, it's
significantly less, though it's hard to say how much less. All those
shared_ptr fiddlings add up.

> It's sad to see the huge memory a simple
> garbage collected program in C#/Java needs.

C# and Java both have to carry around enormous VMs as well as JIT compilers,
all kinds of stuff. That isn't the fault of the GC.

> I don't know a garbage collected language can beat C++ in
> performance.

You do now <g>.

> Most important than being faster, is the fact that we know WHEN is
> going to be faster. Or we have to disable garbage collecting explicitly
> in a performance critical area, because the garbage collector has can
> stop the world to search unreferenced objects?

There are well known methods for dealing with this. Additionally, there are
no upper bounds for how long it takes new() to run in C++, either. C++
programs that require real time performance with no pauses and guaranteed
latency can't use new(). They usually preallocate all they'll need before
entering the latency critical section.

> All the proposals I see, including the
> "Transparent Garbage Collection for C++" need so many changes and added
> complexities that if we consider garbage collection essential, we
> should think about declaring C++ deprecated and start another language.

I agree, which is the genesis for the D programming language <g>.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

igazt...@gmail.com

unread,
Mar 20, 2006, 8:45:53 AM3/20/06
to
> If that need is scoped, you can always use stack-allocated objects and
> *auto pointers. And I think the view that the GC has "much more work" to
> do than the increment is entirely fallacious. If you'll copy smart
> pointers in a tight loop, you can easily enjoy performance worse than
> garbage collection.

But I know copying shared_ptr-s is slow, because I know
(approximately), the cost of a C++ operation. Copying shared_ptr means
an atomic increment. In multithreaded environments this can be
expensive due to memory synchronization. You can always find an example
where GC is faster. But GC has to track all dynamic memory. If all your
pointers are shared_ptr-s maybe GC is faster. But mainly if most of the
pointers of your application are shared_ptr-s AND your pass the always
by value (since shared_ptr are expensive, pass them by reference except
when you need shared ownership). Don't forget move semantics,
shared_ptr is moveable, so we can save a lot of atomic operations.

I can find an example where GC is slower is and you can find the
inverse. But after so many years of research with garbage collection,
apart from some flawed benchamarks (how many "See!, XXX is faster than
C++ in this benchmark, so XXX is faster!" have we seen?), there is no
GC language faster than C++ (the size is another issue where C++ beats
all). We can talk about being difficult to program, complex, etc... but
the it's king of the speed, and with move semantics, it will be even
faster.

Regarding size, we have to remember that with current architectures,
and even more with future multi-core architectures, smaller is faster.
With same memory synchronization operations as GC, smaller means more
data in cache.

I didn't want to sound rude replying your proposal, so please don't use
"fallacious" when I give my opinion. Garbage collection can make life
easier to programmers and prevent some low-level management risks.
Nobody refutes this. But the huge amount of resources needed by some
popular Big Company backed (and I suppose, well designed) applications
written in garbage collected applications is not very convincing.

Only using raw pointers, for example, exception handling would be very
difficult since we would need to deallocate all the resources by hand.
GC would be an impressive help. But once we have containers,
std::strings and smart pointers, that's not the case. The fact is that
C++ is based on manual memory management from the ground. And that's
nearly impossible to change mantaining compatibility.

How will this work if a library L uses garbage collection and
application A wants to use manual management? What is library L returns
a pointer to an object? Do I know if it's garbage collected? Can a
erase it? How will we deal with this? Returning shared_ptr makes clear
that I can't erase it. It would force A to be garbage collected? Will
that break binary compatibility?
Is Managed C++/CLI as fast as C++? Is it faster? Has it a small
footprint?

A lot of questions. But the core is: Is garbage so important in C++0x
to:

-> Do the most breaking change in C++ I can remember?
-> Have a risk (this is only _my_ opinion, based on my limited
experiece) of a serious performance/size impact?

Regards,

Ion

Peter Dimov

unread,
Mar 20, 2006, 9:27:29 AM3/20/06
to
Andrei Alexandrescu (See Website For Email) wrote:
> Paul Mensonides wrote:

> > Despite trying to stay out of this debate, I can't avoid it here. The proposal
> > that you have is trying to do too many things at once. This is about garbage
> > collection, not type safety--regardless of how garbage collection affects type
> > safety. IMO, we don't need any specifiers like [collected] or [deterministic].
> > Those things are redundant abomination in my opinion. All we need is a 'new
> > (gc) T' variant that will fail to compile if the type has a non-trivial
> > destructor (and a way to determine if a type has a non-trivial destructor in
> > generic code).
>
> I understand. So you say that classes that define a destructor are
> automatically [deterministic] and those that don't, and that don't have
> members that do, are automatically [collected].
>
> There are two issues I see. One, as you say, there must be some way for
> template classes, say std::vector or std::list to specify whether the

> destructor they define is to be "taken into account" or not. [...]

Under this model, std::vector doesn't have a trivial destructor (it
allocates memory via its allocator, not via new(gc), so it has to
deallocate it.) One would need to write a separate gc_vector (a
resizeable wrapper over new(gc) T[]) with a trivial destructor for use
with collected classes.

Similarly,

class X
{
private:

A * pa_;

public:

X(): pa_( new A ) {}
~X() { delete pa_; }
}

isn't collected, but

class X2
{
private:

A * pa_;

public:

X2(): pa_( new(gc) A ) {}
}

is.

> > We don't need anything on a type that
> > says that it cannot be allocated on a non-gc-heap either. The only limitations
> > that are necessary is that if an object is to be allocated on a gc-controlled
> > heap, it must have a trivial destructor (and this must be statically enforced by
> > the expression 'new (gc) T').
>
> I'm not sure I understand. If you allocate a normally GC object in the
> non-GC realm, you'll need to needlessly manipulate a reference counter
> for that object, or define two kind of pointer modifiers (*gc and
> *deterministic) that specify what heap a pointer is on. Right?

There are no implicit RC manipulations in Paul's simple model.
Everything stays exactly as is, with the only addition being new(gc).

Paul Mensonides

unread,
Mar 20, 2006, 9:35:51 AM3/20/06
to
Edward Diener No Spam wrote:

> Part of your objection to GC in C++, to which I totally agree, is that

I don't object to having a GC in C++ provided it is done in a C++-
like way.

> RAII objects need a smart pointer while non-RAII objects can be dealt
> with adequately using GC, thus creating two different syntaxes for
> dynamically allocated objects in a theoretical GC C++ system. As
> classes
> change, from RAII to non-RAII or vice versa, programmers must then
> change their syntax, which is a ridiculous imposition.

Actually, only as non-RAII to RAII would programmers need to change
their
syntax. That's a hard pill to swallow, but necessary (IMO), and it
is made
easier if the compiler statically enforces this.

> In that world
> simply using smart pointers and ignoring GC is the natural action, so
> why have GC in the first place.

Because a huge category of classes would never make the change from not
requiring deterministic destruction to requiring it. Some would,
yes, and that
would unfortunately, but necessarily (IMO), require code changes
elsewhere.
However, if the mechanism was defined correctly, the compiler would
statically
catch all of those allocations--which mitigates this problem
significantly. (It
is no longer a soft error that leaks non-memory resources, nor is it
even a
runtime hard error. It would be a static hard error--the best kind
of error.)

> The proposal says that if classes can be tagged as RAII (
> deterministic ) or non-RAII ( collected ), the same dynamically
> allocated syntax can
> be used, ie. x = new X(...);. In the case of deterministic objects,
> the
> compiler creates a smart pointer under the covers,

....which causes just as many massive changes to code (unless you
double the size
of all pointers).

> where the object
> gets
> deleted when its reference counts goes back to 0, and in the case of
> collected objects GC works non-deterministically as usual. I assume
> also
> the semantics of current smart pointers for such deterministic
> objects,
> where when the pointer gets assigned to another variable the reference
> count is incremented and where the pointerf foes out of scope it is
> decremented.

The one thing that we absolutely don't want is for there to be a type
difference
between pointer types. We do not want, for example, X^ and X*. All
that we
want is X*. Nor do we want to deal with all the loopholes and problems
associated with having multiple pointer types--such as what is the
type of
&*ptr? It is better anyway for a gc-allocation to be explicit (as
there are
other implications of using GC beyond just deterministic destruction).

> The rest of the proposal regarding stack objects and static objects
> can,
> and should, be thrown out since neither has anything to do with
> dynamic allocation.

See my other post for a more detailed explanation of my opinion.
Suffice to
say, the difference between gc-allocation and non-gc-allocation
should be
explicit. Memory management of non-gc-allocated memory should be
explicitly
controlled by a smart pointer (there are all kinds of different
options for
that--not just a one-size-fits-all hidden-by-the-implementation scheme).

Regards,
Paul Mensonides

Paul Mensonides

unread,
Mar 20, 2006, 9:49:46 AM3/20/06
to
Andrei Alexandrescu (See Website For Email) wrote:
> Paul Mensonides wrote:

>> redundant abomination in my opinion. All we need is a 'new (gc) T'
>> variant that will fail to compile if the type has a non-trivial
>> destructor (and a way to determine if a type has a non-trivial
>> destructor in generic code).
>
> I understand. So you say that classes that define a destructor are
> automatically [deterministic] and those that don't, and that don't
> have
> members that do, are automatically [collected].

Yes, but I'd get rid of the "deterministic" and "collected" labels.
Instead,
I'd just call a class "collectable" or "non-collectable".

> There are two issues I see. One, as you say, there must be some way
> for
> template classes, say std::vector or std::list to specify whether the
> destructor they define is to be "taken into account" or not.

This is a choice of the template class itself. The template class
can assert
that the stored type is collectable (simply by using 'new (gc) T' or
similar).
It can also define a destructor--which is another way of saying that
instances
of the class are not collectable. Third, it can inherit a destructor
or not
based on the 'has_trivial_dtor' trait of T along with allocation
functions.
More likely, the (e.g.) vector will just contain a storage device for
this
purpose...

template<class T, class allocator> class vector {
public:
vector(...) : p_(0, alloc_) ... {
...
return;
}
private:
allocator alloc_;
storage_device<T, allocator> p_;
};

....that would have a non-trivial destructor if the allocator says
that it should
and would not if the allocator says that it shouldn't. The default
allocator
could determine this based on 'has_trivial_destructor', or a user
could pass in
a different allocator that always allocates on the regular heap. In
either
case, this would cause the vector to pick up a non-trivial destructor
automatically. It also allows for the user of the vector to
explicitly decide
that he doesn't want to the vector to allocate on the GC-controlled
heap (for
whatever reasons) without re-inventing a different vector template.

> Second,
> my
> proposal naturally allowed the programmer to also control the other
> storages of objects, thus keeping a number of idioms efficient. If we
> give up on that, efficiency of C++ programs can be hurt.

I don't think that automatically producing reference counted smart
pointers is a
good option. For those types that cannot be allocated on a gc-heap,
current
measures that we already use will continue to work correctly.

>> We don't need any addition semantics on delete except to say
>> that it is undefined behavior to delete gc-allocated memory--which
>> can be asserted by a checked implementation.
>
> Ehm. Fine, if you like "undefined behavior". I stay away from it like
> the pest.

I don't like it, but that is a consequence of having pointers to the
gc-heap be
raw pointers. We simply cannot change all non-gc-heap pointers (what
we have
now) to something other than raw memory addresses. That is not a
viable option.

>> We don't need anything on a type that
>> says that it cannot be allocated on a non-gc-heap either. The only
>> limitations that are necessary is that if an object is to be
>> allocated on a gc-controlled heap, it must have a trivial destructor
>> (and this must be statically enforced by the expression 'new (gc)
>> T').
>
> I'm not sure I understand. If you allocate a normally GC object in the
> non-GC realm, you'll need to needlessly manipulate a reference counter
> for that object, or define two kind of pointer modifiers (*gc and
> *deterministic) that specify what heap a pointer is on. Right?

You wouldn't necessarily need to manipulate a reference counter. Nor
do you
need pointer modifiers at all. E.g.

std::auto_ptr<X> x = new X;

This will work regardless of whether X could have been allocated on the
GC-controlled heap. Even if X internally allocates other memory on the
GC-controlled heap. The memory required to store X itself will be
deleted when
the auto_ptr is destructed (RAII). Any memory that X allocates on
the GC-heap
will be collected by the GC at some other time.

Basically all you really need to support GC is some way for a
(global) operator
new to be able to tell what type it is allocating memory for and a
has_trivial_destructor metafunction. E.g.

template<class T> inline
std::enable_if<std::has_trivial_destructor<T>::value, void*>::type
operator new(std::size_t size, std::gc) {
return std::__gcalloc(size);
}

(The compiler can trivially supply the explicit argument for T.)

Of course, this doesn't protect pointers that were allocated on the
gc-heap from
being deleted. But, dynamically allocated objects, whose lifetime is
not
controlled by some form of smart pointer, should already not be
manually deleted
non-locally. Even locally, such pointers should have been controlled
by some
sort of smart pointer. The best that you're protecting against is
blatantly
obvious errors like:

std::auto_ptr<T> p = new (gc) T;

I don't think that this is a serious problem, and in all likelihood,
vendors
could and would check whether the pointer pointed to memory in the GC-
controlled
heap in a delete expression or in the global operator delete
implementation.
(The standard could even require it to do this check when NDEBUG is not
defined.)

The net result is maximum reusability. For example, say you have a
simple
struct:

struct point {
double x, y;
};

One group of programmers (say that are writing programs for a PC) can
allocate
objects of this type on the GC-heap all day and never worry about it.

Another group of programmers (say that are writing programs that run
on much
more limited hardware with realtime contraints--anathema to GC, BTW) can
allocate objects of this type in another way (say with explicitly
controlled
fashion on the regular heap or even with in-place construction in some
statically preallocated chunk of memory).

The only property of 'point' that matters is whether or not it is
collectable--which is fundamentally based on whether it has a trivial
destructor. The only effect that that would have is whether 'new
(gc) point'
will compile. No other way that objects of type 'point' are
allocated should
matter. The difference is that memory dynamically allocated on the
regular heap
still needs to be tracked just like we've already been doing--no big
deal.

In fact, this way of doing it (via a template'ed operator new +
has_trivial_destructor) is actually the more general way to do it, as
other
types of allocators (even other types of garbage collection) could be
implemented with it. E.g. a garbage collector that only runs when
manually
invoked (which might be suitable for some systems with hard realtime
contraints).

So, I agree that whether a type is 'collectable' is fundamentally a
property of
the type (unfortunate but true: RAII is fundamentally at odds with GC).
However, I think that fundamental property is either 'collectable' or
'not-collectable' and which it is depends entirely on whether there
is a need
for a non-trivial destructor. IOW, if you have a non-trivial
destructor, it
implies that the type is not collectable.

Regards,
Paul Mensonides

Nicola Musatti

unread,
Mar 20, 2006, 12:07:42 PM3/20/06
to

Andrei Alexandrescu (See Website For Email) wrote:
[...]

> Claim #1: The lifetime management of objects of a class is a decision of
> the class implementer, not of the class user.

I disagree. Only the class user knows enough of the context to decide
on when an object should be destroyed.

> a) A class such as complex<double> is oblivious to destruction
> timeliness because it does not allocate scarce resource that need timely
> release;

I.e. neither the class nor the user care about destruction.

> b) A class such as string doesn't need to worry about destruction
> timeliness within a GC (Garbage Collected) environment;

This might be the case in languages with built in compound data types
such as Array or String, but in C++ this would mean that std::string
would have no control on the lifetime of its buffer unless it was
implemented as a plain char * .

> c) A class such as temporary_file does need to worry about destruction
> timeliness because it allocates scarce resources that transcend both the
> lifetime of the object (a file handle) and the lifetime of the program
> (the file on disk that presumably temporary_file needs to delete after
> usage).

What if temporary_file was implemented in terms of "file"? With current
multi level buffered I/O there may be write errors that are not
detected until a file is closed. In this context a RAII approach is not
effective, because it would make it impossible to check the success of
closing the file. Once RAII is out of the way there's no reason to be
timely about destruction anymore. Here you have conflicting use cases
for the file class.

> In all of these examples, the context in which the objects are used is
> largely irrelevant (barring ill-designed types that employ logical
> coupling to do entirely different actions depending on their state).
> There is, therefore, a strong argument that the implementer of a class
> decides entirely what the destruction regime of the class shall be. This
> claim will guide design considerations below.

There's another argument against this approach: you are introducing an
irregularity in the language in that the user gets to choose whether
the object is statically, automatically or dynamically allocated, but
if it is dynamically allocated it's the class that decides how it
should be handled. This looks as an unnecessary complication to me.

Cheers,
Nicola Musatti

igazt...@gmail.com

unread,
Mar 20, 2006, 12:04:08 PM3/20/06
to

> That certainly is the conventional wisdom, but is it true? For example,
> check out the benchmarks in www.digitalmars.com/d/cppstrings.html. GC makes
> for a substantial performance increase. It's not because a GC malloc is
> faster than malloc, it is because with GC memory management *far fewer
> allocations are necessary*.

Just some snippets of this "benchmark" (wccpp2):

for(eow = first; eow != last; )
{
bow = find_if(eow, last, isWordStartChar);
eow = find_if(bow, last, isWordEndChar);
if(bow != eow)
++dict[string(bow, eow)], ++numWords;
}

and this (wccpp1):

for ( unsigned int j = 0; j < input.length(); j++ )
{
//..
else if (inword)
{
std::string word = input.substr( wstart, j - wstart );
std::map< std::string, int >::iterator it = dictionary.find(
word );
}

So constructing and destructing a std::string for every word of the
test. I hope you have small string optimization activated in your STL
implementation! The dictionary is std::map<std::string, int> (a
temporary when inserting std::map<>::value_type). I know this code is
from real people, but is this a correct benchmark? So with this
"definitive" benchmark you claim that "D is faster"?. Take the string
outside of the loop, use boost::pool for nodes. With move semantics we
can avoid nearly all temporaries when inserting nodes into map. This
can show that programming this wordcount in C++ is more difficult than
D but saying "D is faster" with this benchmark is something that I
wouldn't consider fair. This reminds the "definitive" Java benchmarks
against C++ we see every time Sun makes another version.

> I just got back from SDWest, where I attended Scott Meyers' excellent
> presentation on TR1's shared pointer. Shared pointer is supposed to make
> explicit memory management in C++ much more reliable and tractable. But if
> we look under the hood of it, we find:
>
> 1) shared_ptr makes an extra allocation for each object being pointed to (in
> addition to the 8 bytes for the shared pointer itself). Yes, that means TWO
> allocations per object. So, for shared_pointer to be faster than GC, it's
> got to be at least TWICE as fast per allocation as a GC allocation.

Reference counts have normally fixed size (with a default deleter, for
example). What about segregate storage allocator for them? Amortized
constant time two pointer operation. Check boost::smart_ptr with
BOOST_SP_USE_QUICK_ALLOCATOR activated.

> 2) For every copy of a pointer, shared_ptr needs to increment and decrement
> through two indirections. Note that just passing a shared_ptr to a function
> will require this.

Yes. If the function is not going to share ownership pass it by
reference. If you want the function to take exclusive ownership, move
it. shared_ptr is not a lightweight object when comparing to a raw
pointer. Use C++ rules for heavy objects.

> 3) If "multithreaded" is turned on, then those increments/decrements need to
> be locked or synchronized.

Correct.

> 4) shared_ptr's are exception safe, which means that copying one means they
> have to be entered into the exception tables, etc. Although this overhead
> can be moved into static data, it still can inhibit more aggressive program
> flow optimizations.

Some exception handling implementations have zero overhead while
exceptions are not thrown. I don't know about digitalmars compilers.
And I might be wrong with this statement, since I'm not a compiler
writer.

> Contrast that with conventional GC, where you just have a pointer and can
> copy it around willy-nilly without extra overhead. That's a high bar for
> shared_ptr to climb over.

> D is faster (see the benchmark).

That benchamark shows that your D program is faster than the C++
program written by others. Not that D is faster than C++.

> Most garbage collected languages *are*
> slower overall than C++, often considerably so. But it's a serious mistake
> to assume that is because of the gc, since those languages have a lot of
> other heavy anchors to drag around (like being interpreted rather than
> compiled, or simply lacking needed expressiveness).

I agree. Maybe is not gc's fault.

> > Apart from speed, what about size? How much memory do the garbage
> > collected languages need?
>
> For the heap, about 2 to 3 times what C++ needs. For the code, it's
> significantly less, though it's hard to say how much less. All those
> shared_ptr fiddlings add up.

2 to 3 times sounds terrible.

> > All the proposals I see, including the
> > "Transparent Garbage Collection for C++" need so many changes and added
> > complexities that if we consider garbage collection essential, we
> > should think about declaring C++ deprecated and start another language.
>
> I agree, which is the genesis for the D programming language <g>.

If D is so good, propose it for standarization.

Regards,

Ion

Nicola Musatti

unread,
Mar 20, 2006, 12:05:30 PM3/20/06
to

Paul Mensonides wrote:
[...]

> All we need is a 'new
> (gc) T' variant that will fail to compile if the type has a non-trivial
> destructor (and a way to determine if a type has a non-trivial destructor in
> generic code). We don't need any addition semantics on delete except to say
> that it is undefined behavior to delete gc-allocated memory--which can be
> asserted by a checked implementation.

While I tend to feel closer to your point of view than to Andrei's, I
do have one objection: why the requirement that GC-able classes have
trivial destructors? Wouldn't it be simpler and more effective to allow
non trivial destructors, keep calling delete on GC'ed pointers legal
and separate destruction from memory recycling?

Cheers,
Nicola Musatti

Nicola Musatti

unread,
Mar 20, 2006, 12:06:36 PM3/20/06
to

Nemanja Trifunovic wrote:
[...]

> Slightly OT: All these discussions are very interesting of course, but
> I don't think it is feasible to make C++ a "GC language". Even if it
> happens (with C++ 202x maybe), my bet is that people would avoid to use
> it. See what happened to Fortran - every single Fortran programmer I
> know uses F77 - even the latest version of that language Fortran 2003
> is an object oriented language.

I don't think any of the proposals in this and the other GC thread is
trying to turn C++ in a GC only language; the idea is to change C++ so
that GC is available as a memory management alternative to the manual
approach.

Cheers,
Nicola Musatti

Edward Diener No Spam

unread,
Mar 20, 2006, 12:04:35 PM3/20/06
to
Paul Mensonides wrote:
> Edward Diener No Spam wrote:
>
>> The proposal says that if classes can be tagged as RAII (
>> deterministic ) or non-RAII ( collected ), the same dynamically
>> allocated syntax can
>> be used, ie. x = new X(...);. In the case of deterministic objects,
>> the
>> compiler creates a smart pointer under the covers,
>
> ....which causes just as many massive changes to code (unless you
> double the size
> of all pointers).

There would be no changes to the syntax. The compiler would treat the
dynamic allocation of a deterministic object in the same way, for
example, that Boost shared pointer treats it. The difference would be
that, for a RAII class, instead of the programmer writing
"boost::shared_ptr<X> x(new X(...));", he would be writing "X x(new
X(...));". Because X is a class marked as "deterministic" ( or whatever
else it is decided to call an RAII class ), the same syntax can be used
for dynamically allocating deterministic objects and normal GC collected
objects.

Requiring "X x(new(gc) X(...));" for collected objects means:

1) The programmer, not the class designer, has to know whether class X
is a RAII one or not.

2) The syntax has to change depending on whether or not the class is a
RAII or not.

>
>> where the object
>> gets
>> deleted when its reference counts goes back to 0, and in the case of
>> collected objects GC works non-deterministically as usual. I assume
>> also
>> the semantics of current smart pointers for such deterministic
>> objects,
>> where when the pointer gets assigned to another variable the reference
>> count is incremented and where the pointerf foes out of scope it is
>> decremented.
>
> The one thing that we absolutely don't want is for there to be a type
> difference
> between pointer types. We do not want, for example, X^ and X*.

Totally agreed. There is no type difference or syntax difference and
both deterministic and collected objects are allocated in the same
memory space. The only difference is that deterministic objects get
tracked, just like boost::shared_ptr, where they get deleted
automatically and their memory released when their reference count goes
down to 0, and collected objects do not.

> All
> that we
> want is X*. Nor do we want to deal with all the loopholes and problems
> associated with having multiple pointer types--such as what is the
> type of
> &*ptr?

Total agreement.

> It is better anyway for a gc-allocation to be explicit (as
> there are
> other implications of using GC beyond just deterministic destruction).

See my argument previously about this. The class designer knows whether
the class is RAII ( deterministic ) or not. The programmer should not
need to know about this or care.

>
>> The rest of the proposal regarding stack objects and static objects
>> can,
>> and should, be thrown out since neither has anything to do with
>> dynamic allocation.
>
> See my other post for a more detailed explanation of my opinion.
> Suffice to
> say, the difference between gc-allocation and non-gc-allocation
> should be
> explicit.

I disagree that the explicitlessness should be the responsibility of the
programmer doing the allocation rather than the class designer. Using
the exact same syntax to dynamically allocate an object that is
deterministically deleted or collected appears to me a winner, because
the difference of how those objects are handled should not affect the
way one uses them at all.

In .NET's C++/CLI there is a vast difference between the way that .NET
objects are treated and normal C++ objects are treated. The essence of
that difference, among many others, is that .NET objects are references
to memory addresses which are movable. This difference alone required a
different syntax and a different memory space for .NET objects and
normal C++ objects.

For GC in C++ the only difference is that there are objects that need to
be deterministically deleted as opposed to collected. If this difference
can be expressed by the class itself I see no reason why the syntax of
dynamic allocation, and the way pointers are expressed for a dynamically
allocated object, should be any different from what it is now.

kanze

unread,
Mar 20, 2006, 4:13:54 PM3/20/06
to
Walter Bright wrote:
> <igazt...@gmail.com> wrote in message
> news:1142696088.6...@i40g2000cwc.googlegroups.com...

> > I really think that if you have the control of the memory
> > management, you can be faster than the garbage collector. Of
> > course, you can be slower, but in general this is not the
> > rule.

> That certainly is the conventional wisdom, but is it true? For
> example, check out the benchmarks in
> www.digitalmars.com/d/cppstrings.html. GC makes for a
> substantial performance increase. It's not because a GC malloc
> is faster than malloc, it is because with GC memory management
> *far fewer allocations are necessary*.

Even without that, GC often beats manual allocation.

> I just got back from SDWest, where I attended Scott Meyers'
> excellent presentation on TR1's shared pointer. Shared pointer
> is supposed to make explicit memory management in C++ much
> more reliable and tractable. But if we look under the hood of
> it, we find:

> 1) shared_ptr makes an extra allocation for each object being
> pointed to (in addition to the 8 bytes for the shared pointer
> itself). Yes, that means TWO allocations per object. So, for
> shared_pointer to be faster than GC, it's got to be at least
> TWICE as fast per allocation as a GC allocation.

> 2) For every copy of a pointer, shared_ptr needs to increment
> and decrement through two indirections. Note that just passing
> a shared_ptr to a function will require this.

> 3) If "multithreaded" is turned on, then those
> increments/decrements need to be locked or synchronized.

> 4) shared_ptr's are exception safe, which means that copying
> one means they have to be entered into the exception tables,
> etc. Although this overhead can be moved into static data, it
> still can inhibit more aggressive program flow optimizations.

You miss another important difference. shared_ptr is a class
type with as size of at least two pointers, and non-trivial
constructors and destructors. At least today, I don't know of a
compiler which will pass it or return it in a register. With
garbage collection, you pass and return raw pointers -- I don't
know of a compiler today which will return them other than in a
register, and at least on a Sparc, all of the compilers also
pass them in a register.

It's usual when passing them to pass them by const reference, in
order to avoid the copy. But of course, this leads to an
additional level of indirection in the function, so it's not
always obvious that it is a gain.

For the rest, to be fair: one of the allocations is of a fixed
constant size, and so can be optimized. And of course,
shared_ptr arguably shouldn't be "thread safe", since it is
usually an error if two threads hold a pointer to the same
object (but of course, this depends on the role of the object in
the application -- and I'm sure that Andrei has some sort of
policy based smart pointer which will avoid locking except when
necessary).

Also, of course, acquiring an undisputed lock is pretty cheap.
About 1.5 microseconds on my machine (which is a pretty old and
very slow Sparc).

> Contrast that with conventional GC, where you just have a
> pointer and can copy it around willy-nilly without extra
> overhead. That's a high bar for shared_ptr to climb over.

> > I can't think of a language with garbage collection that is
> > faster than C++. And I think that's because of the
> > allocation control. I really think garbage collection is a
> > performance penalty.

> D is faster (see the benchmark).

So is Java, in most benchmarks.

The real question is: who wrote the benchmark? Overall, my
experience that the language plays a very small role in
performance. What counts is the amount of effort invested in
the optimizer. Both Java and C++ (which have both been heavily
optimized, at least for some platforms) have constructs which
cause problems for optimization -- depending on the benchmark,
one or the other constructs will dominate, and which language
ends up fastest depends on the constructs used in the benchmark.

In general, most "benchmarks" I've seen do very little
allocation and freeing of memory. Which means that garbage
collection plays only a very small role. In the few exceptions,
I've seen, garbage collection has systematically ended up faster
than manual allocation. (But the question of "who wrote them"
is still legitimate.

In the end, the simplest solution for a real user is to download
the Boehm collector, and use it in a small application. Without
bothering with benchmarks, because in the end, the question
isn't which is faster, but which is fast enough, and if both are
(which in my experience is usually the case), which requires the
least programmer effort to achieve the same results.

> Most garbage collected languages *are* slower overall than
> C++, often considerably so.

Most languages, garbage collected or no, are significantly
slower than C++ because significantly less effort (i.e. money)
has been invested in their optimization. One blatant exception
today is Java -- a great deal has been invested in optimizing it
on PC's (but only on PC's), with the results that it is
generally faster than C++ on PC's. (I'm willing to bet that on
an older processor, with, say, 36 one's complement integers, C++
will beat it hands down:-).)

> But it's a serious mistake to assume that is because of the
> gc, since those languages have a lot of other heavy anchors to
> drag around (like being interpreted rather than compiled, or
> simply lacking needed expressiveness).

I'm not sure about the "interpreted"; modern interpreters don't
differ much from optimizing compilers. With a few advantages,
in fact -- they can optimize exactly for the processor they are
running on. As for lack of expressiveness... that pessimizes
programmer productivity much more than run-time.

> > Apart from speed, what about size? How much memory do the
> > garbage collected languages need?

> For the heap, about 2 to 3 times what C++ needs. For the
> code, it's significantly less, though it's hard to say how
> much less. All those shared_ptr fiddlings add up.

> > It's sad to see the huge memory a simple
> > garbage collected program in C#/Java needs.

> C# and Java both have to carry around enormous VMs as well as
> JIT compilers, all kinds of stuff. That isn't the fault of the
> GC.

I'm not sure about the current implementations, but the way Java
garbage collection was implemented in the implementations I
used, one got the impression that it was designed to maximize
memory use. You can do the same thing in malloc, but I've not
seen an implementation which actually did so.

> > I don't know a garbage collected language can beat C++ in
> > performance.

> You do now <g>.

> > Most important than being faster, is the fact that we know
> > WHEN is going to be faster. Or we have to disable garbage
> > collecting explicitly in a performance critical area,
> > because the garbage collector has can stop the world to
> > search unreferenced objects?

> There are well known methods for dealing with this.
> Additionally, there are no upper bounds for how long it takes
> new() to run in C++, either. C++ programs that require real
> time performance with no pauses and guaranteed latency can't
> use new(). They usually preallocate all they'll need before
> entering the latency critical section.

> > All the proposals I see, including the "Transparent Garbage
> > Collection for C++" need so many changes and added
> > complexities that if we consider garbage collection
> > essential, we should think about declaring C++ deprecated
> > and start another language.

> I agree, which is the genesis for the D programming language
> <g>.

But garbage collection wasn't the only motivation, was it?
(I've had pretty good experience with the Boehm collector with
C++.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Greg Herlihy

unread,
Mar 20, 2006, 4:14:19 PM3/20/06
to

I would make a few changes: "gc" is better placed as a storage
specifier:

gc T * = new T;

The "gc" specifier would be part of the object's type, but an implicit
conversion would exist between gc and non-gc pointers to the same type
- meaning that gc pointers would be compatible with all existing code.
It would also be possible to use "gc" as a class specifier indicating
that objects of that class be allocated on the gc heap. Subclasses
would have to have the same gc qualifier as each of their base classes.

There is also no reason to prohibit explictly deleting a gc pointer
(after all, the last thing that C++ needs is more undefined behavior).
Nor is there any reason to require that gc-allocated objects have
trivial destructors.

In fact, all of my suggested changes would make this garbage collector
nearly identical to the one described in Ellis's and Detlefs' 1994
paper "Safe, Efficient Garbage Collection for C++" which can be found
at
http://www.usenix.org/publications/library/proceedings/c++94/full_papers/ellis.a
- a paper that seems very forward looking, dealing as it does with
issues of programmer acceptance, source code compatibility,
implementation feasibility - in short the very same issues still being
dissussed here 12 years later (and with almost no perceptible progress
having been made during the interim).

So I would conclude that the core obstacle for advancing C++ garbage
collection is not a shortage of opinions on how it should work (in fact
there may be something of an oversupply), but rather a dearth of
experimental implementations that would provide empirical data - in
short - the hard evidence needed to make the case.

Greg

kanze

unread,
Mar 20, 2006, 4:13:32 PM3/20/06
to

I think the idea was to provide some sort of hook for
meta-programming, so that the vector<T> could use new(gc) if the
T were collectable, and normal new otherwise.

With regards to Andrei's proposal: my experience with the Boehm
collector, simply using it instead of operator new(), have been
very positive. It will take something to convince me that we
need something more complicated.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kanze

unread,
Mar 20, 2006, 4:18:41 PM3/20/06
to
peter koch larsen wrote:

> I've decided not to quote anything from Andreis post. Quoting
> his message did tend to make my post more confusing than it
> already is. Sorry about that to all. Moderators: i hope you
> can accept the post as is.

> My proposal for using garbage collection in relation to C++
> would be very close to the solution that is available to us
> all today when using the Boehm garbage collector. I propose
> that no action can take place on collection so there is no
> kind of finalization. Unfortunately the discussion so far has
> not touched the drawbacks of this approach, so perhaps I'm
> missing something here.

That's what I tend to favor as well. Not that Andrei doesn't
have some interesting ideas, but the solution with the Boehm
collector has the advantage of simplicity and existing
experience -- we know exactly what the repercusions are.

It does mean that if you allocate a class with a needed
destructor, and you forget to call delete, the destructor won't
get called, and there's nothing to remind you that you've made
an error.

> I must also mention that I use garbage collection here as
> meaning non-deterministic garbage collection such as as the
> one you see if you use a mark-and-sweep algorithm.
> Smart-pointers are not considered as garbage-collecting
> mechanism. This is contrary to the way Andrei has used the
> term in the thread "Can garbage collection be beneficial", but
> it seems to me that he in this thread uses garbage collection
> in the same way as I do here.

> First let me give you my basic attitude to garbage collection.
> The programs, I've written are of a kind where RAII has had to
> be used extensively for resources other than memory, and it
> has been quite easy to cope with the lifetime of my objects.
> I thus do not believe that I would have had any benefit at all
> from using GC.

> That said, I can imagine application domains where GC would
> come in real handy, so I do not see any reason to not have GC
> built in as an optional extension. I am not sure that GC
> should be supported on all platforms. At least it should be a
> quality of implementation issue.

Well, that's really the current status. Unless you mean
"optional" for the user, as opposed to optional for the
implementor. An hosted implementation today has to provide
everything in the standard -- it doesn't have the right to skip
iostream's, for example, on the grounds that some people don't
use it or want it. If you don't like or want iostream, or if it
isn't appropriate for your application, of course, you don't
have to use it. That's about the situation I'd like to see for
garbage collection.

> The goal for GC in C++ should be:

> 1) Most current C++-programs should be able to run in a
> garbage-collected environment. The exception here is that
> programs that hide pointers by e.g. xoring them or writing
> them to a file should not work. (These usages prevent the
> mark-sweep to be a complete replacement for new/delete by the
> way).

Agreed.

> 2) All C++ programs enabling garbage collection should be able
> to use current C++ libraries without modifications.

That's slightly more difficult, unless by current C++ libraries,
you mean only implementations of the standard library. In
general, you can't tell if a third party library uses the xor
technique on pointers or not.

That is, in fact, one of the principle motives I see for
standardizing garbage collection. As long as I limit myself to
my own code, and the g++ implementation of the standard library
(which is the only one I've really tested), the Boehm collector
works just fine as a third party add in; there's pratically no
need for standardization. On the other hand, if it were
standard, I'd be more or less guaranteed that it would work with
third party libraries as well.

There's also an issue of what compilers can generate when
optimizing. I haven't found this to be a problem, at least not
with g++ on a Linux 64 bit PC. But that doesn't mean that it's
not a problem somewhere -- for that matter, I've not done
extensive tests at higher levels of optimization even on that
platform. Standardization would protect against this.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Gerhard Menzl

unread,
Mar 20, 2006, 4:26:26 PM3/20/06
to
Paul Mensonides wrote:

> The one thing that we absolutely don't want is for there to be a type
> difference between pointer types. We do not want, for example, X^ and
> X*. All that we want is X*. Nor do we want to deal with all the
> loopholes and problems associated with having multiple pointer
> types--such as what is the type of &*ptr? It is better anyway for
> a gc-allocation to be explicit (as there are other implications
> of using GC beyond just deterministic destruction).

This is what the designers of Managed C++ did. Herb Sutter describes in
http://www.gotw.ca/publications/C++CLIRationale.pdf why using a uniform
pointer syntax for collected and native types was considered a mistake
and changed in C++/CLI. Have you read about the rationale behind the
decision?

As much as I retain certain reservations towards C++/CLI, and in full
knowledge that many other design considerations played a role, I think
we should not ignore the experience made in the course of the effort of
adapting C++ to a platform based on garbage collection. The designers
seem to have gone through much of what we are dicussing here: different
static types for pointers to collected and to non-collected objects,
restrictions for mixing them, etc.


--
Gerhard Menzl

#dogma int main ()

Humans may reply by replacing the thermal post part of my e-mail address
with "kapsch" and the top level domain part with "net".

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 4:48:37 PM3/20/06
to
igazt...@gmail.com wrote:
> I can find an example where GC is slower is and you can find the
> inverse. But after so many years of research with garbage collection,
> apart from some flawed benchamarks (how many "See!, XXX is faster than
> C++ in this benchmark, so XXX is faster!" have we seen?), there is no
> GC language faster than C++ (the size is another issue where C++ beats
> all). We can talk about being difficult to program, complex, etc... but
> the it's king of the speed, and with move semantics, it will be even
> faster.

I totally agree that a disciplined program that carefully combines raw
pointers with smart pointers judicioulsly will always be faster than a
GC program. The problem is the amount of programmer time that must go
into that "judiciousness". We just can't claim that programs using GC
can't possibly be faster than those that don't. Indiscriminate use of
smart pointers will likely add up quickly. And there's no easy way to
keep things correct and efficient at the same time.

Walter has addressed some other points, but I'll say that D has shown
how a language of much C++ heritage that has added GC, is not a
performance slouch.

> Regarding size, we have to remember that with current architectures,
> and even more with future multi-core architectures, smaller is faster.
> With same memory synchronization operations as GC, smaller means more
> data in cache.

Sure. Let's also not forget the temporal aspect. :o)

> How will this work if a library L uses garbage collection and
> application A wants to use manual management?

My proposal says that as long as you have at least one [collected]
pointer in an application, the garbage collector will be linked in. I'm
not saying that this is a must; we could think of changing the initial
constraints of the proposal to allow different designs.

Don't forget that it's way easier to attack an imperfect design than to
find intelligent ways to make it better.

> What is library L returns
> a pointer to an object?

L decides the pointer's regime, and the application knows it. I now
realize that the regime must be part of the declaration, not the
definition, that is, a class declaration must look like:

class [deterministic] Widget;

> Do I know if it's garbage collected?

You know.

> Can a
> erase it?

You don't need to :o).

> A lot of questions. But the core is: Is garbage so important in C++0x
> to:
>
> -> Do the most breaking change in C++ I can remember?
> -> Have a risk (this is only _my_ opinion, based on my limited
> experiece) of a serious performance/size impact?

Seemingly a lot of people seem to think that GC is worth putting a lot
of effort into.


Andrei

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 4:48:15 PM3/20/06
to
igazt...@gmail.com wrote:
> I can find an example where GC is slower is and you can find the
> inverse. But after so many years of research with garbage collection,
> apart from some flawed benchamarks (how many "See!, XXX is faster than
> C++ in this benchmark, so XXX is faster!" have we seen?), there is no
> GC language faster than C++ (the size is another issue where C++ beats
> all). We can talk about being difficult to program, complex, etc... but
> the it's king of the speed, and with move semantics, it will be even
> faster.

I totally agree that a disciplined program that carefully combines raw
pointers with smart pointers judiciously will always be faster than a GC

program. The problem is the amount of programmer time that must go into
that "judiciousness". We just can't claim that programs using GC can't
possibly be faster than those that don't. Indiscriminate use of smart
pointers will likely add up quickly. And there's no easy way to keep
things correct and efficient at the same time.

Walter has addressed some other points, but I'll say that D has shown
how a language of much C++ heritage that has added GC, is not a
performance slouch.

> Regarding size, we have to remember that with current architectures,


> and even more with future multi-core architectures, smaller is faster.
> With same memory synchronization operations as GC, smaller means more
> data in cache.

Sure. Let's also not forget the temporal aspect. :o)

> How will this work if a library L uses garbage collection and


> application A wants to use manual management?

My proposal says that as long as you have at least one [collected]

pointer in an application, the garbage collector will be linked in. I'm
not saying that this is a must; we could think of changing the initial
constraints of the proposal to allow different designs.

Don't forget that it's way easier to attack an imperfect design than to
find intelligent ways to make it better.

> What is library L returns


> a pointer to an object?

L decides the pointer's regime, and the application knows it. I now

realize that the regime must be part of the declaration, not the
definition, that is, a class declaration must look like:

class [deterministic] Widget;

> Do I know if it's garbage collected?

You know.

> Can a
> erase it?

You don't need to :o).

> A lot of questions. But the core is: Is garbage so important in C++0x


> to:
>
> -> Do the most breaking change in C++ I can remember?
> -> Have a risk (this is only _my_ opinion, based on my limited
> experiece) of a serious performance/size impact?

Seemingly a lot of people seem to think that GC is worth putting a lot
of effort into.


Andrei

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 4:46:27 PM3/20/06
to
peter koch larsen wrote:
> Andreis proposal to declare objects as either collectable or
> deterministic is not understood by me - so far as I can tell, there is
> no need for this destinction if only we can demand some intelligence
> from the compiler - and require a slight extension of the standard C++
> definition of a destructor:we keep the current definition of the
> "trivial" destrutor but divide the non-trivial destructors into two:
> lightweight-destructors and full destructors. Lightweight destructors
> are destructors that only reclaim memory, whereas full destructors
> might also release other resources.

I've been coquetting for a short while with the idea that the compiler
scans the destructor body and makes decisions based on that, but then I
said, heck, let the programmer just specify the regime in the class
definition.

Andrei

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 4:47:11 PM3/20/06
to
Edward Diener No Spam wrote:
> It's unnecessary to consider stack based or static objects in your
> design constraints. Neither has anything to do with GC and dynamic
> allocation. If you simplify your argument to deal only with dynamically
> allocated objects, it becomes much simpler for C++ with GC to adapt the
> idea of tagging a class as either deterministic delete * ( deterministic
> ) or non-deterministic destroy ( collected ), and presenting that as a
> possible proposal for GC in C++. Adding unnecessary design constraints
> merely obfuscates the basic idea and will make it much harder for the
> idea to be adapted.

Yes, but it also makes the idea easier to attack. "You mean I can't even
create objects on the stack and avoid the whole reference counting
business???"

I wanted to clarify that it's easy, with minimal change to the language,
to implement the concept of "pass-down" pointers in a typesafe manner.

> I have assumed that you are saying that when the last reference to a
> pointer of a dynamically allocated deterministic object goes out of
> scope, meaning the reference count drops down to 0, the object
> automatically gets delete called on it, which allows its destructor to
> run in order to release its resource.

Your assumption is correct. Within the framework I suggest, calls to
delete become unnecessary.

> Essentially dynamically allocated
> deterministic objects automatically become smart pointers under the
> covers. This is the entire crux of my argument that this is the way GC
> needs to occur in C++ so that a programmer does not have to manually
> switch syntax between using a smart pointer and use GC depending on
> whether a class is deterministic or collected. Neither Java, C#, or
> Python get this right, so that resource classes in those languages are
> headaches to various degrees, but C++ does need to get this right if GC
> is added to the language. It is not necessary to duplicate a mistake of
> other languages when those languages are blind to it.

I agree.

Andrei

Nemanja Trifunovic

unread,
Mar 20, 2006, 4:46:49 PM3/20/06
to

Nicola Musatti wrote:
>
> I don't think any of the proposals in this and the other GC thread is
> trying to turn C++ in a GC only language; the idea is to change C++ so
> that GC is available as a memory management alternative to the manual
> approach.
>
> Cheers,
> Nicola Musatti

By a "GC language" I did not mean "a GC-only language", but rather "a
language where GC is commonly used". GC is already available even
without changes in the language, but the number of people using it is
very small, AFAIK, and even if it can be made "better" with language
changes, my bet is that little would change in this regard. And given
the speed the new features are being introduced to C++ these days,
there is little chance it would happen in the next ten years.

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 4:45:40 PM3/20/06
to
Nemanja Trifunovic wrote:
>>Claim #1: The lifetime management of objects of a class is a decision of
>>the class implementer, not of the class user.
>>
>
>
> That's one of the reasons I stopped using .NET :) On the contrary, the
> decision needs to be made by the class user, since only the class user
> knows the context in which the objects are going to be used. Of course,
> in many cases, the class implementer would design their class for a
> specific scenario, but there are types that are made for generic use.
> Take string as an example: in "most" cases it would probably needed to
> be non-deterministic. But there are few cases where it is simply not
> acceptable, and a user of the class needs to be able to make a choice.

It's feasible; you'd have to add pointer qualifiers to track where the
object was allocated. The one thing that shouldn't be allowed is to
allocate a deterministic class on the GC heap. So it's at the end of the
day, a decision of the class writer.

> Slightly OT: All these discussions are very interesting of course, but
> I don't think it is feasible to make C++ a "GC language". Even if it
> happens (with C++ 202x maybe), my bet is that people would avoid to use
> it. See what happened to Fortran - every single Fortran programmer I
> know uses F77 - even the latest version of that language Fortran 2003
> is an object oriented language.

Ionno. In the Perl world, many are using the latest Perl (5.8) and look
at Perl 6 as the next big thing (I, too, believe it will). In the Java
world, people are using (I think) Java 1.5's generics. At least students
do. Java 1.4 just sucked too bad :o).

But, a good point is that my discussion might not help C++ a lot. It
might, however, help a language that has a similar set of constraints as
C++ does, plus a desire to implement type safety.


Andrei

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 4:58:52 PM3/20/06
to
Paul Mensonides wrote:
>>The proposal says that if classes can be tagged as RAII (
>>deterministic ) or non-RAII ( collected ), the same dynamically
>>allocated syntax can
>>be used, ie. x = new X(...);. In the case of deterministic objects,
>>the
>>compiler creates a smart pointer under the covers,
>
>
> ....which causes just as many massive changes to code (unless you
> double the size
> of all pointers).

I'm not sure I understand. There's absolutely no change in client code,
and pointer size stays the same. You'd only have to recompile. The
compiler can massage the reference count intrusively with the object.

Where is the massive change in the code?

> The one thing that we absolutely don't want is for there to be a type
> difference
> between pointer types. We do not want, for example, X^ and X*.

Well I wouldn't say "absolutely", but let's say "it would be good if we
got away without a new pointer type".

My proposal achieves that.

> See my other post for a more detailed explanation of my opinion.
> Suffice to
> say, the difference between gc-allocation and non-gc-allocation
> should be
> explicit. Memory management of non-gc-allocated memory should be
> explicitly
> controlled by a smart pointer (there are all kinds of different
> options for
> that--not just a one-size-fits-all hidden-by-the-implementation scheme).

Again, this doesn't take us to a spot more interesting than where we are
right now. We still have to do pretty much all by hand, and the risk of
resource leaks and dangling accesses is still high.

In my proposal, the risk of resource leaks is reduced to virtually
nothing (barring the programmers storing pointers and forgetting that
they stored them) and dangling pointers is reduced to zero. This was
what I set out for. Now I understand if someone wants to set out for a
different set of goals, and they achieve a different design. But let's
compare the designs *together with their stated goals*, so please state
your goals clearly so we know them.


Andrei

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 4:54:29 PM3/20/06
to
Peter Dimov wrote:
> Under this model, std::vector doesn't have a trivial destructor (it
> allocates memory via its allocator, not via new(gc), so it has to
> deallocate it.)

There's no new(gc). All pointers are manipulated uniformly. That's the
advantage of the compiler knowing where each object can live.

>>>We don't need anything on a type that
>>>says that it cannot be allocated on a non-gc-heap either. The only limitations
>>>that are necessary is that if an object is to be allocated on a gc-controlled
>>>heap, it must have a trivial destructor (and this must be statically enforced by
>>>the expression 'new (gc) T').
>>
>>I'm not sure I understand. If you allocate a normally GC object in the
>>non-GC realm, you'll need to needlessly manipulate a reference counter
>>for that object, or define two kind of pointer modifiers (*gc and
>>*deterministic) that specify what heap a pointer is on. Right?
>
>
> There are no implicit RC manipulations in Paul's simple model.
> Everything stays exactly as is, with the only addition being new(gc).

And what do you do if someone allocates a resource with a deterministic
destructor and forgets to deallocate it, or deallocates it and a
subsequent dangling pointer access occurs?

And what if you have a resource without a destructor that, however,
holds a pointer to a deterministic resource?

I agree that simply adding a GC realm and allowing certain objects to
sit in it is a point in the design space that's distinct from today's
C++, but it doesn't satisfy the constraints I set out for: true
deterministic scarce resource manipulation and improved type safety.


Andrei

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 5:07:31 PM3/20/06
to
Paul Mensonides wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
> Yes, but I'd get rid of the "deterministic" and "collected" labels.
> Instead,
> I'd just call a class "collectable" or "non-collectable".

Well Walter taught me to avoid using feature names that embed a negation
:o).

> Third, it can inherit a destructor
> or not
> based on the 'has_trivial_dtor' trait of T along with allocation
> functions.
> More likely, the (e.g.) vector will just contain a storage device for
> this
> purpose...
>
> template<class T, class allocator> class vector {
> public:
> vector(...) : p_(0, alloc_) ... {
> ...
> return;
> }
> private:
> allocator alloc_;
> storage_device<T, allocator> p_;
> };
>
> ....that would have a non-trivial destructor if the allocator says
> that it should
> and would not if the allocator says that it shouldn't. The default
> allocator
> could determine this based on 'has_trivial_destructor', or a user
> could pass in
> a different allocator that always allocates on the regular heap.

Hm, so in this setup I have the following: there's a new class
storage_device<T, A> that either defines a destructor or not, depending
whether T can be collected or not. That can be done via partial
specialization. Vector will automatically get a destructor or not as it
embeds a member of that class.

Fine.

> In
> either
> case, this would cause the vector to pick up a non-trivial destructor
> automatically. It also allows for the user of the vector to
> explicitly decide
> that he doesn't want to the vector to allocate on the GC-controlled
> heap (for
> whatever reasons) without re-inventing a different vector template.

Fine.

> I don't think that automatically producing reference counted smart
> pointers is a
> good option. For those types that cannot be allocated on a gc-heap,
> current
> measures that we already use will continue to work correctly.

Well here's the thing. Let's talk about causes, not their consequences.
It's a causal chain that engendered refcounted pointers, not a decision
or a goal.

So is it:

- "automatically producing reference counted smart pointers is a good
option within the design constraints you chose, and here's an idea on
how to change that choice within your design constraints", or

- "automatically producing reference counted smart pointers is a good
option for reasons A, B, and C, and here's an idea on how to change the
design constraints and arguments C, D, and E on why those constraints
are desirable"?

> I don't like it, but that is a consequence of having pointers to the
> gc-heap be
> raw pointers. We simply cannot change all non-gc-heap pointers (what
> we have
> now) to something other than raw memory addresses. That is not a
> viable option.

This is a misunderstanding. My proposal doesn't mandate changing
pointers to something else than raw pointers. They stay memory
addresses. The reference count can be massaged into the object proper,
similar to today's intrusive reference counting schemes.

> Basically all you really need to support GC is some way for a
> (global) operator
> new to be able to tell what type it is allocating memory for and a
> has_trivial_destructor metafunction. E.g.
>
> template<class T> inline
> std::enable_if<std::has_trivial_destructor<T>::value, void*>::type
> operator new(std::size_t size, std::gc) {
> return std::__gcalloc(size);
> }
>
> (The compiler can trivially supply the explicit argument for T.)
>
> Of course, this doesn't protect pointers that were allocated on the
> gc-heap from
> being deleted.

I think this is what we need to do. We need to state your design
constraints and consequences, as I did mine. It's unclear to me what
exactly your design goals are. Memory safety? I'm not sure. Guaranteed
release of deterministic resources? I'm not sure. Automate some of
today's hand-implemented idioms? I'm not sure.

Also I'm not sure what your design's consequences are. We need to state
them clearly, otherwise we can't compare some random fallout of my
design with some random fallout of yours.


Andrei

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 5:06:01 PM3/20/06
to
Nicola Musatti wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
> [...]
>
>>Claim #1: The lifetime management of objects of a class is a decision of
>>the class implementer, not of the class user.
>
>
> I disagree. Only the class user knows enough of the context to decide
> on when an object should be destroyed.

If you want to successfully refute my claim, you need to take the
alternative to its utmost consequence. I did that. The utmost
consequence is that you'd need two kinds of pointers (or pointer
modifiers) that record the way in which the object was created.

I am not sure you would be happy with that. I eliminated that option
from the design space entirely.

>>c) A class such as temporary_file does need to worry about destruction
>>timeliness because it allocates scarce resources that transcend both the
>>lifetime of the object (a file handle) and the lifetime of the program
>>(the file on disk that presumably temporary_file needs to delete after
>>usage).
>
>
> What if temporary_file was implemented in terms of "file"? With current
> multi level buffered I/O there may be write errors that are not
> detected until a file is closed. In this context a RAII approach is not
> effective, because it would make it impossible to check the success of
> closing the file. Once RAII is out of the way there's no reason to be
> timely about destruction anymore. Here you have conflicting use cases
> for the file class.

At least you don't leak the file handle resource. I didn't say there's a
guarantee that things will work correctly.

>>In all of these examples, the context in which the objects are used is
>>largely irrelevant (barring ill-designed types that employ logical
>>coupling to do entirely different actions depending on their state).
>>There is, therefore, a strong argument that the implementer of a class
>>decides entirely what the destruction regime of the class shall be. This
>>claim will guide design considerations below.
>
>
> There's another argument against this approach: you are introducing an
> irregularity in the language in that the user gets to choose whether
> the object is statically, automatically or dynamically allocated, but
> if it is dynamically allocated it's the class that decides how it
> should be handled. This looks as an unnecessary complication to me.

What would be the alternatives, and what would each of them lead to?


Andrei

Edward Diener No Spam

unread,
Mar 20, 2006, 5:07:09 PM3/20/06
to
Nicola Musatti wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
> [...]
>> Claim #1: The lifetime management of objects of a class is a decision of
>> the class implementer, not of the class user.
>
> I disagree. Only the class user knows enough of the context to decide
> on when an object should be destroyed.

That is not what Andrei is saying ( or what I said originally in another
post in the "Can GC be beneficial thread" ) . What is being proposed is
that a distinction be made for RAII classes ( deterministic ) and
non-RAII classes ( collected ) within a GC system in C++. Are you really
going to insist that the user of the class should determine this rather
than the designer of the class ? And that the user of the class has to
be responsible for deleting an RAII class at just the right time ? Think
about it, because this is just what pure GC languages such as Java, C#,
and Python have decided for their "RAII" classes, and the mess they have
created for themselves with this thinking has never been rectified.
There is no reason why a GC implementation in C++ must follow the faulty
reasoning that these other pure GC languages have followed, especially
because C++ has deterministic deletion, as seen by RAII classes, and
smart pointers which show how this currently works outside of GC.

What is being proposed is that in a GC system in C++, the exact same
syntax be used for allocating deterministic and collected objects on the
heap, and that in either case the programmer need not know, or care,
when the objects are deleted, in the first case, or destroyed, in the
second. This can be done by tagging classes as deterministic or
collected, and having the compiler treat a dynamically allocated object
of a deterministic type as a smart reference counted pointer, the
equivalent of a Boost shared pointer let us say, and a collected type as
your normal GC object which does not get its destructor called.

Radical ? Yes. In this system, nothing else changes in C++ syntax except
for the tagging of deterministic of collected class types, and the
compiler turning dynamically allocated objects of deterministic types
into referenced counted smart pointers, with all that means of
incrementing the count when the pointer is assigned to another variable,
decrementing the count when the pointer goes out of scope, and calling
delete on the object when the reference count goes to 0.

I will readily admit that doing the latter is probably much harder than
I surmise, but if that can not be done in a GC system in C++, I
personally will not be using raw pointers in that system but just
continue to use smart pointers, which is what I do know anyway. The
headache, and burden, of having different syntax and outside usability
for deterministic and collected objects in GC is just not worth it in my
estimation.

Walter Bright

unread,
Mar 20, 2006, 5:10:23 PM3/20/06
to

<igazt...@gmail.com> wrote in message
news:1142868459.2...@i40g2000cwc.googlegroups.com...

> So constructing and destructing a std::string for every word of the
> test. I hope you have small string optimization activated in your STL
> implementation! The dictionary is std::map<std::string, int> (a
> temporary when inserting std::map<>::value_type). I know this code is
> from real people, but is this a correct benchmark? So with this
> "definitive" benchmark you claim that "D is faster"?.

I didn't say "definitive", so you shouldn't put quotes around it. All
benchmarks can be criticized. If you've got a better one, I'll be happy to
take a look.

> Take the string
> outside of the loop, use boost::pool for nodes. With move semantics we
> can avoid nearly all temporaries when inserting nodes into map. This
> can show that programming this wordcount in C++ is more difficult than
> D but saying "D is faster" with this benchmark is something that I
> wouldn't consider fair.

I didn't write the C++ versions of wc used in the benchmark because if I
did, I get accused of not being fair, not using C++ correctly, sabotaging
C++, etc. Both of the C++ versions were written by others who essentially
said, like you, that I wasn't showing C++ fairly. I'd be happy to add your
C++ version of wc into the list of benchmarks if you like.

But look at it from another perspective. I think you'll agree that the D
version is very straightforward. There's no attempt to tune it, cheat, it
doesn't use any tricks, etc. You're advocating that a "fair" C++ one uses a
complex third party library? Is that realistic?

>> 1) shared_ptr makes an extra allocation for each object being pointed to
>> (in
>> addition to the 8 bytes for the shared pointer itself). Yes, that means
>> TWO
>> allocations per object. So, for shared_pointer to be faster than GC, it's
>> got to be at least TWICE as fast per allocation as a GC allocation.
> Reference counts have normally fixed size (with a default deleter, for
> example). What about segregate storage allocator for them? Amortized
> constant time two pointer operation. Check boost::smart_ptr with
> BOOST_SP_USE_QUICK_ALLOCATOR activated.

If that's the way to use shared_ptr, why isn't it the default? Why, for a
fairly straightforward application, do I need to tweak the library? Is
BOOST_SP_USE_QUICK_ALLOCATOR part of TR1? If not, why not?

>> 2) For every copy of a pointer, shared_ptr needs to increment and
>> decrement
>> through two indirections. Note that just passing a shared_ptr to a
>> function
>> will require this.
> Yes. If the function is not going to share ownership pass it by
> reference. If you want the function to take exclusive ownership, move
> it. shared_ptr is not a lightweight object when comparing to a raw
> pointer. Use C++ rules for heavy objects.

Does that mean that shared_ptr reference counting is costly enough that it
should only be used when dealing with "heavy" objects? If so I don't see how
reference counting is a general solution superior to GC.

> Some exception handling implementations have zero overhead while
> exceptions are not thrown.

Not exactly. It's still there in the static data, and it also inhibits more
aggressive program flow optimizations.

> I don't know about digitalmars compilers.


> And I might be wrong with this statement, since I'm not a compiler
> writer.

The Windows DM compilers use the Windows Structured Exception Handling
mechanism (also used by VC), which does have runtime overhead, but it's a
fact of life needed to be compatible with Windows. The DOS and Linux DM
compilers use the static data method.

>> D is faster (see the benchmark).
> That benchamark shows that your D program is faster than the C++
> program written by others. Not that D is faster than C++.

I know the conventional wisdom dies hard, and it'll take more than a few
benchmarks to do it. But it's hardly a contrived benchmark, and the lengths
people need to go in C++ to get even close to D's speed on it should raise a
few eyebrows on that conventional wisdom.

>> > Apart from speed, what about size? How much memory do the garbage
>> > collected languages need?
>> For the heap, about 2 to 3 times what C++ needs. For the code, it's
>> significantly less, though it's hard to say how much less. All those
>> shared_ptr fiddlings add up.
> 2 to 3 times sounds terrible.

It would be on a 64K machine <g>. On today's machines, speed tends to be
much more important than memory usage. BTW, D still allows for manual memory
allocation (after all, you can overload operators new and delete to be
anything you want, including calling C's malloc/free). So when it is
necessary, like for some massive objects, it can be better to manually
allocate/delete it. It's up to the programmer. But most D programmers just
opt for gc because it's convenient, productive, and fast.

> If D is so good, propose it for standarization.

That'll happen in time. C++ wasn't standardized until it had been around for
15 years, and long before then people (like myself) thought it was very good
and used it heavilly. C certainly took even longer to get standardized.
Standardization does have its downsides, too. For one, it reduces the rate
of improvements to a language, even critically needed ones, to a glacial
pace.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

Andrei Alexandrescu (See Website For Email)

unread,
Mar 20, 2006, 5:44:12 PM3/20/06
to
kanze wrote:

> peter koch larsen wrote:
>> My proposal for using garbage collection in relation to C++
>> would be very close to the solution that is available to us
>> all today when using the Boehm garbage collector. I propose
>> that no action can take place on collection so there is no
>> kind of finalization. Unfortunately the discussion so far has
>> not touched the drawbacks of this approach, so perhaps I'm
>> missing something here.
>
>
> That's what I tend to favor as well. Not that Andrei doesn't
> have some interesting ideas, but the solution with the Boehm
> collector has the advantage of simplicity and existing
> experience -- we know exactly what the repercusions are.
>
> It does mean that if you allocate a class with a needed
> destructor, and you forget to call delete, the destructor won't
> get called, and there's nothing to remind you that you've made
> an error.

It all depends on what fights you pick. My proposal is not even an
attempt at advocacy - it simply says, "if you want A, B, and C, then the
consequences are D, E, and F".

My constraints largely were:

* Reconcile deterministic destruction with garbage collection

* Improve type safety

* Preserve as much as possible from pointer-based programming idioms

* Introduce as little complicatedness as possible

* Don't affect efficiency

The consequences were:

* You need two realms (I avoid using "heaps" because the actual heap can
be shared): the deterministic realm and the collected realm

* You can't have pointers out of the collected realm

* The deterministic realm must be transparently reference counted

* The designer of the class decides whether the class is collected or
deterministic

* An additional modifier allows typesafe manipulation of pointers to
stack variables

Now, some people either said they don't like one of the consequences.
They should add how changing the consequence to a more palatable one
ripples back to the initial goals. Others said they don't agree with the
constraints. It helps to set up some other constraints, argue their
desirability, and analyze the consequences. From what I can tell,
Boehm's proposal has as constraints:

* GC is optional

* Impact the language as little as possible

* Implementers can implement sevaral shades of preciseness in the
collector

Some less-desirable consequences are:

* Destructors aren't reconciled (or let's say, only partially
reconciled) with garbage collection

* Type safety is not improved

However, I do agree that the set of constraints Boehm's proposal has
chosen are very sensible.


Andrei

peter koch larsen

unread,
Mar 20, 2006, 11:45:06 PM3/20/06
to

Andrei Alexandrescu (See Website For Email) skrev:

> peter koch larsen wrote:
>> Andreis proposal to declare objects as either collectable or
>> deterministic is not understood by me - so far as I can tell,
>> there is
>> no need for this destinction if only we can demand some intelligence
>> from the compiler - and require a slight extension of the standard
>> C++
>> definition of a destructor:we keep the current definition of the
>> "trivial" destrutor but divide the non-trivial destructors into two:
>> lightweight-destructors and full destructors. Lightweight destructors
>> are destructors that only reclaim memory, whereas full destructors
>> might also release other resources.
>
> I've been coquetting for a short while with the idea that the compiler
> scans the destructor body and makes decisions based on that, but
> then I
> said, heck, let the programmer just specify the regime in the class
> definition.

I really do not see the advantage of doing that - except for the
problem with virtual base classes. Could you inform us about why that
idea was discarded? To be honest, I find your approach downright ugly -
i mean... having to state the same thing twice is something i dislike.

>
> Andrei

/Peter

David Abrahams

unread,
Mar 20, 2006, 11:46:04 PM3/20/06
to
"Andrei Alexandrescu (See Website For Email)"
<SeeWebsit...@erdani.org> writes:

> The one thing that shouldn't be allowed is to allocate a
> deterministic class on the GC heap.

Why? As long as you delete it deterministically, there should be no
problem.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

peter koch larsen

unread,
Mar 20, 2006, 11:48:07 PM3/20/06
to

kanze skrev:

> peter koch larsen wrote:
>
[snip]


>
>> That said, I can imagine application domains where GC would
>> come in real handy, so I do not see any reason to not have GC
>> built in as an optional extension. I am not sure that GC
>> should be supported on all platforms. At least it should be a
>> quality of implementation issue.
>
> Well, that's really the current status. Unless you mean
> "optional" for the user, as opposed to optional for the
> implementor. An hosted implementation today has to provide
> everything in the standard -- it doesn't have the right to skip
> iostream's, for example, on the grounds that some people don't
> use it or want it. If you don't like or want iostream, or if it
> isn't appropriate for your application, of course, you don't
> have to use it. That's about the situation I'd like to see for
> garbage collection.

I would like to have some more optionality so far as garbage collection
is concerned - even to the point where no collection takes place. This
is the only way we can continue to support C++ on small systems. On
larger systems such as Unix and Windows, the Boehm collector will put
implementors close to having a conforming implementation already.


>
>> The goal for GC in C++ should be:

[snip]


>
>> 2) All C++ programs enabling garbage collection should be able
>> to use current C++ libraries without modifications.
>
> That's slightly more difficult, unless by current C++ libraries,
> you mean only implementations of the standard library. In
> general, you can't tell if a third party library uses the xor
> technique on pointers or not.

Right... I should have exempted those programs.


>
> That is, in fact, one of the principle motives I see for
> standardizing garbage collection. As long as I limit myself to
> my own code, and the g++ implementation of the standard library
> (which is the only one I've really tested), the Boehm collector
> works just fine as a third party add in; there's pratically no
> need for standardization. On the other hand, if it were
> standard, I'd be more or less guaranteed that it would work with
> third party libraries as well.
>
> There's also an issue of what compilers can generate when
> optimizing. I haven't found this to be a problem, at least not
> with g++ on a Linux 64 bit PC. But that doesn't mean that it's
> not a problem somewhere -- for that matter, I've not done
> extensive tests at higher levels of optimization even on that
> platform. Standardization would protect against this.
>
> --
> James Kanze GABI Software

/Peter

Peter Dimov

unread,
Mar 20, 2006, 11:50:13 PM3/20/06
to
kanze wrote:
> Peter Dimov wrote:

>> Under this model, std::vector doesn't have a trivial
>> destructor (it allocates memory via its allocator, not via
>> new(gc), so it has to deallocate it.) One would need to write
>> a separate gc_vector (a resizeable wrapper over new(gc) T[])
>> with a trivial destructor for use with collected classes.
>
> I think the idea was to provide some sort of hook for
> meta-programming, so that the vector<T> could use new(gc) if the
> T were collectable, and normal new otherwise.

You could do that. It won't be a std::vector<T,A> anymore because of
the A, but it could be std::tr3::vector<T>. But it seems to me that a
vector that calls new(gc) is a different animal; it lets you keep
references after reallocation, for instance.

David Abrahams

unread,
Mar 20, 2006, 11:55:25 PM3/20/06
to
"Walter Bright" <wal...@nospamm-digitalmars.com> writes:

>> Reference counts have normally fixed size (with a default deleter,
>> for
>> example). What about segregate storage allocator for them? Amortized
>> constant time two pointer operation. Check boost::smart_ptr with
>> BOOST_SP_USE_QUICK_ALLOCATOR activated.
>
> If that's the way to use shared_ptr, why isn't it the default? Why,
> for a
> fairly straightforward application, do I need to tweak the library? Is
> BOOST_SP_USE_QUICK_ALLOCATOR part of TR1? If not, why not?

It's not, because whether or not to use such an optimized allocator is
an implementation detail that standard library vendors should feel
free to tune.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Thorsten Ottosen

unread,
Mar 21, 2006, 5:42:46 AM3/21/06
to
kanze wrote:
> Walter Bright wrote:

> Most languages, garbage collected or no, are significantly
> slower than C++ because significantly less effort (i.e. money)
> has been invested in their optimization. One blatant exception
> today is Java -- a great deal has been invested in optimizing it
> on PC's (but only on PC's), with the results that it is
> generally faster than C++ on PC's.

Do you have proof for this claim?

I recently tried
downloaded the newest Java package from Sun and timed a simple
while loop. VC8 won. And that test didn't even have a single
allcoation in it.

-Thorsten

Walter Bright

unread,
Mar 21, 2006, 5:48:49 AM3/21/06