Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

multithreading.

12 views
Skip to first unread message

mohang...@gmail.com

unread,
Mar 26, 2008, 3:24:41 PM3/26/08
to
hello everyone,
i want to use multithreading in c++,i have no previous experience of
implementing multithreading but from my operating system concepts i
know few concepts of multithreading .
can anyone please guide me how and were to begin from.
all help is apprecitated.
thank you
mohan gupta

Victor Bazarov

unread,
Mar 26, 2008, 3:28:33 PM3/26/08
to
mohang...@gmail.com wrote:
> i want to use multithreading in c++,i have no previous experience of
> implementing multithreading but from my operating system concepts i
> know few concepts of multithreading .
> can anyone please guide me how and were to begin from.

Begin with a good book. I recall that "Java Thread Programming" by
Paul Hyde gave me the needed push (never mind that it's for Java,
the principles are the same). I am sure that nowadays you can find
many a book on multithreading. Ask in 'comp.programming.threads'.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask


Maarten Kronenburg

unread,
Mar 26, 2008, 4:38:20 PM3/26/08
to

wrote in message

The book "Cross-Platform Development in C++" by Syd Logan has a section on
threads, where Win32 threads, pthreads and NSPR threads are introduced.


Juha Nieminen

unread,
Mar 26, 2008, 4:49:03 PM3/26/08
to
mohang...@gmail.com wrote:
> i want to use multithreading in c++,i have no previous experience of
> implementing multithreading but from my operating system concepts i
> know few concepts of multithreading .
> can anyone please guide me how and were to begin from.
> all help is apprecitated.

http://bisqwit.iki.fi/story/howto/openmp/

Gianni Mariani

unread,
Mar 26, 2008, 5:20:23 PM3/26/08
to

Multithreading concepts can get very involved. Probably best to start
with a good book.

There are good cross platform C++ threading libraries though so try not
to use OS specific threading constructs, it will make your life much easier.

James Kanze

unread,
Mar 27, 2008, 4:50:05 AM3/27/08
to
On Mar 26, 8:28 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:

> mohangupt...@gmail.com wrote:
> > i want to use multithreading in c++,i have no previous experience of
> > implementing multithreading but from my operating system concepts i
> > know few concepts of multithreading .
> > can anyone please guide me how and were to begin from.

> Begin with a good book. I recall that "Java Thread
> Programming" by Paul Hyde gave me the needed push (never mind
> that it's for Java, the principles are the same). I am sure
> that nowadays you can find many a book on multithreading. Ask
> in 'comp.programming.threads'.

Along the same lines, the Butendorf is an excellent introductory
text: even though it is for C and Posix, almost everything in it
holds for C++, and an awful lot holds for Windows as well.
(Obviously, not the API names, but all of the considerations as
to when you need to lock, etc.)

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Alexander Dong Back Kim

unread,
Mar 27, 2008, 8:54:42 AM3/27/08
to

Hi Mohan,

As long as you know the concept of multithreading, then I would say
you are ready to write multithreading codes. Multithreading codes are
not that difficult as people think. Multithreading programming is, in
my opinion, a great fun to do allowing you to taste a bit of
programming.

However, there are a number of things you should know before you
start. First of all, when C++ was born, multithreading wasn't really
expected to be happened in programming world. In other words,
Multithreading is not supposed to be used in C++ world. BUT! don't
take this serisouly there are so many ways that you can deal with C++
multithreading programming. Second of all, according to some article(I
can't remember which one was...), when you design a multithreading
module, you should be extra careful about what you are really trying
to do. BUT! still just do fail few times and you will learn!!! so
don't give up =)

If you have bit of knowledge about algorithms, O-O concept and generic
programming, then I recommend you to look up a open source library C++
BOOST. Boost allows you to create multithreaded application so easily
and safely.

regards,
Alex D. B. Kim

mohi

unread,
Mar 27, 2008, 3:56:25 PM3/27/08
to
hello evreyone ,
when i last posted to ask you people to suggest me where to begin from
for multithreading in c++
a lot many good people suggested me to begin with a good book .
the only one of those many reply had boost library mentioned which i
have heard about before
so can anyone please suggest me a link where i can get good detailed
tutorial meant for a beginner or if u think other things are better to
start with (please also give the links to tutorials if possible ).
thank you
mohan

Maarten Kronenburg

unread,
Mar 27, 2008, 3:54:00 PM3/27/08
to

<> wrote in message

Multithreading has in my opinion two ways of applying C++ virtual functions:
(1) the thread caller function (e.g. in win32):
DWORD CALLBACK thread_call( void * arg )
{ base_thread * p = (base_thread *)arg;
( *p )();
}
where class base_thread has a pure virtual operator()():
virtual operator()() = 0;
and derive the different thread call classes from base_thread,
implementing different operator()(),
(2) the multi-threading class:
class multi_thread
{ base_thread * * the_threads;
public:
multi_thread( unsigned int );
~multi_thread();
void set_thread( unsigned int, base_thread * );
virtual void start_threads() = 0;
virtual void wait_threads() = 0;
virtual void close_threads() = 0;
}
multi_thread::multi_thread( unsigned int n )
{ the_threads = new base_thread * [ n ];
} etc.
and derive the different win32/pthread etc. implementations.
This way C++ runtime polymorphism is set to work in multithreading.
Maarten.


Ian Collins

unread,
Mar 27, 2008, 4:12:10 PM3/27/08
to

See the current thread with the same heading as yours.

--
Ian Collins.

Juha Nieminen

unread,
Mar 27, 2008, 4:38:54 PM3/27/08
to
mohi wrote:
> the only one of those many reply had boost library mentioned which i
> have heard about before

You won't even consider OpenMP? It's a much simpler way to use
multithreading in your C++ program. (Of course it requires compiler
support.)

gpderetta

unread,
Mar 27, 2008, 6:45:46 PM3/27/08
to

Other forms of threading like posix threads and win32 threads also
require compiler support (even if much less extensive).

IIRC, recent releases of Microsoft, Intel and GCC compilers, all
support OpenMP.

--
gpd

gpderetta

unread,
Mar 27, 2008, 6:49:58 PM3/27/08
to
On Mar 27, 9:50 am, James Kanze <james.ka...@gmail.com> wrote:
> On Mar 26, 8:28 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
>
> > mohangupt...@gmail.com wrote:
> > > i want to use multithreading in c++,i have no previous experience of
> > > implementing multithreading but from my operating system concepts i
> > > know few concepts of multithreading .
> > > can anyone please guide me how and were to begin from.
> > Begin with a good book. I recall that "Java Thread
> > Programming" by Paul Hyde gave me the needed push (never mind
> > that it's for Java, the principles are the same). I am sure
> > that nowadays you can find many a book on multithreading. Ask
> > in 'comp.programming.threads'.
>
> Along the same lines, the Butendorf is an excellent introductory
> text: even though it is for C and Posix, almost everything in it
> holds for C++, and an awful lot holds for Windows as well.
> (Obviously, not the API names, but all of the considerations as
> to when you need to lock, etc.)
>
> --
> James Kanze (GABI Software) email:james.ka...@gmail.com

> Conseils en informatique orientée objet/
> Beratung in objektorientierter Datenverarbeitung
> 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Just to clarify, I think you meant _Butenhof_ book 'Programming with
POSIX threads'.

Googling 'Butendorf thread' returns nothing :)

--
gpd

Alexander Dong Back Kim

unread,
Mar 27, 2008, 6:55:33 PM3/27/08
to

Hi Juha,

Could you please give me some hints about benefits and drawbacks of
OpenMP and Boost?

Thanks a lot,
Alex

Alexander Dong Back Kim

unread,
Mar 27, 2008, 6:57:20 PM3/27/08
to


Hi Mohan,

If you go the offical boost web site, there is a well organised
documentation stuff. Although it seems it's not perfectly completed,
it's good enough to learn how you can use the library I think =)

Cheers,
Alex D. B. Kim

James Kanze

unread,
Mar 28, 2008, 4:55:10 AM3/28/08
to
On Mar 27, 11:49 pm, gpderetta <gpdere...@gmail.com> wrote:
> On Mar 27, 9:50 am, James Kanze <james.ka...@gmail.com> wrote:
> > On Mar 26, 8:28 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:

> > > mohangupt...@gmail.com wrote:
> > > > i want to use multithreading in c++,i have no previous experience of
> > > > implementing multithreading but from my operating system concepts i
> > > > know few concepts of multithreading .
> > > > can anyone please guide me how and were to begin from.
> > > Begin with a good book. I recall that "Java Thread
> > > Programming" by Paul Hyde gave me the needed push (never mind
> > > that it's for Java, the principles are the same). I am sure
> > > that nowadays you can find many a book on multithreading. Ask
> > > in 'comp.programming.threads'.

> > Along the same lines, the Butendorf is an excellent introductory
> > text: even though it is for C and Posix, almost everything in it
> > holds for C++, and an awful lot holds for Windows as well.
> > (Obviously, not the API names, but all of the considerations as
> > to when you need to lock, etc.)

> Just to clarify, I think you meant _Butenhof_ book


> 'Programming with POSIX threads'.

Yes. That's what comes of typing too quickly.

--
James Kanze (GABI Software) email:james...@gmail.com

James Kanze

unread,
Mar 28, 2008, 5:06:22 AM3/28/08
to
On Mar 27, 11:55 pm, Alexander Dong Back Kim <alexdb...@gmail.com>
wrote:

> On Mar 28, 7:38 am, Juha Nieminen <nos...@thanks.invalid> wrote:

> > mohi wrote:
> > > the only one of those many reply had boost library mentioned which i
> > > have heard about before

> > You won't even consider OpenMP? It's a much simpler way to
> > use multithreading in your C++ program. (Of course it
> > requires compiler support.)

> Could you please give me some hints about benefits and


> drawbacks of OpenMP and Boost?

They are designed to do completely different things. OpenMP, if
I understand it correctly, is designed largely to parallelize
operations on large arrays, in the presense of multiple cores.
Boost is designed for the classical "threading" applications,
where different threads are used to react to different events,
i.e. one thread can go off and perhaps execute some blocking
requests in response to one event, and the other threads will
continue handling other events.

Boost threads is a very low level wrapper around the system
threads, based more or less on the Posix model (mutexes and
conditions). AS such, I'd strongly recommend it, in
collaboration with the Butenhof (got it right this time, I
hope), for learning the low level basics. In any real
application, of course, you'll probably want to wrap it in
something a bit higher level. (Generally speaking, you'll want
a function to start detached threads, and a thread object with
more or less entity semantics for joinable threads. You can
also consider something like futures.)

Juha Nieminen

unread,
Mar 28, 2008, 5:47:01 AM3/28/08
to
Alexander Dong Back Kim wrote:
> Could you please give me some hints about benefits and drawbacks of
> OpenMP and Boost?

One advantage of OpenMP is that, if used properly, the program will
still compile just fine even with compilers with no OpenMP support, and
will work correctly (although only with a single thread).

Jon Harrop

unread,
Mar 28, 2008, 6:15:58 PM3/28/08
to

You should really look at a language with a concurrent garbage collector if
you want to do multithreading. C++ really falls down here because it lacks
a good foundation.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?u

Chris Thomasson

unread,
Mar 28, 2008, 6:53:46 PM3/28/08
to

"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13uqrd4...@corp.supernews.com...

> mohi wrote:
>> hello evreyone ,
>> when i last posted to ask you people to suggest me where to begin from
>> for multithreading in c++
>> a lot many good people suggested me to begin with a good book .
>> the only one of those many reply had boost library mentioned which i
>> have heard about before
>> so can anyone please suggest me a link where i can get good detailed
>> tutorial meant for a beginner or if u think other things are better to
>> start with (please also give the links to tutorials if possible ).
>
> You should really look at a language with a concurrent garbage collector
> if
> you want to do multithreading. C++ really falls down here because it lacks
> a good foundation.

Why do you think one needs a garbage collector in order to implement
multi-threaded algorithms?

Alexander Dong Back Kim

unread,
Mar 28, 2008, 8:14:07 PM3/28/08
to
> James Kanze (GABI Software)             email:james.ka...@gmail.com

> Conseils en informatique orientée objet/
>                    Beratung in objektorientierter Datenverarbeitung
> 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Thank you very much! What an impressive reply was that!!!
I compliment on your deep knowledge =)

Alexander Dong Back Kim

unread,
Mar 28, 2008, 8:16:28 PM3/28/08
to

same question here =)

Jon Harrop

unread,
Mar 28, 2008, 9:39:41 PM3/28/08
to
Chris Thomasson wrote:
> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13uqrd4...@corp.supernews.com...
>> You should really look at a language with a concurrent garbage collector
>> if you want to do multithreading. C++ really falls down here because it
>> lacks a good foundation.
>
> Why do you think one needs a garbage collector in order to implement
> multi-threaded algorithms?

Unless your allocation lifetimes happen to be statically well defined,
you'll essentially end up reinventing a garbage collector. That is hard
enough in single threaded applications but it is much harder in the
presence of multithreading because it is so error prone.

Chris Thomasson

unread,
Mar 28, 2008, 11:09:26 PM3/28/08
to
[comp.programming.threads added]

On Sat, 29 Mar 2008 01:39:41 +0000, Jon Harrop wrote:

> Chris Thomasson wrote:
>> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
>> news:13uqrd4...@corp.supernews.com...
>>> You should really look at a language with a concurrent garbage collector
>>> if you want to do multithreading. C++ really falls down here because it
>>> lacks a good foundation.
>>
>> Why do you think one needs a garbage collector in order to implement
>> multi-threaded algorithms?
>
> Unless your allocation lifetimes happen to be statically well defined,
> you'll essentially end up reinventing a garbage collector. That is hard
> enough in single threaded applications but it is much harder in the
> presence of multithreading because it is so error prone.

That does not make sense. I choose the right lifetime management scheme
for the algorithms I need to implement. To this date, I have not
"needed" full-blown garbage collection. I mainly make use of distributed
reference counting and proxy collectors to manage the memory in
non-blocking algorithms. This is not reinventing a garbage collector at
all. BTW, garbage collectors have their share of problems; here are some
of them:


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d


Marcin ‘Qrczak’ Kowalczyk

unread,
Mar 29, 2008, 5:35:43 AM3/29/08
to
Dnia 28-03-2008, Pt o godzinie 20:09 -0700, Chris Thomasson pisze:

> BTW, garbage collectors have their share of problems; here are some
> of them:
>
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d

The first one is not a problem with GC but with a poor choice of the
hash function of HashMap. (That a different GC implementation may not
expose the problem is another matter.)

The third one is not even related to GC.

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

James Kanze

unread,
Mar 29, 2008, 7:27:55 AM3/29/08
to
On 28 mar, 23:15, Jon Harrop <use...@jdh30.plus.com> wrote:
> mohi wrote:

> > when i last posted to ask you people to suggest me where to
> > begin from for multithreading in c++ a lot many good people
> > suggested me to begin with a good book . the only one of
> > those many reply had boost library mentioned which i have
> > heard about before so can anyone please suggest me a link
> > where i can get good detailed tutorial meant for a beginner
> > or if u think other things are better to start with (please
> > also give the links to tutorials if possible ).

> You should really look at a language with a concurrent garbage
> collector if you want to do multithreading. C++ really falls
> down here because it lacks a good foundation.

The issues seem orthogonal to me, but FWIW: there are efficient
garbage collectors available for C++. I know. I use them.

--
James Kanze (GABI Software) email:james...@gmail.com

James Kanze

unread,
Mar 29, 2008, 7:33:47 AM3/29/08
to
On 29 mar, 04:09, Chris Thomasson <cris...@comcast.net> wrote:
> On Sat, 29 Mar 2008 01:39:41 +0000, Jon Harrop wrote:
> > Chris Thomasson wrote:
> >> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> >>news:13uqrd4...@corp.supernews.com...
> >>> You should really look at a language with a concurrent garbage collector
> >>> if you want to do multithreading. C++ really falls down here because it
> >>> lacks a good foundation.

> >> Why do you think one needs a garbage collector in order to
> >> implement multi-threaded algorithms?

> > Unless your allocation lifetimes happen to be statically
> > well defined, you'll essentially end up reinventing a
> > garbage collector. That is hard enough in single threaded
> > applications but it is much harder in the presence of
> > multithreading because it is so error prone.

> That does not make sense. I choose the right lifetime
> management scheme for the algorithms I need to implement.

Object lifetime and garbage collection are orthogonal---garbage
collectors do NOT manage object lifetime. And of course, both
are orthogonal to threading---except that like just about
everything else, it's harder to implement the algorithms used
(garbage collector, or the alternatives) in the presence of
threads.

> To this date, I have not "needed" full-blown garbage
> collection.

You never "need" it. (For that matter, you never need C++, or
even C. You can do everything in assembler.)

> I mainly make use of distributed reference counting and proxy
> collectors to manage the memory in non-blocking algorithms.
> This is not reinventing a garbage collector at all. BTW,
> garbage collectors have their share of problems; here are some
> of them:

> http://groups.google.com/group/comp.programming.threads/browse_frm/th...

Nothing is perfect, and there are contexts where garbage
collection may not be the answer. Or garbage collection mixed
with something else. It's a nice option to have, and makes
coding a few specific things easier, but it's not a silver
bullet.

Jon Harrop

unread,
Mar 29, 2008, 9:43:52 AM3/29/08
to
James Kanze wrote:
> ...efficient garbage collectors available for C++. I know. I use them.

I do not believe that is true. Have you benchmarked them?

Jon Harrop

unread,
Mar 29, 2008, 10:05:21 AM3/29/08
to
Chris Thomasson wrote:
> On Sat, 29 Mar 2008 01:39:41 +0000, Jon Harrop wrote:
>> Unless your allocation lifetimes happen to be statically well defined,
>> you'll essentially end up reinventing a garbage collector. That is hard
>> enough in single threaded applications but it is much harder in the
>> presence of multithreading because it is so error prone.
>
> That does not make sense. I choose the right lifetime management scheme
> for the algorithms I need to implement. To this date, I have not
> "needed" full-blown garbage collection. I mainly make use of distributed
> reference counting and proxy collectors to manage the memory in
> non-blocking algorithms. This is not reinventing a garbage collector at
> all.

Reference counting was one of the earliest forms of garbage collection and
it is riddled with many very serious and well known problems. You are
literally reinventing the GC and you are now several decades out of date.

> BTW, garbage collectors have their share of problems; here are some
> of them:
>
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d

The thread you cite is a discussion about two different articles, neither of
which substantiate your claim:

The first is an article describing problems with a specific Java data
structure (WeakHashMap):

http://blogs.azulsystems.com/cliff/2007/08/why-weakhashmap.html

As the commenters on that post have already explained in detail, the
author's use of this data structure is wrong and many of his statements
about it are wrong because he didn't bother to read the documentation. For
example, read the comment:

"I'm hoping the irony in the title is intentional. There are several
layers of "suck" going on here and WeakHashMap is the least of them.
Assuming you have a correct hashcode and concurrent GC, as you said
everything gets cleared out really fast and the Javadocs explain all this."

i.e. the poster's problem is a direct result of the GC being too efficient
at collecting unused values. This problem is commonly cited in many GC'd
languages by newbies who are trying to abuse the garbage collector.

The second article describes a problem with memory fragmentation:

http://www.coversant.net/Coversant/Blogs/tabid/88/EntryID/9/Default.aspx

As the poster described, this problem is entirely a result of pinning, i.e.
manual memory management. As he explains, if the code were entirely
high-level with automatic memory management this problem would never have
occurred because the (moving) GC automatically defragments the heap.

Chris Thomasson

unread,
Mar 29, 2008, 8:23:15 PM3/29/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13usjnh...@corp.supernews.com...

> Chris Thomasson wrote:
>> On Sat, 29 Mar 2008 01:39:41 +0000, Jon Harrop wrote:
>>> Unless your allocation lifetimes happen to be statically well defined,
>>> you'll essentially end up reinventing a garbage collector. That is hard
>>> enough in single threaded applications but it is much harder in the
>>> presence of multithreading because it is so error prone.
>>
>> That does not make sense. I choose the right lifetime management scheme
>> for the algorithms I need to implement. To this date, I have not
>> "needed" full-blown garbage collection. I mainly make use of distributed
>> reference counting and proxy collectors to manage the memory in
>> non-blocking algorithms. This is not reinventing a garbage collector at
>> all.
>
> Reference counting was one of the earliest forms of garbage collection and
> it is riddled with many very serious and well known problems.

cycles aside, can you list several other very serious problems? BTW, IMHO,
cycles are a red-herring and can usually be designed around.


> You are
> literally reinventing the GC and you are now several decades out of date.

How are virtually zero-overhead counting algorithms out-of-date? Please
explain...


>> BTW, garbage collectors have their share of problems; here are some
>> of them:
>>
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d
>
> The thread you cite is a discussion about two different articles, neither
> of
> which substantiate your claim:
>
> The first is an article describing problems with a specific Java data
> structure (WeakHashMap):

[...]

How do you effectively cache object's in a "collect-the-world" environment?
GC and caching don't get along _sometimes_... From the applications point of
view an object in the cache is in a quiescent state. However, from the GC
perspective a node in the cache is a pointer to a "live object".

Chris Thomasson

unread,
Mar 29, 2008, 8:26:59 PM3/29/08
to
"James Kanze" <james...@gmail.com> wrote in message
news:5593adfa-6fbd-4b8a...@s50g2000hsb.googlegroups.com...

On 29 mar, 04:09, Chris Thomasson <cris...@comcast.net> wrote:
> > On Sat, 29 Mar 2008 01:39:41 +0000, Jon Harrop wrote:
> > > Chris Thomasson wrote:
> > >> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> > >>news:13uqrd4...@corp.supernews.com...
> > >>> You should really look at a language with a concurrent garbage
> > >>> collector
> > >>> if you want to do multithreading. C++ really falls down here because
> > >>> it
> > >>> lacks a good foundation.

> > >> Why do you think one needs a garbage collector in order to
> > >> implement multi-threaded algorithms?

> > > Unless your allocation lifetimes happen to be statically
> > > well defined, you'll essentially end up reinventing a
> > > garbage collector. That is hard enough in single threaded
> > > applications but it is much harder in the presence of
> > > multithreading because it is so error prone.

> > That does not make sense. I choose the right lifetime
> > management scheme for the algorithms I need to implement.

> Object lifetime and garbage collection are orthogonal---garbage
> collectors do NOT manage object lifetime.

They determine when an object is quiescent which is probably the most
important aspect of lifetime management schemes in general.


> And of course, both
> are orthogonal to threading---except that like just about
> everything else, it's harder to implement the algorithms used
> (garbage collector, or the alternatives) in the presence of
> threads.

> > To this date, I have not "needed" full-blown garbage
> > collection.

> You never "need" it. (For that matter, you never need C++, or
> even C. You can do everything in assembler.)

Agreed.


> > I mainly make use of distributed reference counting and proxy
> > collectors to manage the memory in non-blocking algorithms.
> > This is not reinventing a garbage collector at all. BTW,
> > garbage collectors have their share of problems; here are some
> > of them:

> > http://groups.google.com/group/comp.programming.threads/browse_frm/th...

> Nothing is perfect, and there are contexts where garbage
> collection may not be the answer. Or garbage collection mixed
> with something else. It's a nice option to have, and makes
> coding a few specific things easier, but it's not a silver
> bullet.

Agreed.

Chris Thomasson

unread,
Mar 29, 2008, 8:52:13 PM3/29/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13usjnh...@corp.supernews.com...

> Chris Thomasson wrote:
>> On Sat, 29 Mar 2008 01:39:41 +0000, Jon Harrop wrote:
>>> Unless your allocation lifetimes happen to be statically well defined,
>>> you'll essentially end up reinventing a garbage collector. That is hard
>>> enough in single threaded applications but it is much harder in the
>>> presence of multithreading because it is so error prone.
>>
>> That does not make sense. I choose the right lifetime management scheme
>> for the algorithms I need to implement. To this date, I have not
>> "needed" full-blown garbage collection. I mainly make use of distributed
>> reference counting and proxy collectors to manage the memory in
>> non-blocking algorithms. This is not reinventing a garbage collector at
>> all.
>
> Reference counting was one of the earliest forms of garbage collection and
> it is riddled with many very serious and well known problems. You are
> literally reinventing the GC and you are now several decades out of date.
[...]

How can you say that when you have absolutely no idea what reference
counting algorithms I use, or how I use them? Anyway, the only type of GC
that I have found to be really useful is Proxy GC.

Jon Harrop

unread,
Mar 29, 2008, 8:52:16 PM3/29/08
to
Chris Thomasson wrote:
> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13usjnh...@corp.supernews.com...
>> Reference counting was one of the earliest forms of garbage collection
>> and it is riddled with many very serious and well known problems. You are
>> literally reinventing the GC and you are now several decades out of date.
> [...]
>
> How can you say that when you have absolutely no idea what reference
> counting algorithms I use, or how I use them?

Because the problems are a consequence of counting references. The precise
details don't matter beyond the fact that you're still counting references.

This has been well documented over the past 48 years.

Chris Thomasson

unread,
Mar 29, 2008, 9:12:48 PM3/29/08
to

"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13utpke...@corp.supernews.com...

> Chris Thomasson wrote:
>> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
>> news:13usjnh...@corp.supernews.com...
>>> Reference counting was one of the earliest forms of garbage collection
>>> and it is riddled with many very serious and well known problems. You
>>> are
>>> literally reinventing the GC and you are now several decades out of
>>> date.
>> [...]
>>
>> How can you say that when you have absolutely no idea what reference
>> counting algorithms I use, or how I use them?
>
> Because the problems are a consequence of counting references. The precise
> details don't matter beyond the fact that you're still counting
> references.

The precise details don't matter? REALLY? Before I go on, please explain
yourself. You are starting to sound like you don't now any of the new
reference counting algorithms that are out there. Before I waste my time
trying to explain them to you, I need to hear your explanation on how the
details don't matter.

:^/


>
> This has been well documented over the past 48 years.

For what algorithm? You are talking non-sense because some of the new
counting tricks have only recently been invented.

Jon Harrop

unread,
Mar 29, 2008, 9:17:28 PM3/29/08
to
Chris Thomasson wrote:
> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13utpke...@corp.supernews.com...

>> Because the problems are a consequence of counting references. The
>> precise details don't matter beyond the fact that you're still counting
>> references.
>
> The precise details don't matter? REALLY? Before I go on, please explain
> yourself. You are starting to sound like you don't now any of the new
> reference counting algorithms that are out there. Before I waste my time
> trying to explain them to you, I need to hear your explanation on how the
> details don't matter.

I only just explained that and gave you a reference. Just read the
reference.

> You are talking non-sense because some of the new counting tricks have
> only recently been invented.

You make it sound as if reference counting is making a come-back. I'll give
you the benefit of the doubt though: can you cite any references indicating
that reference counting can be even vaguely competitive compared to a real
GC?

Jon Harrop

unread,
Mar 29, 2008, 9:28:34 PM3/29/08
to
Chris Thomasson wrote:
> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13usjnh...@corp.supernews.com...
>> Reference counting was one of the earliest forms of garbage collection
>> and it is riddled with many very serious and well known problems.
>
> cycles aside, can you list several other very serious problems?

Sure:

. Fragility.
. Memory overhead.
. Performance degradation.
. Fragmentation.
. Not incremental, i.e. awful worst case performance.
...

> BTW, IMHO, cycles are a red-herring and can usually be designed around.

Of course: by reinventing the GC.

>> You are literally reinventing the GC and you are now several decades out
>> of date.
>
> How are virtually zero-overhead counting algorithms out-of-date? Please
> explain...

Are you referring to your own patented algorithm for which there are no
verifiable results?

>>> BTW, garbage collectors have their share of problems; here are some
>>> of them:
>>>
>>
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d
>>
>> The thread you cite is a discussion about two different articles, neither
>> of
>> which substantiate your claim:
>>
>> The first is an article describing problems with a specific Java data
>> structure (WeakHashMap):
> [...]
>
> How do you effectively cache object's in a "collect-the-world"
> environment? GC and caching don't get along _sometimes_... From the
> applications point of view an object in the cache is in a quiescent state.
> However, from the GC perspective a node in the cache is a pointer to a
> "live object".

You are simply restating that author's misconceptions. Read the
documentation.

Chris Thomasson

unread,
Mar 29, 2008, 9:47:55 PM3/29/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13utr3l...@corp.supernews.com...

Okay, I will give just a couple of examples... You seem to think that one
needs to reference count every single object; you do not. A simple example
of this is a Proxy GC implemented with reference counting. Here are some
more details on the concept of proxy GC in general:

http://groups.google.com/group/comp.programming.threads/msg/41f29efe33e7f124

http://groups.google.com/group/comp.programming.threads/search?hl=en&group=comp.programming.threads&q=proxy+gc

http://groups.google.com/group/comp.lang.c++/msg/e24a9459777ec430

You can download Joe Seighs atomic-ptr-plus package from here:

http://atomic-ptr-plus.sourceforge.net

http://sourceforge.net/project/showfiles.php?group_id=127837

Dmitriy V'jukov has created nice one here:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/42b035cd0280bb31


Here is my work in the area:

http://groups.google.com/group/comp.programming.threads/search?hl=en&group=comp.programming.threads&q=vzoom&qt_g=Search+this+group

http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220070067770%22.PGNR.&OS=DN/20070067770&RS=DN/20070067770


The vZOOM algorithm uses per-thread reference counts which don't need any
memory barriers or atomic operations in order to update them.

Here are some of my proxy collector that I released into public domain:

http://home.comcast.net/~vzoom/demos/pc_sample.c

http://appcore.home.comcast.net/misc/pc_sample_h_v1.html


Here is a reader/writer usage pattern:

http://groups.google.com/group/comp.programming.threads/msg/325a046ac6ed2800


Dmitriy V'jukov has also created this low-overhead reference counting
algorithm:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/dab22a17c32c6b13


Before I go on and on:

This is a lot of information to absorbed and you probably won't be able to
learn all of it any time soon. So, before you label the work cited above as
non-sense, please try and understand some of it. The folks over on
'comp.programming.threads' have been working with these types of algorithms
for almost a decade and we can help you out.

:^)

Chris Thomasson

unread,
Mar 29, 2008, 10:00:36 PM3/29/08
to
[comp.programming.threads added]


"Jon Harrop" <use...@jdh30.plus.com> wrote in message

news:13utrof...@corp.supernews.com...


> Chris Thomasson wrote:
>> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
>> news:13usjnh...@corp.supernews.com...
>>> Reference counting was one of the earliest forms of garbage collection
>>> and it is riddled with many very serious and well known problems.
>>
>> cycles aside, can you list several other very serious problems?
>
> Sure:
>
> . Fragility.
> . Memory overhead.
> . Performance degradation.
> . Fragmentation.
> . Not incremental, i.e. awful worst case performance.

Reference counting is fragile? I don't think so; its only as fragile as the
programmer using them. There are no silver bullets.


Memory overhead? No, a reference count releases the object when the
reference drops to zero which is more accurate than a traditional GC could
ever be. Are you talking about the extra word of space per-object? Did you
know that many garbage collectors use per-object meta-data as well?


Performance degradation? Again, I ask you what specific algorithms are you
talking about? It does matter.


Not incremental? Drill down on that one some more please.


> ...
>
>> BTW, IMHO, cycles are a red-herring and can usually be designed around.
>
> Of course: by reinventing the GC.
>
>>> You are literally reinventing the GC and you are now several decades out
>>> of date.
>>
>> How are virtually zero-overhead counting algorithms out-of-date? Please
>> explain...
>
> Are you referring to your own patented algorithm for which there are no
> verifiable results?

That's not the only one out there. Anyway, my counting algorihtm uses
per-thread counters that do not need atomics or membars to be mutated. BTW,
this algorihtm won be a T2000 from Sun:

https://coolthreads.dev.java.net
(the vzoom project)


It has made me some $$$ because I managed to license it to a few companies
who are very pleased with the results. One of them was involved with
creating embedded devices (QUADROS on an ARM9) and wanted to decrease the
overhead of their reference counting, vZOOM helped them out. Also, the
low-overhead memory allocator allowed them to create a heap out of QUADROS
tasks, they were VERY pleased with that because they were getting ready to
alter the OS source code:

http://groups.google.com/group/comp.programming.threads/msg/f6d16f5323311361
(last paragraph...)


>>>> BTW, garbage collectors have their share of problems; here are some
>>>> of them:
>>>>
>>>
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d
>>>
>>> The thread you cite is a discussion about two different articles,
>>> neither
>>> of
>>> which substantiate your claim:
>>>
>>> The first is an article describing problems with a specific Java data
>>> structure (WeakHashMap):
>> [...]
>>
>> How do you effectively cache object's in a "collect-the-world"
>> environment? GC and caching don't get along _sometimes_... From the
>> applications point of view an object in the cache is in a quiescent
>> state.
>> However, from the GC perspective a node in the cache is a pointer to a
>> "live object".
>
> You are simply restating that author's misconceptions. Read the
> documentation.

I am asking you how to do it. Explain how to create effectively object
caches using "traditional" collect-the-world GC...

Jon Harrop

unread,
Mar 29, 2008, 10:56:03 PM3/29/08
to
Chris Thomasson wrote:
> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13utrof...@corp.supernews.com...

>> . Fragility.
>> . Memory overhead.
>> . Performance degradation.
>> . Fragmentation.
>> . Not incremental, i.e. awful worst case performance.
>
> Reference counting is fragile? I don't think so; its only as fragile as
> the programmer using them. There are no silver bullets.

You cannot have your cake and eat it: you said that the programmer must be
aware of the low-level details of their data structures in order to work
around the pathological performance problems inherent with reference
counting. That is a form of fragility. Fragmentation is one specific
example.

With a real GC you just fire and forget. Exceptional circumstances are
extremely rare. Improper use of weak hash tables is not an example of this.

> Memory overhead? No, a reference count releases the object when the
> reference drops to zero which is more accurate than a traditional GC could
> ever be.

That is commonly claimed but actually wrong. Scope can keep reference counts
above zero and values alive unnecessarily when a real GC can collect them
because they are unreferenced even though they are still in scope.

> Are you talking about the extra word of space per-object?

Yes.

> Did you know that many garbage collectors use per-object meta-data as
> well?

None that we use. Which GCs are you referring to?

> Performance degradation? Again, I ask you what specific algorithms are you
> talking about? It does matter.

I benchmarked all the mainstream reference counters for C++ many years ago.
If you think the situation has improved, perhaps we could benchmark C++
against some GC'd languages for collection-intensive tasks now?

> Not incremental? Drill down on that one some more please.

You actually just described the problem: reference counting releases values
when their reference drops to zero. At the end of a scope, many reference
counts can be zeroed at the same time and the ensuing collections can stall
the program for an unacceptable amount of time. This can also be extremely
difficult to workaround without simply resorting to a real GC.

We had this exact problem on a 250kLOC C++ product line and eventually fixed
it by rewriting everything from scratch in a more modern language. The
worst case performance is now 5x faster. We wasted a couple of months
trying to fix the problem in C++ before having the revelation that we were
just reinventing a garbage collector and doing a poor job of it at that.

>> You are simply restating that author's misconceptions. Read the
>> documentation.
>
> I am asking you how to do it. Explain how to create effectively object
> caches using "traditional" collect-the-world GC...

You use a weak hash table to cache data without keeping it alive.

You do not use a weak hash table to cache data that has no other references
to keep it alive on the assumption that the GC will be inefficient enough
for your cache to work.

Chris Thomasson

unread,
Mar 30, 2008, 12:22:54 AM3/30/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13uu0sg...@corp.supernews.com...

> Chris Thomasson wrote:
>> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
>> news:13utrof...@corp.supernews.com...
>>> . Fragility.
>>> . Memory overhead.
>>> . Performance degradation.
>>> . Fragmentation.
>>> . Not incremental, i.e. awful worst case performance.
>>
>> Reference counting is fragile? I don't think so; its only as fragile as
>> the programmer using them. There are no silver bullets.
>
> You cannot have your cake and eat it: you said that the programmer must be
> aware of the low-level details of their data structures in order to work
> around the pathological performance problems inherent with reference
> counting. That is a form of fragility. Fragmentation is one specific
> example.

Wrong. You completely misunderstood me. I was talking about the implementers
of refcounting. The implementation of reference counting should try and
avoid making calls into atomic instructions and memory barriers. This is
transparent to the user.


> With a real GC you just fire and forget. Exceptional circumstances are
> extremely rare. Improper use of weak hash tables is not an example of
> this.

Fire and forget? lol. You can get into big trouble if you fire and forget;
simple example of VERY poor programming practice. Improper use of weak
references is a direct example of the consequences that arise when a naive
programmer fires-and-forget. There are no silver bullets. GC is a tool, and
it has weaknesses.


>> Memory overhead? No, a reference count releases the object when the
>> reference drops to zero which is more accurate than a traditional GC
>> could
>> ever be.
>
> That is commonly claimed but actually wrong.

Please show me a 100% accurate GC; good luck finding one. Well, let me help
you... A proxy GC can be accurate indeed. But, they are hardly traditional.
In fact they were invented on comp.programming.threads by Joe Seigh.


> Scope can keep reference counts
> above zero and values alive unnecessarily when a real GC can collect them
> because they are unreferenced even though they are still in scope.

It will only collect anything when the GC decides to run!! It does not run
all the time. If it did the performance would be so crappy that nobody would
every even think about using it. If you think this is comparable to a proxy
GC then your simply not getting it. Anyway, traditional GC is NOT about
performance, its about helping some programmers that do not know how, or
want, to manage their memory manually.


>> Are you talking about the extra word of space per-object?
>
> Yes.

How is that different than GC meta-data? BTW, there are refcounting
algorithms that do not need any per-object meta-data. For instance, vZOOM
keeps this information on a per-thread basis. There is GC meta-data that
keeps track of objects. There has to be. Think about it.


>> Did you know that many garbage collectors use per-object meta-data as
>> well?
>
> None that we use. Which GCs are you referring to?

Many GC languages use object headers; Java. Or, keep their object meta-data
in another place. It does not matter because it all takes up memory.


>> Performance degradation? Again, I ask you what specific algorithms are
>> you
>> talking about? It does matter.
>
> I benchmarked all the mainstream reference counters for C++ many years
> ago.
> If you think the situation has improved, perhaps we could benchmark C++
> against some GC'd languages for collection-intensive tasks now?

C++ has no GC. Anyway, are you talking about a lock-free reader patterns?
What collection-intensive tasks? I know I can compete, and most likely beat,
a traditional GC with handcrafted implementations that C++ give me the
freedom to work with. C++ gives me the flexibility to use custom allocators
and I can use architecture specific optimizations. BTW, how do would you
construct a proper benchmark? What patterns? IMO, I think that a lock-free
reader pattern would be okay.


>> Not incremental? Drill down on that one some more please.
>
> You actually just described the problem: reference counting releases
> values
> when their reference drops to zero.

Yeah. This is NOT true with a traditional GC.


> At the end of a scope, many reference
> counts can be zeroed at the same time and the ensuing collections can
> stall
> the program for an unacceptable amount of time. This can also be extremely
> difficult to workaround without simply resorting to a real GC.

Stall? Are you sure you know how Proxy GC works? There is no stall. There is
no contrived introspection of thread stacks. There is no collect-the-world.
There is no mark-and-sweep. There is no thread suspension. Ect, ect... No
stop-the-world... None of that non-sense. Stall! Get real.


> We had this exact problem on a 250kLOC C++ product line and eventually
> fixed
> it by rewriting everything from scratch in a more modern language. The
> worst case performance is now 5x faster. We wasted a couple of months
> trying to fix the problem in C++ before having the revelation that we were
> just reinventing a garbage collector and doing a poor job of it at that.

Well, from your false comments on reference counting, I would expect you to
not be able to beat a tranditional garbage collector in any way shape or
form.


>>> You are simply restating that author's misconceptions. Read the
>>> documentation.
>>
>> I am asking you how to do it. Explain how to create effectively object
>> caches using "traditional" collect-the-world GC...
>
> You use a weak hash table to cache data without keeping it alive.

> You do not use a weak hash table to cache data that has no other
> references
> to keep it alive on the assumption that the GC will be inefficient enough
> for your cache to work.

Wrong, let me inform you that a cache keeps QUIESCENT objects for further
optimized allocations. Are you sure you know what a cache is? A cache keep
objects that have no references around to optimize further allocations. Do
you know about the Solaris slab allocator?

Chris Thomasson

unread,
Mar 30, 2008, 1:37:36 AM3/30/08
to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:m4mdnbbimOjpiHLa...@comcast.com...

> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13uu0sg...@corp.supernews.com...
>> Chris Thomasson wrote:
>>> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
>>> news:13utrof...@corp.supernews.com...
>>>> . Fragility.
>>>> . Memory overhead.
>>>> . Performance degradation.
>>>> . Fragmentation.
>>>> . Not incremental, i.e. awful worst case performance.
>>>
>>> Reference counting is fragile? I don't think so; its only as fragile as
>>> the programmer using them. There are no silver bullets.
>>
>> You cannot have your cake and eat it: you said that the programmer must
>> be
>> aware of the low-level details of their data structures in order to work
>> around the pathological performance problems inherent with reference
>> counting. That is a form of fragility. Fragmentation is one specific
>> example.
>
> Wrong. You completely misunderstood me. I was talking about the
> implementers of refcounting. The implementation of reference counting
> should try and avoid making calls into atomic instructions and memory
> barriers. This is transparent to the user.
[...]

Perhaps I misunderstood you... I did state that ref-counting is only as
fragile as the programmer using it. Well, guess what, that applies to GC as
well:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d

Ref-counting and GC have their problems. Are you trying to convince me that
GC is just there and you can fire objects at it and forget? Oh, wait... You
did say exactly that. Remember?

Jerry Coffin

unread,
Mar 30, 2008, 3:19:57 PM3/30/08
to
In article <13utr3l...@corp.supernews.com>, use...@jdh30.plus.com
says...

[ ... ]

> You make it sound as if reference counting is making a come-back. I'll give
> you the benefit of the doubt though: can you cite any references indicating
> that reference counting can be even vaguely competitive compared to a real
> GC?

Paul Wilson's garbage collector survey, section 2.1.

http://www.cs.utexas.edu/ftp/pub/garbage/gcsurvey.ps

Most of what you've said about reference counting is complete nonsense.
For example, you claim that one of the problems with reference counting
is:

> . Not incremental, i.e. awful worst case performance.

As the reference above makes clear to anybody capable of reading at all:

One advantage of reference counting is the _incremental_
nature of most of its operation--garbage collection work
(updating rerence counts) is interleaved closely with the
running program's own execution. It can easily be made
completely incremental and _real time_; that is,
performing at most a small and bounded amount of work per
unit of program execution.

he goes on to say:

This is not to say that reference counting is a dead
technique. It still has advantages in terms of the
immediacy with which it reclaims most garbage, and
corresponding beneficial effects on locality of reference;
a reference counting system may perform with little
degradation when almost all of the heap space is occupied
by live objects, while other collectors rely on trading
more space for higher efficiency. Reference counts
themselves may be valuable in some systems. For example,
they may support optimizations in functional garbage
collection implementations by allowing destrutive modification
to uniquely-referenced objects. Distributed garbage collection
is often done with reference-counting between nodes of a
distributed system, combined with mark-sweep or copying
collection within a node. Future systems may find other uses
for reference counting, perhaps in hybrid collectors also
involving other techniques, or when augmented by specialized
hardware.

He mentions two points of particular interest with respect to reference
counting making a "comeback". The difference in speed between CPUs and
main memory seems to be growing, forcing ever-greater dependence on
caching -- which means that the improved locality of reference with
reference counting means more all the time. This also means that the
primary performance cost of reference counting (having to update counts)
now means relatively little as a rule -- in most cases, a few extra
operations inside the CPU mean nearly nothing, while extra references to
main memory mean a great deal.

Likewise, individual computers now correspond much more closely to the
distributed systems at the time of the survey. In particlar, it's now
quite common to see a number of separate OS instances running in virtual
machines in a single box.

You opinions on garbage collection reflect a tremendous enthusiasm, but
equally tremendous ignorance of the subject matter.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Jon Harrop

unread,
Mar 30, 2008, 4:35:34 PM3/30/08
to
Chris Thomasson wrote:
> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13uu0sg...@corp.supernews.com...

>> You cannot have your cake and eat it: you said that the programmer must
>> be aware of the low-level details of their data structures in order to
>> work around the pathological performance problems inherent with reference
>> counting. That is a form of fragility. Fragmentation is one specific
>> example.
>
> Wrong. You completely misunderstood me. I was talking about the
> implementers of refcounting. The implementation of reference counting
> should try and avoid making calls into atomic instructions and memory
> barriers. This is transparent to the user.

A data structure implementor using reference counting must be aware of the
overhead of reference counting and work around it by amortizing reference
counts whenever possible. You yourself only just described this in the
context of performance.

>>> Memory overhead? No, a reference count releases the object when the
>>> reference drops to zero which is more accurate than a traditional GC
>>> could ever be.
>>
>> That is commonly claimed but actually wrong.
>

> Please show me a 100% accurate GC; good luck finding one...

That is irrelevant. Your argument in favor of reference counting was
completely fallacious.

You claimed was that reference counting is "more accurate than a traditional
GC could ever be". Consider:

{
Bar bar()
f(bar);
g()
}

Reference counting will keep "bar" alive until its reference count happens
to be zeroed when it drops out of scope even though it is not reachable
during the call to "g()". Real garbage collectors can collect "bar" during
the call to "g" because it is no longer reachable.

So GCs can clearly collect sooner than reference counters.

>> Scope can keep reference counts above zero and values alive unnecessarily
>> when a real GC can collect them because they are unreferenced even though
>> they are still in scope.
>
> It will only collect anything when the GC decides to run!

Which may well be before the scope happens to end.

> Anyway, traditional GC is NOT about performance, its about helping some
> programmers that do not know how, or want, to manage their memory
> manually.

By the same logic: C++ is for programmers who don't know, or want, to write
assembler manually.

>>> Are you talking about the extra word of space per-object?
>>
>> Yes.
>
> How is that different than GC meta-data?

Reference counts consume a lot more space.

> BTW, there are refcounting
> algorithms that do not need any per-object meta-data. For instance, vZOOM
> keeps this information on a per-thread basis. There is GC meta-data that
> keeps track of objects. There has to be. Think about it.

Yes. For example, OCaml hides two bits inside each pointer totalling zero
overhead. In contrast, a reference counting system is likely to add a
machine word, bloating each and every value by 8 bytes unnecessarily on
modern hardware.

>>> Did you know that many garbage collectors use per-object meta-data as
>>> well?
>>
>> None that we use. Which GCs are you referring to?
>
> Many GC languages use object headers; Java.

Just Java?

>> I benchmarked all the mainstream reference counters for C++ many years
>> ago. If you think the situation has improved, perhaps we could benchmark
>> C++ against some GC'd languages for collection-intensive tasks now?
>
> C++ has no GC. Anyway, are you talking about a lock-free reader patterns?

Let's do a single-threaded benchmark first.

> What collection-intensive tasks?

Symbolic rewriting would make a good benchmark. Here is one:

http://www.lambdassociates.org/studies/study10.htm

Try implementing this using reference counting and we can see how it
compares (probably on a trickier rewrite). Replace the arbitrary-precision
ints with doubles.

> I know I can compete, and most likely beat, a traditional GC with
> handcrafted implementations that C++ give me the freedom to work with.

I admire your faith but I would like to see some evidence to back up such
claims because they run contrary to common wisdom accumulated over the past
few decades.

>>> Not incremental? Drill down on that one some more please.
>>
>> You actually just described the problem: reference counting releases
>> values when their reference drops to zero.
>
> Yeah. This is NOT true with a traditional GC.

Absolutely. GCs do not have to wait that long: they can collect sooner.

>> At the end of a scope, many reference
>> counts can be zeroed at the same time and the ensuing collections can
>> stall the program for an unacceptable amount of time. This can also be
>> extremely difficult to workaround without simply resorting to a real GC.
>
> Stall? Are you sure you know how Proxy GC works? There is no stall. There
> is no contrived introspection of thread stacks. There is no
> collect-the-world. There is no mark-and-sweep. There is no thread
> suspension. Ect, ect...

You've just made a series of irrelevant references here.

>> We had this exact problem on a 250kLOC C++ product line and eventually
>> fixed
>> it by rewriting everything from scratch in a more modern language. The
>> worst case performance is now 5x faster. We wasted a couple of months
>> trying to fix the problem in C++ before having the revelation that we
>> were just reinventing a garbage collector and doing a poor job of it at
>> that.
>
> Well, from your false comments on reference counting, I would expect you
> to not be able to beat a tranditional garbage collector in any way shape
> or form.

Of course. I fully expect you to fail on the above benchmark but am willing
to give you the benefit of the doubt.

>>>> You are simply restating that author's misconceptions. Read the
>>>> documentation.
>>>
>>> I am asking you how to do it. Explain how to create effectively object
>>> caches using "traditional" collect-the-world GC...
>>
>> You use a weak hash table to cache data without keeping it alive.
>
>> You do not use a weak hash table to cache data that has no other
>> references
>> to keep it alive on the assumption that the GC will be inefficient enough
>> for your cache to work.
>
> Wrong, let me inform you that a cache keeps QUIESCENT objects for further
> optimized allocations.

That would be a specific kind of cache. Caches do not inherently have
anything to do with allocation.

> Are you sure you know what a cache is? A cache keep objects that have no
> references around to optimize further allocations.

Apparently what you wanted to ask me was how I would implement an allocation
cache in a garbage collected language.

The answer depends upon the language. If you look at OCaml, for example,
there is no need to because the run-time has already automated this.

> Do you know about the Solaris slab allocator?

No.

Jon Harrop

unread,
Mar 30, 2008, 4:50:57 PM3/30/08
to
Chris Thomasson wrote:
> Perhaps I misunderstood you... I did state that ref-counting is only as
> fragile as the programmer using it. Well, guess what, that applies to GC
> as well:
>
>
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d
>
> Ref-counting and GC have their problems. Are you trying to convince me
> that GC is just there and you can fire objects at it and forget? Oh,
> wait... You did say exactly that. Remember?

We already discussed that exact reference and it has nothing to do with
garbage collection. Read the comments left on his blog post.

Jon Harrop

unread,
Mar 30, 2008, 5:01:57 PM3/30/08
to
Jerry Coffin wrote:
> In article <13utr3l...@corp.supernews.com>, use...@jdh30.plus.com
> says...
>> You make it sound as if reference counting is making a come-back. I'll
>> give you the benefit of the doubt though: can you cite any references
>> indicating that reference counting can be even vaguely competitive
>> compared to a real GC?
>
> Paul Wilson's garbage collector survey, section 2.1.
>
> http://www.cs.utexas.edu/ftp/pub/garbage/gcsurvey.ps

That is 16 years out of date.

> Most of what you've said about reference counting is complete nonsense.

Then why is your best counterexample ancient history?

And sixteen years on his prophecy has not come true: we do not have
dedicated hardware for reference counting and none of the major GCs are
based upon reference counting.

Python and Mathematica are reference counted but both are slow and suffer
from stalls due to lack of incremental collection.

> He mentions two points of particular interest with respect to reference
> counting making a "comeback". The difference in speed between CPUs and
> main memory seems to be growing, forcing ever-greater dependence on
> caching --

Yes.

> which means that the improved locality of reference with
> reference counting means more all the time. This also means that the
> primary performance cost of reference counting (having to update counts)
> now means relatively little as a rule -- in most cases, a few extra
> operations inside the CPU mean nearly nothing, while extra references to
> main memory mean a great deal.

No. Locality is worse with reference counting which is why it has gotten
relatively slower since that ancient survey. You can easily test this
yourself.

> Likewise, individual computers now correspond much more closely to the
> distributed systems at the time of the survey. In particlar, it's now
> quite common to see a number of separate OS instances running in virtual
> machines in a single box.
>
> You opinions on garbage collection reflect a tremendous enthusiasm, but
> equally tremendous ignorance of the subject matter.

Yet you cannot cite a single relevant reference or piece of objective
evidence to support an argument that flies in the face of almost all modern
software development (outside restricted memory environments).

Ioannis Vranos

unread,
Mar 30, 2008, 5:12:01 PM3/30/08
to
mohang...@gmail.com wrote:
> hello everyone,
> i want to use multithreading in c++,i have no previous experience of
> implementing multithreading but from my operating system concepts i
> know few concepts of multithreading .
> can anyone please guide me how and were to begin from.
> all help is apprecitated.
> thank you
> mohan gupta


Multithreading is a platform-specific feature. Check the documentation
of the API you are going to use (e.g. MFC, .NET, QT, GTKMM, wxWidgets,
SDL, etc).

Dmitriy V'jukov

unread,
Mar 30, 2008, 5:27:01 PM3/30/08
to
On 31 мар, 00:35, Jon Harrop <use...@jdh30.plus.com> wrote:

> > Please show me a 100% accurate GC; good luck finding one...
>
> That is irrelevant. Your argument in favor of reference counting was
> completely fallacious.
>
> You claimed was that reference counting is "more accurate than a traditional
> GC could ever be". Consider:
>
> {
> Bar bar()
> f(bar);
> g()
> }
>
> Reference counting will keep "bar" alive until its reference count happens
> to be zeroed when it drops out of scope even though it is not reachable
> during the call to "g()". Real garbage collectors can collect "bar" during
> the call to "g" because it is no longer reachable.


Can Real garbage collectors do this w/o help of the compiler? I'm not
sure.
With compiler support reference-counting can easily collect bar
precisely at the end of f(). Don't you think so?


Dmitriy V'jukov

Barry Kelly

unread,
Mar 30, 2008, 6:02:14 PM3/30/08
to
Jon Harrop wrote:

> Jerry Coffin wrote:
> > You opinions on garbage collection reflect a tremendous enthusiasm, but
> > equally tremendous ignorance of the subject matter.
>
> Yet you cannot cite a single relevant reference or piece of objective
> evidence to support an argument that flies in the face of almost all modern
> software development (outside restricted memory environments).

Of course he can't. That's because he's living in the world of 16 years
and more ago.

I wouldn't waste so much time with Jerry, Chris and others of their ilk.
They're so steeped in obsolete experience that they're blind even to the
present, much less the future. Even Einstein never accepted quantum
theory.

"All truth goes through three stages. First it is ridiculed; then it is
violently opposed; finally it is accepted as self-evident."
-- Schopenhauer

PS: the old physicists eventually die; and software is a field that's
kindest to the young.

-- Barry

--
http://barrkel.blogspot.com/

Jon Harrop

unread,
Mar 30, 2008, 5:57:47 PM3/30/08
to
Dmitriy V'jukov wrote:

> On 31 ???, 00:35, Jon Harrop <use...@jdh30.plus.com> wrote:
>> > Please show me a 100% accurate GC; good luck finding one...
>>
>> That is irrelevant. Your argument in favor of reference counting was
>> completely fallacious.
>>
>> You claimed was that reference counting is "more accurate than a
>> traditional GC could ever be". Consider:
>>
>> {
>> Bar bar()
>> f(bar);
>> g()
>> }
>>
>> Reference counting will keep "bar" alive until its reference count
>> happens to be zeroed when it drops out of scope even though it is not
>> reachable during the call to "g()". Real garbage collectors can collect
>> "bar" during the call to "g" because it is no longer reachable.
>
> Can Real garbage collectors do this w/o help of the compiler?

Absolutely, yes. I just checked this with a couple of GC'd languages and
they both collected "bar" before the end of its scope, i.e. sooner than
referencing counting can.

The GC is totally unaware of scope (i.e. how the programmer happened to lay
out the code) and is only aware of the global roots and current dependency
graph in the heap while the program runs. This is typically traversed in
slices of major collection that occur at allocations and when functions
return, many of which may occur during the call to "g()" and any of which
are likely to collect "bar" and anything therein that is also unreferenced.

> With compiler support reference-counting can easily collect bar
> precisely at the end of f(). Don't you think so?

In general, that is not solvable because the complete dependency information
is only ever available at run time. So even in theory it is only possible
to trim down value lifetimes and never achieve optimality using only static
analysis. In other words, the compiler could do better but never as well as
run time analysis like a real GC.

In practice, the nearest working solutions I am aware of are based
upon "regioning": the MLKit SML compiler and Stalin Scheme compiler.
However, once you have used regioning to statically determine object
lifetimes, reference counts are obsolete. Consequently, these approaches
make no use of referencing counting.

Jon Harrop

unread,
Mar 30, 2008, 6:00:32 PM3/30/08
to
Barry Kelly wrote:
> Jon Harrop wrote:
>> Jerry Coffin wrote:
>> > You opinions on garbage collection reflect a tremendous enthusiasm, but
>> > equally tremendous ignorance of the subject matter.
>>
>> Yet you cannot cite a single relevant reference or piece of objective
>> evidence to support an argument that flies in the face of almost all
>> modern software development (outside restricted memory environments).
>
> Of course he can't. That's because he's living in the world of 16 years
> and more ago.
>
> I wouldn't waste so much time with Jerry, Chris and others of their ilk.
> They're so steeped in obsolete experience that they're blind even to the
> present, much less the future. Even Einstein never accepted quantum
> theory.

Agreed. I hadn't realised this until I Googled and discovered that they
constitute a small clique of intraciting kooks on usenet. :-)

Ian Collins

unread,
Mar 30, 2008, 6:15:43 PM3/30/08
to
Barry Kelly wrote:
> Jon Harrop wrote:
>
>> Jerry Coffin wrote:
>>> You opinions on garbage collection reflect a tremendous enthusiasm, but
>>> equally tremendous ignorance of the subject matter.
>> Yet you cannot cite a single relevant reference or piece of objective
>> evidence to support an argument that flies in the face of almost all modern
>> software development (outside restricted memory environments).
>
> Of course he can't. That's because he's living in the world of 16 years
> and more ago.
>
> I wouldn't waste so much time with Jerry, Chris and others of their ilk.
> They're so steeped in obsolete experience that they're blind even to the
> present, much less the future.

Have you ever read Chris' posts on c.p.t? If not, you should.

--
Ian Collins.

Ian Collins

unread,
Mar 30, 2008, 6:19:15 PM3/30/08
to
But surely that requires coupling between the compiler and the GC? The
latter has to provide the data used by the former.

--
Ian Collins.

Jon Harrop

unread,
Mar 30, 2008, 6:19:37 PM3/30/08
to
Ian Collins wrote:
> But surely that requires coupling between the compiler and the GC? The
> latter has to provide the data used by the former.

The GC places requirements on the code generated by the compiler. This is
done when the language implementation is designed though, and not at
compile time or run time.

Is that what you meant?

Ian Collins

unread,
Mar 30, 2008, 6:46:07 PM3/30/08
to
Jon Harrop wrote:
> Ian Collins wrote:
>> But surely that requires coupling between the compiler and the GC? The
>> latter has to provide the data used by the former.
>
> The GC places requirements on the code generated by the compiler. This is
> done when the language implementation is designed though, and not at
> compile time or run time.
>
> Is that what you meant?
>
Yes and think Dmitriy did as well when he asked "can Real garbage

collectors do this w/o help of the compiler?"

So it does require help from the compiler to generate appropriate code.

An after-the-fact GC library would not be able to do as you described.

--
Ian Collins.

Szabolcs Ferenczi

unread,
Mar 30, 2008, 6:59:48 PM3/30/08
to
On Mar 26, 9:24 pm, mohangupt...@gmail.com wrote:
> hello everyone,
> i want to use multithreading in c++,i have no previous experience of
> implementing multithreading but from my operating system concepts i
> know few concepts of multithreading .
> can anyone please guide me how and were to begin from.
> all help is apprecitated.

If you know some concepts of multi-threading from your operating
systems concepts, go ahead and apply them. Just pick up some canonical
examples, e.g. the producer-consumer problem, the dining philosophers
problem, or any other ones. Pick up Pthreads or Boost library and
adapt the examples in C/C++. Later on, you can also formulate the
algorithms in your favourite notation and hand-compile them into the
available library calls.

If you need further help, just let me know.

Best Regards,
Szabolcs

Jon Harrop

unread,
Mar 30, 2008, 6:52:08 PM3/30/08
to
Ian Collins wrote:
> So it does require help from the compiler to generate appropriate code.

Yes.

> An after-the-fact GC library would not be able to do as you described.

The language implementation is now split into the compiler and the GC which
must be designed to cooperate.

Jerry Coffin

unread,
Mar 30, 2008, 7:35:57 PM3/30/08
to
In article <13v00gd...@corp.supernews.com>, use...@jdh30.plus.com
says...

> Jerry Coffin wrote:
> > In article <13utr3l...@corp.supernews.com>, use...@jdh30.plus.com
> > says...
> >> You make it sound as if reference counting is making a come-back. I'll
> >> give you the benefit of the doubt though: can you cite any references
> >> indicating that reference counting can be even vaguely competitive
> >> compared to a real GC?
> >
> > Paul Wilson's garbage collector survey, section 2.1.
> >
> > http://www.cs.utexas.edu/ftp/pub/garbage/gcsurvey.ps
>
> That is 16 years out of date.

It's 16 years old. The wheel is far older. Neither is out of date at
all.



> > Most of what you've said about reference counting is complete nonsense.
>
> Then why is your best counterexample ancient history?

It's not an example, it's a survey. As noted above, it remains as
relevant as the wheel.



> > This is not to say that reference counting is a dead
> > technique. It still has advantages in terms of the
> > immediacy with which it reclaims most garbage, and
> > corresponding beneficial effects on locality of reference;
> > a reference counting system may perform with little
> > degradation when almost all of the heap space is occupied
> > by live objects, while other collectors rely on trading
> > more space for higher efficiency. Reference counts
> > themselves may be valuable in some systems. For example,
> > they may support optimizations in functional garbage
> > collection implementations by allowing destrutive modification
> > to uniquely-referenced objects. Distributed garbage collection
> > is often done with reference-counting between nodes of a
> > distributed system, combined with mark-sweep or copying
> > collection within a node. Future systems may find other uses
> > for reference counting, perhaps in hybrid collectors also
> > involving other techniques, or when augmented by specialized
> > hardware.
>
> And sixteen years on his prophecy has not come true: we do not have
> dedicated hardware for reference counting and none of the major GCs are
> based upon reference counting.

You're simply displaying still more of your ignorance. Yes, there is
hardware that has dedicated reference counting capability. I know this
with absolute certainty, because I've designed precisely such hardware.

You claim that: "none of the major GCs are [sic] based upon reference
counting", but then:



> Python and Mathematica are reference counted but both are slow and suffer
> from stalls due to lack of incremental collection.

You cite two major GCs that _are_ based on reference counting.

Of course, as evidence of anything except that some major GCs are based
on reference counting, these mean precisely nothing -- you've shown
nothing about how much of their time is spent on reference counting vs.
other issues. As has already been pointed out, reference counting CAN be
made incremental (in fact, by its very nature it's much more incremental
than most other forms of GC) so your claim that they suffer due to the
lack of incremental collection lacks credibility, even at best.

In reality, making a collector incremental generally makes it marginally
_slower_ than an otherwise similar, but non-incremental collector. A
collector that executes all at once has to start with a heap in a
consistent state, and ensure that it is returned to a consistent state
when finished. An incremental collector must set the heap to a
consistent state at the end of each collection increment, instead of
only once at the very end of operation.

In addition, an incremental collector must restore its state each time a
collection increment begins and save its state each time a collection
increment ends. This obviously adds time to its overall execution as
well.



> > He mentions two points of particular interest with respect to reference
> > counting making a "comeback". The difference in speed between CPUs and
> > main memory seems to be growing, forcing ever-greater dependence on
> > caching --
>
> Yes.
>
> > which means that the improved locality of reference with
> > reference counting means more all the time. This also means that the
> > primary performance cost of reference counting (having to update counts)
> > now means relatively little as a rule -- in most cases, a few extra
> > operations inside the CPU mean nearly nothing, while extra references to
> > main memory mean a great deal.
>
> No. Locality is worse with reference counting which is why it has gotten
> relatively slower since that ancient survey. You can easily test this
> yourself.

I have. You're wrong. The fact that you'd claim this at all indicates
that you're utterly clueless about how GC works at all.

In reference counting, you have an object and the reference count is
part of that object. The reference count is never touched except when
there is an operation on the object, so locality of reference is usually
nearly perfect -- the only exception being when/if a cache line boundary
happens to fall precisely between the rest of the object and the
reference count. In many implementations, this isn't even possible; in
the relatively rare situation that it's possible, it's still quite rare.

In other garbage collection, you start from a root set of objects and
walk through everything they point at to find all the active objects.
You then collect those that are no longer in use. When carried out as a
simple mark/sweep collector, the locality of reference of this is
absolutely _terrible_ -- every active object gets touched, and all its
pointers followed at _every_ collection cycle.

Modern collectors (e.g. generational scavengers) attempt to ameliorate
this massive problem. They do this by assuming that when an object has
survived enough cycles of collection that the object will probably
survive many more cycles as well, so the object (and everything to which
it refers) is moved to another area where it those objects are simply
treated as permanent.

The first work along this line simply created two areas, one for
transient objects and another for permanent objects. More recent work
creates more or less a gradient, in which objects that have survived the
longest are inspected the last often, and as objects survive collection
cycles, they are slowly promoted up the hierarchy until they are almost
never inspected for the possibility of being garbage.

This improves locality of reference compared to old mark-sweep
collectors -- but it still has substantially worse locality of reference
than a typical reference counting system. In particular, in reference
counting, the reference count is NEVER touched unless an operation that
creates or destroys a reference to that object takes place.

By contrast, even a generational scavenger will chase pointers through
objects many times before the object is promoted to the point that the
frequency of collection on that object is reduced. Even then, in the
gradient-style system, the frequency of chasing pointers through that
object is reduced only slightly until many more fruitless attempts at
collecting it have been made -- at which point, the collection frequency
is again reduced, but (again) not stopped by any means.

This, of course, points to another serious problem with most such
garbage collectors: they are based on the assumption that the longer an
object has lived, the less likely that object is to die at any given
point in time. In short, it assumes that object usage tends to follow a
roughly stack-like model.

That's certainly often true -- but it's equally certainly not always
true. Quite the contrary, some types of applications tend to follow
something much closer to a queue model, in which most objects have
roughly similar life spans, so the longer an object has lived, the MORE
likely it is to die at any given point in time.

For applications that follow this pattern, most of the relatively recent
work in garbage collectors is not only useless, but actively
detrimental. The GC spends nearly all its time inspecting objects that
are not likely to die any time soon at all, while ignoring nearly all
the objects that are already dead or likely to die soon. The result is
lots of time wasted copying objects AND lots of memory wasted storing
objects that are actually dead.



> > Likewise, individual computers now correspond much more closely to the
> > distributed systems at the time of the survey. In particlar, it's now
> > quite common to see a number of separate OS instances running in virtual
> > machines in a single box.
> >
> > You opinions on garbage collection reflect a tremendous enthusiasm, but
> > equally tremendous ignorance of the subject matter.
>
> Yet you cannot cite a single relevant reference or piece of objective
> evidence to support an argument that flies in the face of almost all modern
> software development (outside restricted memory environments).

Quite the contrary: the reference provided was completely relevant and
remains as accurate today as it was when it was new. The techniques to
make reference counting incremental and, if necessary, real-time work
today, just like they did 16 years ago.

Some types of data become obsolete quickly. The specific details of a
particular model of CPU that was current 16 years ago would have little
relevance today.

Other kinds of information remain valid and accurate essentially
permanently. The wheel has been known for thousands of years, and far
from becoming obsolete, its use grows every year. Many of the basic
algorithms of computer science, such as binary searching, heap sorting,
merge sorting, and yes, reference counting, are much the same -- they've
been with us for essentially the entire history of computing, but each
remains as relevant today as when it was new.

Of course, this isn't really a binary situation: there's a gradient all
the way from the wheel (thousands of years and growing) to specific CPUs
(sometimes outdated in a matter of days).

It's obvious that if reference counting could be made incremental 16
years ago, it still can. Your claim to the contrary was absolutely
false.

Much worse, however, is your apparently inability to recognize that this
information is of the sort that doesn't become out of date quickly, or
ever for that matter. Once the technique to make reference counting
incremental is discovered, that technique remains usable essentially
permanently. Mere passage of time will not change the fact that
reference counting CAN be done incrementally and (if necessary) can be
done in real time.

Much worse, however, is the lack of logical reasoning involved in your
claim. What you appear to lack is not just the knowledge of the specific
subject matter at hand, but the ability to recognize the implications of
the facts when they are made known to you. Originally you showed
ignorance, but now you're showing either fallacious reasoning, or
outright dishonesty.

Jerry Coffin

unread,
Mar 30, 2008, 7:35:57 PM3/30/08
to
In article <6ade2425-d268-4fa2-b136-
eee379...@d21g2000prf.googlegroups.com>, dvy...@gmail.com says...

> On 31 ???, 00:35, Jon Harrop <use...@jdh30.plus.com> wrote:
>
> > > Please show me a 100% accurate GC; good luck finding one...
> >
> > That is irrelevant. Your argument in favor of reference counting was
> > completely fallacious.
> >
> > You claimed was that reference counting is "more accurate than a traditional
> > GC could ever be". Consider:
> >
> > {
> > Bar bar()
> > f(bar);
> > g()
> > }
> >
> > Reference counting will keep "bar" alive until its reference count happens
> > to be zeroed when it drops out of scope even though it is not reachable
> > during the call to "g()". Real garbage collectors can collect "bar" during
> > the call to "g" because it is no longer reachable.
>
> Can Real garbage collectors do this w/o help of the compiler? I'm not
> sure.

Actually, even WITH the help of the compiler, most GCs won't collect bar
until it goes out of scope.

A garbage collector starts from all known pointers, and marks each
object they point at, recursively, until all objects that can be reached
by the program have been marked as being in use. The starting point is
taken from all global variables and all variables created on the stack.

In short, the garbage collector leaves the object intact until it can
prove (in a rather simpleminded fashion, based only on pointers) that
the object _can't_ possibly be referred to ever again. At least in the
typical case, however, this is done based only on pointers to the object
NOT on analyzing code. The fact that a stack frame contains a pointer to
an object is considered reason to keep that object around, even if the
code ceases to dereference that pointer at some time.

Dilip

unread,
Mar 30, 2008, 8:49:18 PM3/30/08
to
On Mar 30, 6:35 pm, Jerry Coffin <jcof...@taeus.com> wrote:
> In article <13v00gd4s2dh...@corp.supernews.com>, use...@jdh30.plus.com
> says...
>
> > Jerry Coffin wrote:
> > > In article <13utr3lmik7l...@corp.supernews.com>, use...@jdh30.plus.com

> > > says...
> > >> You make it sound as if reference counting is making a come-back. I'll
> > >> give you the benefit of the doubt though: can you cite any references
> > >> indicating that reference counting can be even vaguely competitive
> > >> compared to a real GC?
>
> > > Paul Wilson's garbage collector survey, section 2.1.
>
> > >http://www.cs.utexas.edu/ftp/pub/garbage/gcsurvey.ps
>
> > That is 16 years out of date.
>
> It's 16 years old. The wheel is far older. Neither is out of date at
> all.
>
> > > Most of what you've said about reference counting is complete nonsense.
>
> > Then why is your best counterexample ancient history?
>
> It's not an example, it's a survey. As noted above, it remains as
> relevant as the wheel.

Jerry

Can I invite you to take a look at this post[1] back from a few years
ago where a MSFT employee laid out his thoughts on why they went the
GC way w/o bothering about refcounting? Also shortly after the post
was written, MSFT funded a project to try to add ref-counting to their
Rotor (their version of open sourced CLR) codebase[2] as a kind of
feasibility study and it failed miserably. The latter project, I have
the details only as a word document and have uploaded it here[2]. Let
me know what you think.

[1] http://discuss.develop.com/archives/wa.exe?A2=ind0010A&L=DOTNET&D=0&P=39459
[2] http://cid-ff220db24954ce1d.skydrive.live.com/browse.aspx/RefcountingToRotor

Jon Harrop

unread,
Mar 30, 2008, 9:20:07 PM3/30/08
to
Jerry Coffin wrote:
> Actually, even WITH the help of the compiler, most GCs won't collect bar
> until it goes out of scope.
>
> A garbage collector starts from all known pointers, and marks each
> object they point at, recursively, until all objects that can be reached
> by the program have been marked as being in use. The starting point is
> taken from all global variables and all variables created on the stack.
>
> In short, the garbage collector leaves the object intact until it can
> prove (in a rather simpleminded fashion, based only on pointers) that
> the object _can't_ possibly be referred to ever again. At least in the
> typical case, however, this is done based only on pointers to the object
> NOT on analyzing code. The fact that a stack frame contains a pointer to
> an object is considered reason to keep that object around, even if the
> code ceases to dereference that pointer at some time.

That's a nice theory but it is not representative of how GCs actually work
and your opening statement is completely wrong. Here is a counter-example
in OCaml:

$ cat >a.ml
open Printf

let rec init t = function
| 0 -> ()
| n -> init (n::t) (n-1)

let () =
let n = try int_of_string Sys.argv.(1) with _ -> 1000 in
let t = [n] in
let weak = Weak.create 1 in
Weak.set weak 0 (Some t);
init t n;
init [] n;
printf "%s\n" (if Weak.check weak 0 then "uncollected" else "collected");
$ ocamlopt a.ml -o a
$ ./a 10000
collected

As you can see, the value was collected before the scope ended.

I've also tested this in F# (on .NET) and the value is also collected.

Chris Thomasson

unread,
Mar 30, 2008, 9:33:58 PM3/30/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13uvuuv...@corp.supernews.com...

> Chris Thomasson wrote:
>> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
>> news:13uu0sg...@corp.supernews.com...
>>> You cannot have your cake and eat it: you said that the programmer must
>>> be aware of the low-level details of their data structures in order to
>>> work around the pathological performance problems inherent with
>>> reference
>>> counting. That is a form of fragility. Fragmentation is one specific
>>> example.
>>
>> Wrong. You completely misunderstood me. I was talking about the
>> implementers of refcounting. The implementation of reference counting
>> should try and avoid making calls into atomic instructions and memory
>> barriers. This is transparent to the user.
>
> A data structure implementor using reference counting must be aware of the
> overhead of reference counting and work around it by amortizing reference
> counts whenever possible. You yourself only just described this in the
> context of performance.

A data-structure implementer should always be aware of how he is going to
safely reclaim memory in the presence of multi-threading. Garbage collection
does not get you a free lunch here. Neither does reference counting.

>
>>>> Memory overhead? No, a reference count releases the object when the
>>>> reference drops to zero which is more accurate than a traditional GC
>>>> could ever be.
>>>
>>> That is commonly claimed but actually wrong.
>>
>> Please show me a 100% accurate GC; good luck finding one...
>
> That is irrelevant. Your argument in favor of reference counting was
> completely fallacious.

Your incorrect. You need to open your eyes here:


> You claimed was that reference counting is "more accurate than a
> traditional
> GC could ever be". Consider:
>
> {
> Bar bar()
> f(bar);
> g()
> }


> Reference counting will keep "bar" alive until its reference count happens
> to be zeroed when it drops out of scope even though it is not reachable
> during the call to "g()". Real garbage collectors can collect "bar" during
> the call to "g" because it is no longer reachable.
>
> So GCs can clearly collect sooner than reference counters.

Now your just trolling. Why are you thinking in terms of scope here? Did you
know that C++ allows the programmer to do something like:

{
{
Bar bar()
f(bar);
}
g()
}

See? We can go in circles for ever here. Your posing non-sense.


>>> Scope can keep reference counts above zero and values alive
>>> unnecessarily
>>> when a real GC can collect them because they are unreferenced even
>>> though
>>> they are still in scope.
>>
>> It will only collect anything when the GC decides to run!
>
> Which may well be before the scope happens to end.

Probably not. A GC only runs once in a while. Its nowhere near as accurate
as general forms of reference counting. Does that make reference counting
better? Na. It's just another tool in our box.


>> Anyway, traditional GC is NOT about performance, its about helping some
>> programmers that do not know how, or want, to manage their memory
>> manually.
>
> By the same logic: C++ is for programmers who don't know, or want, to
> write
> assembler manually.

You comparing creating efficient manual memory management schemes with
programming assembly language? Are you serious, or just trolling again?


>>>> Are you talking about the extra word of space per-object?
>>>
>>> Yes.
>>
>> How is that different than GC meta-data?
>
> Reference counts consume a lot more space.

Oh boy. Here we go again. I ask you: What reference counting algorithm? You
make blanket statements that are completely false!

:^/


>> BTW, there are refcounting
>> algorithms that do not need any per-object meta-data. For instance, vZOOM
>> keeps this information on a per-thread basis. There is GC meta-data that
>> keeps track of objects. There has to be. Think about it.
>
> Yes. For example, OCaml hides two bits inside each pointer totalling zero
> overhead. In contrast, a reference counting system is likely to add a
> machine word, bloating each and every value by 8 bytes unnecessarily on
> modern hardware.

There are some counting algorithms that use pointer stealing. Anyway, your
make false assumptions based on blanket statements. Drill down on some
specific reference counting algorithms please.


>>>> Did you know that many garbage collectors use per-object meta-data as
>>>> well?
>>>
>>> None that we use. Which GCs are you referring to?
>>
>> Many GC languages use object headers; Java.
>
> Just Java?

.NET. Object meta-data is required by collectors and reference counting
algorithms. Some keep the data per-object. Some keep it per-thread. Some
keep it in a separate chunk of memory. Ect, ect...


>>> I benchmarked all the mainstream reference counters for C++ many years
>>> ago. If you think the situation has improved, perhaps we could benchmark
>>> C++ against some GC'd languages for collection-intensive tasks now?
>>
>> C++ has no GC. Anyway, are you talking about a lock-free reader patterns?
>
> Let's do a single-threaded benchmark first.

That's no good! Multi-threading is the way things are now. You don't get a
free lunch anymore.


>> What collection-intensive tasks?
>
> Symbolic rewriting would make a good benchmark. Here is one:
>
> http://www.lambdassociates.org/studies/study10.htm
>
> Try implementing this using reference counting and we can see how it
> compares (probably on a trickier rewrite). Replace the arbitrary-precision
> ints with doubles.

I need to take a look at this, but, are you sure that this even needs memory
management? Could I just reserve a large block of memory and carve
data-structures out of it and use caching? I don't think this needs GC or
reference counting.


>> I know I can compete, and most likely beat, a traditional GC with
>> handcrafted implementations that C++ give me the freedom to work with.
>
> I admire your faith but I would like to see some evidence to back up such
> claims because they run contrary to common wisdom accumulated over the
> past
> few decades.

Here is a stupid contrived example:
________________________________________________________
struct object {
object* cache_next;
bool cached;
[...];
};

#define OBJ_DEPTH() 100000

static object g_obj_buf[OBJ_DEPTH()] = { NULL, true };
static object* g_obj_cache = NULL;

void object_prime() {
for(int i = 0; i < OBJ_DEPTH(); ++i) {
g_obj_buf[i].cache_next = g_obj_cache;
g_obj_cache = &g_obj_buf[i];
}
}

object* object_pop() {
object* obj = g_obj_cache;
if (! obj) {
if (obj = malloc(sizeof(*obj))) {
obj->cached = false;
}
}
return obj;
}

void object_push(object* obj) {
if (obj->cached) {
obj->cache_next = g_obj_cache;
g_obj_cache = obj;
} else {
free(obj);
}
}

void foo() {
for (;;) {
object* foo = object_pop();
object_push(foo);
}
}
________________________________________________________


This is a fast object cache that will likely perform better than doing it
the GC way where nobody wants to manage their own memory:
________________________________________________________
void foo() {
for (;;) {
object* foo = gc_malloc(sizeof(*foo));
}
}
________________________________________________________

Yes manual memory management is more work and if you don't want to do that,
well, GC can be a life saver indeed.


>>>> Not incremental? Drill down on that one some more please.
>>>
>>> You actually just described the problem: reference counting releases
>>> values when their reference drops to zero.
>>
>> Yeah. This is NOT true with a traditional GC.
>
> Absolutely. GCs do not have to wait that long: they can collect sooner.

A garbage collector generally cannot be as accurate as a reference count.
Your contrived scope example is non-sense.


>>> At the end of a scope, many reference
>>> counts can be zeroed at the same time and the ensuing collections can
>>> stall the program for an unacceptable amount of time. This can also be
>>> extremely difficult to workaround without simply resorting to a real GC.
>>
>> Stall? Are you sure you know how Proxy GC works? There is no stall. There
>> is no contrived introspection of thread stacks. There is no
>> collect-the-world. There is no mark-and-sweep. There is no thread
>> suspension. Ect, ect...
>
> You've just made a series of irrelevant references here.

How are the implementation details of a GC irrelevant to a discussion on GC?


>>> We had this exact problem on a 250kLOC C++ product line and eventually
>>> fixed
>>> it by rewriting everything from scratch in a more modern language. The
>>> worst case performance is now 5x faster. We wasted a couple of months
>>> trying to fix the problem in C++ before having the revelation that we
>>> were just reinventing a garbage collector and doing a poor job of it at
>>> that.
>>
>> Well, from your false comments on reference counting, I would expect you
>> to not be able to beat a tranditional garbage collector in any way shape
>> or form.
>
> Of course. I fully expect you to fail on the above benchmark but am
> willing
> to give you the benefit of the doubt.

Any benchmark on GC these days has to be able to use multiple processes or
threads. Ahh... Here is something we can do! Okay, we can create a
multi-threaded daemon that serves factory requests to multiple processes.
This would use multi-threading, multi-processing, and shared memory:

I want to implement the wait-free factory as a multi-threaded daemon process
that multiple producers can register the path and name of a shared library,
which concurrent consumer processes can lookup by name; dynamically link
with the resulting library and call a "common" instance function (e.g.,
define common api). I guess you could think of it as a highly concurrent
per-computer COM daemon. The factory consumer threads can use the lock-free
reader pattern, and the producer threads can use a form of mutual exclusion;
including, but not limited, to locks.

Or we can do a efficient observer pattern:

I want the wait-free observer to be a multi-threaded daemon that can allow
multiple threads to create/register delegates; message-types and allow
consumer threads to register with those delegates and receive their
messages; producer threads create messages with subsequently signal the
delegates that manages those messages to multicast to their registered
consumers.

Now those programs will definitely test a GC to the limits. BTW, do you know
of a GC that can collect across multi-process working with shared memory???

Which one do you want to do? The multi-thread/process factory or the
multi-thread observer?

;^)


>>>>> You are simply restating that author's misconceptions. Read the
>>>>> documentation.
>>>>
>>>> I am asking you how to do it. Explain how to create effectively object
>>>> caches using "traditional" collect-the-world GC...
>>>
>>> You use a weak hash table to cache data without keeping it alive.
>>
>>> You do not use a weak hash table to cache data that has no other
>>> references
>>> to keep it alive on the assumption that the GC will be inefficient
>>> enough
>>> for your cache to work.
>>
>> Wrong, let me inform you that a cache keeps QUIESCENT objects for further
>> optimized allocations.
>
> That would be a specific kind of cache. Caches do not inherently have
> anything to do with allocation.

A cache is an aid to allocation.


>> Are you sure you know what a cache is? A cache keep objects that have no
>> references around to optimize further allocations.
>
> Apparently what you wanted to ask me was how I would implement an
> allocation
> cache in a garbage collected language.

Yes.


> The answer depends upon the language. If you look at OCaml, for example,
> there is no need to because the run-time has already automated this.

Then pick a lanaguage that does do that.


>> Do you know about the Solaris slab allocator?
>
> No.

You should take a look at it.

Chris Thomasson

unread,
Mar 30, 2008, 9:35:33 PM3/30/08
to

"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13uvvro...@corp.supernews.com...

> Chris Thomasson wrote:
>> Perhaps I misunderstood you... I did state that ref-counting is only as
>> fragile as the programmer using it. Well, guess what, that applies to GC
>> as well:
>>
>>
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5e9357a6fb746e5d
>>
>> Ref-counting and GC have their problems. Are you trying to convince me
>> that GC is just there and you can fire objects at it and forget? Oh,
>> wait... You did say exactly that. Remember?
>
> We already discussed that exact reference and it has nothing to do with
> garbage collection. Read the comments left on his blog post.

It does have to do with how the Java garbage collector interacts with the
data-structures he was using.

Chris Thomasson

unread,
Mar 30, 2008, 9:58:55 PM3/30/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v00gd...@corp.supernews.com...

> Jerry Coffin wrote:
>> In article <13utr3l...@corp.supernews.com>, use...@jdh30.plus.com
>> says...
>>> You make it sound as if reference counting is making a come-back. I'll
>>> give you the benefit of the doubt though: can you cite any references
>>> indicating that reference counting can be even vaguely competitive
>>> compared to a real GC?
>>
>> Paul Wilson's garbage collector survey, section 2.1.
>>
>> http://www.cs.utexas.edu/ftp/pub/garbage/gcsurvey.ps
>
> That is 16 years out of date.
>
>> Most of what you've said about reference counting is complete nonsense.
>
> Then why is your best counterexample ancient history?

Trolling again.

GC in its essence tracks references to objects. I already pointed this out:

http://groups.google.com/group/comp.lang.c++/msg/9d01df98534428ee

>
> Python and Mathematica are reference counted but both are slow and suffer
> from stalls due to lack of incremental collection.
>
>> He mentions two points of particular interest with respect to reference
>> counting making a "comeback". The difference in speed between CPUs and
>> main memory seems to be growing, forcing ever-greater dependence on
>> caching --
>
> Yes.
>
>> which means that the improved locality of reference with
>> reference counting means more all the time. This also means that the
>> primary performance cost of reference counting (having to update counts)
>> now means relatively little as a rule -- in most cases, a few extra
>> operations inside the CPU mean nearly nothing, while extra references to
>> main memory mean a great deal.
>
> No. Locality is worse with reference counting which is why it has gotten
> relatively slower since that ancient survey. You can easily test this
> yourself.

Will you stop making FALSE blanket statements! For your information,
different forms of distributed reference counting have excellent locality of
reference. You can keep the counts on a per-cpu or per-thread basis.


>> Likewise, individual computers now correspond much more closely to the
>> distributed systems at the time of the survey. In particlar, it's now
>> quite common to see a number of separate OS instances running in virtual
>> machines in a single box.
>>
>> You opinions on garbage collection reflect a tremendous enthusiasm, but
>> equally tremendous ignorance of the subject matter.
>
> Yet you cannot cite a single relevant reference or piece of objective
> evidence to support an argument that flies in the face of almost all
> modern
> software development (outside restricted memory environments).

Can you even list at least 5 difference forms of reference counting? You
make blanket statements about all of them, yet I bet that you don't even
know most of the specific algorithms. Ahh, you said that the specific
algorithms do not matter:

http://groups.google.com/group/comp.programming.threads/msg/05a4e3e3a4a75f05

I have to let you know that the details are very important. Your claim that
they are irrelevant sheds a lot of insight into your knowledge of the
matter.

Chris Thomasson

unread,
Mar 30, 2008, 10:00:37 PM3/30/08
to
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:6ade2425-d268-4fa2...@d21g2000prf.googlegroups.com...

Sure. And so can the programmer:

{
{
Bar bar()
f(bar);
}
g()
}


;^)

Chris Thomasson

unread,
Mar 30, 2008, 10:05:21 PM3/30/08
to

"Barry Kelly" <barry....@gmail.com> wrote in message
news:6v20v3pgomkkf5aqu...@4ax.com...

> Jon Harrop wrote:
>
>> Jerry Coffin wrote:
>> > You opinions on garbage collection reflect a tremendous enthusiasm, but
>> > equally tremendous ignorance of the subject matter.
>>
>> Yet you cannot cite a single relevant reference or piece of objective
>> evidence to support an argument that flies in the face of almost all
>> modern
>> software development (outside restricted memory environments).
>
> Of course he can't. That's because he's living in the world of 16 years
> and more ago.
>
> I wouldn't waste so much time with Jerry, Chris and others of their ilk.
> They're so steeped in obsolete experience that they're blind even to the
> present, much less the future. Even Einstein never accepted quantum
> theory.
[...]

Trolling? Anyway, have nothing against GC. I simply wanted to let Jon know
that most of his blanket claims about reference counting were false, or
misleading at best.

Chris Thomasson

unread,
Mar 30, 2008, 10:14:18 PM3/30/08
to
"Dilip" <rdi...@lycos.com> wrote in message
news:f91d727c-b7a7-4bcd...@m73g2000hsh.googlegroups.com...

Thanks for posting that. I will read through them when I get some time. One
point on circular references... The simplistic, contrived and most naive
solutions is to use offsets from zero to determine when an object is
destroyed. Every circular reference counts as an offset. So if there are two
circular refs, then the offset would be 2. This means than when the
reference count drops to 2 its analogous to dropping to zero. Will this work
for you? Probably not. Does it work in certain scenarios, yes.

Chris Thomasson

unread,
Mar 30, 2008, 10:20:41 PM3/30/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v03p2...@corp.supernews.com...

> Dmitriy V'jukov wrote:
>> On 31 ???, 00:35, Jon Harrop <use...@jdh30.plus.com> wrote:
>>> > Please show me a 100% accurate GC; good luck finding one...
>>>
>>> That is irrelevant. Your argument in favor of reference counting was
>>> completely fallacious.
>>>
>>> You claimed was that reference counting is "more accurate than a
>>> traditional GC could ever be". Consider:
>>>
>>> {
>>> Bar bar()
>>> f(bar);
>>> g()
>>> }
>>>
>>> Reference counting will keep "bar" alive until its reference count
>>> happens to be zeroed when it drops out of scope even though it is not
>>> reachable during the call to "g()". Real garbage collectors can collect
>>> "bar" during the call to "g" because it is no longer reachable.
>>
>> Can Real garbage collectors do this w/o help of the compiler?
>
> Absolutely, yes. I just checked this with a couple of GC'd languages and
> they both collected "bar" before the end of its scope, i.e. sooner than
> referencing counting can.
[...]

Yet another naive statement. Try this:

{
{
Bar bar()
f(bar);
}
g()
}


I as the programmer know more than a reference counting algorithms does,
even your almighty savior: GC.

Chris Thomasson

unread,
Mar 30, 2008, 10:22:02 PM3/30/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v03u7...@corp.supernews.com...

:^D

Jon Harrop

unread,
Mar 30, 2008, 10:27:17 PM3/30/08
to
Chris Thomasson wrote:
> Trolling? Anyway, have nothing against GC. I simply wanted to let Jon
> know that most of his blanket claims about reference counting were false,
> or misleading at best.

Ironic result then. :-)

Jon Harrop

unread,
Mar 30, 2008, 10:45:34 PM3/30/08
to
Chris Thomasson wrote:
> Yet another naive statement. Try this:
>
> {
> {
> Bar bar()
> f(bar);
> }
> g()
> }

My counter example has proven that your claim was fallacious.

Jon Harrop

unread,
Mar 30, 2008, 10:44:34 PM3/30/08
to
Jerry Coffin wrote:
> You're simply displaying still more of your ignorance. Yes, there is
> hardware that has dedicated reference counting capability. I know this
> with absolute certainty, because I've designed precisely such hardware.

Citation?

> You claim that: "none of the major GCs are [sic] based upon reference
> counting", but then:
>
>> Python and Mathematica are reference counted but both are slow and suffer
>> from stalls due to lack of incremental collection.
>
> You cite two major GCs that _are_ based on reference counting.

Sure. Lots of old programs used reference counting and CPython and the old C
core of Mathematica are examples.

They are being superceded though. We now have a multitude of Python
implementations and Mathematica is migrating to the JVM. They all have two
things in common: they're faster for allocation-intensive tasks and they
don't use reference counting.

Both PyPy and IronPython use real GCs and both beat CPython on Python's
standard GC benchmark "gcbench.py". From the horse's mouth:

"PyPy shines on gcbench, which is mostly just about allocating and freeing
many objects. Our gc is simply better than refcounting, even though we've
got shortcomings in other places." -
http://morepypy.blogspot.com/2008/03/as-fast-as-cpython-for-carefully-taken.html

Note that the inefficiency of reference counting is so severe that it can
even be significant in interpreted languages!

There are many other independent benchmarks:

CPython: 11.00s reference counted
IronPython: 4.14s .NET GC
Jython: 1.24s JVM GC

http://pyinsci.blogspot.com/2007/09/parallel-processing-in-cpython.html

> Of course, as evidence of anything except that some major GCs are based
> on reference counting, these mean precisely nothing -- you've shown
> nothing about how much of their time is spent on reference counting vs.
> other issues.

Exactly. Reference counting has been optimized out of all performant GCs.

> As has already been pointed out, reference counting CAN be
> made incremental (in fact, by its very nature it's much more incremental
> than most other forms of GC)

Mathematica and OCaml demonstrate the opposite.

>> No. Locality is worse with reference counting which is why it has gotten
>> relatively slower since that ancient survey. You can easily test this
>> yourself.
>
> I have. You're wrong. The fact that you'd claim this at all indicates
> that you're utterly clueless about how GC works at all.

Yet you still haven't ported the benchmark I cited.

> In reference counting, you have an object and the reference count is
> part of that object. The reference count is never touched except when
> there is an operation on the object,

No: reference counts are updated whenever the value is referenced or
dereferenced and those are not operations on the value.

> so locality of reference is usually
> nearly perfect -- the only exception being when/if a cache line boundary
> happens to fall precisely between the rest of the object and the
> reference count. In many implementations, this isn't even possible; in
> the relatively rare situation that it's possible, it's still quite rare.
>
> In other garbage collection, you start from a root set of objects and
> walk through everything they point at to find all the active objects.
> You then collect those that are no longer in use. When carried out as a
> simple mark/sweep collector, the locality of reference of this is
> absolutely _terrible_ -- every active object gets touched, and all its
> pointers followed at _every_ collection cycle.

That description is 48 years out of date.

> Modern collectors (e.g. generational scavengers) attempt to ameliorate
> this massive problem. They do this by assuming that when an object has
> survived enough cycles of collection that the object will probably
> survive many more cycles as well, so the object (and everything to which
> it refers) is moved to another area where it those objects are simply
> treated as permanent.

Older generations are not regarded as "permanent".

> The first work along this line simply created two areas, one for
> transient objects and another for permanent objects. More recent work
> creates more or less a gradient, in which objects that have survived the
> longest are inspected the last often, and as objects survive collection
> cycles, they are slowly promoted up the hierarchy until they are almost
> never inspected for the possibility of being garbage.
>
> This improves locality of reference compared to old mark-sweep
> collectors -- but it still has substantially worse locality of reference
> than a typical reference counting system. In particular, in reference
> counting, the reference count is NEVER touched unless an operation that
> creates or destroys a reference to that object takes place.

Which happens all the time and is the reason why reference counting has
worse locality of reference and performance. The other reason is that
reference counting causes fragmentation because it cannot move values.

> By contrast, even a generational scavenger will chase pointers through
> objects many times before the object is promoted to the point that the
> frequency of collection on that object is reduced. Even then, in the
> gradient-style system, the frequency of chasing pointers through that
> object is reduced only slightly until many more fruitless attempts at
> collecting it have been made -- at which point, the collection frequency
> is again reduced, but (again) not stopped by any means.
>
> This, of course, points to another serious problem with most such
> garbage collectors: they are based on the assumption that the longer an
> object has lived, the less likely that object is to die at any given
> point in time. In short, it assumes that object usage tends to follow a
> roughly stack-like model.
>
> That's certainly often true -- but it's equally certainly not always
> true. Quite the contrary, some types of applications tend to follow
> something much closer to a queue model, in which most objects have
> roughly similar life spans, so the longer an object has lived, the MORE
> likely it is to die at any given point in time.
>
> For applications that follow this pattern, most of the relatively recent
> work in garbage collectors is not only useless, but actively
> detrimental. The GC spends nearly all its time inspecting objects that
> are not likely to die any time soon at all, while ignoring nearly all
> the objects that are already dead or likely to die soon. The result is
> lots of time wasted copying objects AND lots of memory wasted storing
> objects that are actually dead.

This is all speculation for which there is overwhelming evidence to the
contrary. In short, if any of your points were correct then people would be
building major GCs on reference counting but they are not.

>> > You opinions on garbage collection reflect a tremendous enthusiasm, but
>> > equally tremendous ignorance of the subject matter.
>>
>> Yet you cannot cite a single relevant reference or piece of objective
>> evidence to support an argument that flies in the face of almost all
>> modern software development (outside restricted memory environments).
>
> Quite the contrary: the reference provided was completely relevant and
> remains as accurate today as it was when it was new.

Even its prophecies that we now know to be wrong?

> The techniques to
> make reference counting incremental and, if necessary, real-time work
> today, just like they did 16 years ago.

But they aren't used because everyone has moved on to real GCs because they
are better in almost every respect, including the one's you're trying to
contest.

> Some types of data become obsolete quickly. The specific details of a
> particular model of CPU that was current 16 years ago would have little
> relevance today.

The memory gap is the name given to the phenomenon that has obsoleted your
argument.

Then you should be able to provide credible references and worked counter
examples as I have.

Chris Thomasson

unread,
Mar 30, 2008, 11:07:13 PM3/30/08
to

"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v0kkk...@corp.supernews.com...

> Chris Thomasson wrote:
>> Yet another naive statement. Try this:
>>
>> {
>> {
>> Bar bar()
>> f(bar);
>> }
>> g()
>> }
>
> My counter example has proven that your claim was fallacious.

Please tell me exactly what's wrong with the following:

{
{
refcount<Bar> bar(new Bar);
f(bar);
}
g()
}

?

Jon Harrop

unread,
Mar 30, 2008, 11:08:10 PM3/30/08
to
Chris Thomasson wrote:
> Please tell me exactly what's wrong with the following:
>
> {
> {
> refcount<Bar> bar(new Bar);
> f(bar);
> }
> g()
> }

The problem is that it is irrelevant, not that it is wrong.

Chris Thomasson

unread,
Mar 30, 2008, 11:25:34 PM3/30/08
to

"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v0luv...@corp.supernews.com...

> Chris Thomasson wrote:
>> Please tell me exactly what's wrong with the following:
>>
>> {
>> {
>> refcount<Bar> bar(new Bar);
>> f(bar);
>> }
>> g()
>> }
>
> The problem is that it is irrelevant, not that it is wrong.

It is very relevant INDEED! First you say that a reference count is not 100%
by using the following example:

{
refcount<Bar> bar(new Bar);
f(bar);
g()
}


And claming that 'bar' will not be collected while g() is executing. Then I
say your totally wrong and give a simple counter example that renders your
non-sense example totally false:

{
{
refcount<Bar> bar(new Bar);
f(bar);
}
g()
}


And what do you do? Of course you make a silly statement: "it is
irrelevant". LOL! You are getting ridiculous!

Chris Thomasson

unread,
Mar 30, 2008, 11:27:49 PM3/30/08
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:4r2dna4yd4IfxG3a...@comcast.com...

>
> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13v0luv...@corp.supernews.com...
>> Chris Thomasson wrote:
>>> Please tell me exactly what's wrong with the following:
>>>
>>> {
>>> {
>>> refcount<Bar> bar(new Bar);
>>> f(bar);
>>> }
>>> g()
>>> }
>>
>> The problem is that it is irrelevant, not that it is wrong.
>
> It is very relevant INDEED! First you say that a reference count is not
> 100% by using the following example:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is very relevant INDEED! First you say that a reference count is not 100%

__accurate_ by using the following example:

Jon Harrop

unread,
Mar 30, 2008, 11:37:13 PM3/30/08
to
Chris Thomasson wrote:
> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13uvuuv...@corp.supernews.com...

>> A data structure implementor using reference counting must be aware of
>> the overhead of reference counting and work around it by amortizing
>> reference counts whenever possible. You yourself only just described this
>> in the context of performance.
>
> A data-structure implementer should always be aware of how he is going to
> safely reclaim memory in the presence of multi-threading.

Yes.

> Garbage collection does not get you a free lunch here.

Your example before was a linked list. With a GC, these are implemented
completely naively because the GC takes care of everything (a free lunch).
Look at the OCaml and F# standard libraries, for example.

>>>>> Memory overhead? No, a reference count releases the object when the
>>>>> reference drops to zero which is more accurate than a traditional GC
>>>>> could ever be.
>>>>
>>>> That is commonly claimed but actually wrong.
>>>
>>> Please show me a 100% accurate GC; good luck finding one...
>>
>> That is irrelevant. Your argument in favor of reference counting was
>> completely fallacious.
>
> Your incorrect. You need to open your eyes here:

I have proven you wrong by providing a worked counter example. There is no
point in discussing this further.

>> You claimed was that reference counting is "more accurate than a
>> traditional
>> GC could ever be". Consider:
>>
>> {
>> Bar bar()
>> f(bar);
>> g()
>> }
>
>
>> Reference counting will keep "bar" alive until its reference count
>> happens to be zeroed when it drops out of scope even though it is not
>> reachable during the call to "g()". Real garbage collectors can collect
>> "bar" during the call to "g" because it is no longer reachable.
>>
>> So GCs can clearly collect sooner than reference counters.
>
> Now your just trolling. Why are you thinking in terms of scope here? Did
> you know that C++ allows the programmer to do something like:
>
> {
> {
> Bar bar()
> f(bar);
> }
> g()
> }
>
> See? We can go in circles for ever here. Your posing non-sense.

This neither proves nor disproves your claim. However, it is relevant to the
previous point because, again, GC provides a free lunch here by collecting
earlier without forcing you to rewrite your code as reference counting did.

>>>> Scope can keep reference counts above zero and values alive
>>>> unnecessarily
>>>> when a real GC can collect them because they are unreferenced even
>>>> though
>>>> they are still in scope.
>>>
>>> It will only collect anything when the GC decides to run!
>>
>> Which may well be before the scope happens to end.
>
> Probably not.

I just tested this in OCaml and F# and both collect the value before the
scope ends.

>>> Anyway, traditional GC is NOT about performance, its about helping some
>>> programmers that do not know how, or want, to manage their memory
>>> manually.
>>
>> By the same logic: C++ is for programmers who don't know, or want, to
>> write assembler manually.
>
> You comparing creating efficient manual memory management schemes with
> programming assembly language?

Yes. Both are only necessary in very specific circumstances.

>>>>> Are you talking about the extra word of space per-object?
>>>>
>>>> Yes.
>>>
>>> How is that different than GC meta-data?
>>
>> Reference counts consume a lot more space.
>
> Oh boy. Here we go again. I ask you: What reference counting algorithm?
> You make blanket statements that are completely false!

I have described OCaml's zero overhead in this context. Show me a reference
counter that uses no space for its reference counts?

Failing that, show me a benchmark where any reference counted program uses
less memory than my GC'd equivalent?

>>> BTW, there are refcounting
>>> algorithms that do not need any per-object meta-data. For instance,
>>> vZOOM keeps this information on a per-thread basis. There is GC
>>> meta-data that keeps track of objects. There has to be. Think about it.
>>
>> Yes. For example, OCaml hides two bits inside each pointer totalling zero
>> overhead. In contrast, a reference counting system is likely to add a
>> machine word, bloating each and every value by 8 bytes unnecessarily on
>> modern hardware.
>
> There are some counting algorithms that use pointer stealing. Anyway, your
> make false assumptions based on blanket statements. Drill down on some
> specific reference counting algorithms please.

I am referring to all reference counting algorithms that require any space
for their counts, i.e. all of them.

>> Let's do a single-threaded benchmark first.
>
> That's no good! Multi-threading is the way things are now. You don't get a
> free lunch anymore.

Very well. Let's do it in parallel.

>>> What collection-intensive tasks?
>>
>> Symbolic rewriting would make a good benchmark. Here is one:
>>
>> http://www.lambdassociates.org/studies/study10.htm
>>
>> Try implementing this using reference counting and we can see how it
>> compares (probably on a trickier rewrite). Replace the
>> arbitrary-precision ints with doubles.
>
> I need to take a look at this, but, are you sure that this even needs
> memory management?

Yes: the task rewrites expressions trees. There is a very similar C++
program here:

http://www.codecodex.com/wiki/index.php?title=Derivative

You could use that as a basis and add reference counting to it.

> Could I just reserve a large block of memory and carve data-structures out
> of it and use caching?

You will run out of memory very quickly. This program spends all of its time
in allocation and collection.

> I don't think this needs GC or reference counting.

Unless you detect unused subexpressions and deallocate them you will run out
of memory.

We need this program to perform an irreducible task that I can code up in a
GC'd language to compare performance.

>>>>> Not incremental? Drill down on that one some more please.
>>>>
>>>> You actually just described the problem: reference counting releases
>>>> values when their reference drops to zero.
>>>
>>> Yeah. This is NOT true with a traditional GC.
>>
>> Absolutely. GCs do not have to wait that long: they can collect sooner.
>
> A garbage collector generally cannot be as accurate as a reference count.
> Your contrived scope example is non-sense.

I have proven your original statement wrong. Adding "generally" is better
but there is no evidence to support it.

We can at least test this on the symbolic rewriter by measuring the memory
consumption.

>>>> At the end of a scope, many reference
>>>> counts can be zeroed at the same time and the ensuing collections can
>>>> stall the program for an unacceptable amount of time. This can also be
>>>> extremely difficult to workaround without simply resorting to a real
>>>> GC.
>>>
>>> Stall? Are you sure you know how Proxy GC works? There is no stall.
>>> There is no contrived introspection of thread stacks. There is no
>>> collect-the-world. There is no mark-and-sweep. There is no thread
>>> suspension. Ect, ect...
>>
>> You've just made a series of irrelevant references here.
>
> How are the implementation details of a GC irrelevant to a discussion on
> GC?

Those aren't implementation details of a GC or, if they were supposed to be,
they are decades out of date. Mark and sweep has been incremental for over
three decades.

>> Of course. I fully expect you to fail on the above benchmark but am
>> willing to give you the benefit of the doubt.
>
> Any benchmark on GC these days has to be able to use multiple processes or
> threads. Ahh... Here is something we can do! Okay, we can create a
> multi-threaded daemon that serves factory requests to multiple processes.
> This would use multi-threading, multi-processing, and shared memory:
>
> I want to implement the wait-free factory as a multi-threaded daemon
> process that multiple producers can register the path and name of a shared
> library, which concurrent consumer processes can lookup by name;
> dynamically link with the resulting library and call a "common" instance
> function (e.g., define common api). I guess you could think of it as a
> highly concurrent per-computer COM daemon. The factory consumer threads
> can use the lock-free reader pattern, and the producer threads can use a
> form of mutual exclusion; including, but not limited, to locks.
>
> Or we can do a efficient observer pattern:
>
> I want the wait-free observer to be a multi-threaded daemon that can allow
> multiple threads to create/register delegates; message-types and allow
> consumer threads to register with those delegates and receive their
> messages; producer threads create messages with subsequently signal the
> delegates that manages those messages to multicast to their registered
> consumers.
>
> Now those programs will definitely test a GC to the limits. BTW, do you
> know of a GC that can collect across multi-process working with shared
> memory???

Not multi-process, no. Either multi-threaded with shared memory or
multi-process with message passing.

> Which one do you want to do? The multi-thread/process factory or the
> multi-thread observer?

I can't see how to make an irreducible task with a correct answer out of
these problem descriptions. I can try to port your C++ code but my
translations will be open to accusions of cheating in the absence of a
well-defined problem to solve.

>> That would be a specific kind of cache. Caches do not inherently have
>> anything to do with allocation.
>
> A cache is an aid to allocation.

The cache on a harddrive is not an "aid to allocation", for example.

>>> Are you sure you know what a cache is? A cache keep objects that have no
>>> references around to optimize further allocations.
>>
>> Apparently what you wanted to ask me was how I would implement an
>> allocation cache in a garbage collected language.

>> The answer depends upon the language. If you look at OCaml, for example,
>> there is no need to because the run-time has already automated this.
>

> Then pick a language that does do that.

Allocations are already cached by the minor (or first generation) heap of
most GCs. There is some sense in caching allocations on .NET because it
incurs a lock. You could do that in F# by preallocating an array of values.

However, I'm not sure that would be beneficial. I think allocating an array
of objects would incur multiple locks and allocating an array of structs
would incur a single lock but an indirection at every access (you cannot
pass structs by reference). I'd have to check this out though.

>>> Do you know about the Solaris slab allocator?
>>
>> No.
>
> You should take a look at it.

Will do.

Jon Harrop

unread,
Mar 30, 2008, 11:40:39 PM3/30/08
to
Chris Thomasson wrote:
> Then I say your totally wrong and give a simple counter example...

To be a valid counter example your program would have to prove that
reference counting is always more accurate. Your example does not prove
that. It is not a counter example.

Chris Thomasson

unread,
Mar 30, 2008, 11:53:06 PM3/30/08
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:nOedncvAJsfHom3a...@comcast.com...

> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
[...]

>>> I know I can compete, and most likely beat, a traditional GC with
>>> handcrafted implementations that C++ give me the freedom to work with.
>>
>> I admire your faith but I would like to see some evidence to back up such
>> claims because they run contrary to common wisdom accumulated over the
>> past
>> few decades.
>
> Here is a stupid contrived example:
> ________________________________________________________
> struct object {
> object* cache_next;
> bool cached;
> [...];
> };
>
> #define OBJ_DEPTH() 100000
>
> static object g_obj_buf[OBJ_DEPTH()] = { NULL, true };
> static object* g_obj_cache = NULL;
>
> void object_prime() {
> for(int i = 0; i < OBJ_DEPTH(); ++i) {
> g_obj_buf[i].cache_next = g_obj_cache;
> g_obj_cache = &g_obj_buf[i];
> }
> }

WHOOPS! There are several STUPID TYPO's in there. Well, that's what I get
for typing this out in the newsreader! Anyway, here is the full code for the
very simplistic object cache that will compile with a C compiler:
______________________________________________________________________
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>


typedef struct object_s object;

struct object_s {
object* cache_next;
int cached;
};


#define OBJ_DEPTH() 100000


static object g_obj_buf[OBJ_DEPTH()] = { { NULL } };


static object* g_obj_cache = NULL;


void object_prime(void) {
int i;
for(i = 0; i < OBJ_DEPTH(); ++i) {


g_obj_buf[i].cache_next = g_obj_cache;

g_obj_buf[i].cached = 1;
g_obj_cache = &g_obj_buf[i];
}
}

object* object_pop(void) {


object* obj = g_obj_cache;
if (! obj) {
if (obj = malloc(sizeof(*obj))) {

obj->cache_next = NULL;
obj->cached = 0;
}
} else {
g_obj_cache = obj->cache_next;
}
return obj;
}

void object_push(object* obj) {
if (obj) {


if (obj->cached) {
obj->cache_next = g_obj_cache;
g_obj_cache = obj;
} else {
free(obj);
}
}
}

void foo(unsigned long int depth) {
for (;depth > 0; --depth) {


object* foo = object_pop();
object_push(foo);
}
}


int main(void) {
object_prime();
foo(5);


/*---------------------------------------------------------*/
puts("\n\n\n______________________________________________\n\
press <ENTER> to exit...");
getchar();
return 0;
}

______________________________________________________________________

Sorry about that non-sense Jon!

;^(...


BTW, thank you for not flaming me too bad on this! ACK!

Chris Thomasson

unread,
Mar 30, 2008, 11:54:40 PM3/30/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v0nrt...@corp.supernews.com...

> Chris Thomasson wrote:
>> Then I say your totally wrong and give a simple counter example...
>
> To be a valid counter example your program would have to prove that
> reference counting is always more accurate. Your example does not prove
> that. It is not a counter example.

You example did not prove that reference counting is not accurate.

Jon Harrop

unread,
Mar 30, 2008, 11:49:54 PM3/30/08
to

Irrelevant.

Jerry Coffin

unread,
Mar 31, 2008, 1:18:04 AM3/31/08
to
In article <f91d727c-b7a7-4bcd-9a7a-c15f20324238
@m73g2000hsh.googlegroups.com>, rdi...@lycos.com says...

[ ... ]

> Jerry
>
> Can I invite you to take a look at this post[1] back from a few years
> ago where a MSFT employee laid out his thoughts on why they went the
> GC way w/o bothering about refcounting?

Sure. It'll take a while to read the whole thing, so I can't comment on
it in any detail just yet, but my immediate reaction based on glancing
it over is that it appears that their starting point was primarily the
semantics of VB6.

If that's accurate, then it appears there's not really a huge amount to
say: VB-6 is quite a lot different from C++. Specifically, C++ allows
one to execute arbitrary code in a destructor, meaning that the
destruction of an object can (and often does) have implications far
above and beyond that of simply destroying the object itself. In many
cases, correct operation of the program depends completely upon
execution of that code at precisely the correct time.

In VB-6 (or earlier) I don't believe that's the case. At least offhand,
I can't think of any observable semantics associated with destroying
anything (though I've never used VB much, so that may easily be wrong).
If that's the case, changing the timing of destroying objects can't make
an observable difference either. That makes almost any sort of garbage
collection comparatively trivial to incorporate.

In a situation like that, your choice of garbage collection techniques
comes down primarily to trade-offs between memory usage and speed, with
(quite) a bit of tuning to support the specific patterns of object usage
you see. By the time MS was doing this work, people had been working on
garbage collectors for years, and writing something that worked at least
reasonably well was a matter of combining well known elements in
relatively well known ways to get what were probably about the expected
results.

> Also shortly after the post
> was written, MSFT funded a project to try to add ref-counting to their
> Rotor (their version of open sourced CLR) codebase[2] as a kind of
> feasibility study and it failed miserably. The latter project, I have
> the details only as a word document and have uploaded it here[2]. Let
> me know what you think.
>
> [1] http://discuss.develop.com/archives/wa.exe?A2=ind0010A&L=DOTNET&D=0&P=39459
> [2] http://cid-ff220db24954ce1d.skydrive.live.com/browse.aspx/RefcountingToRotor

This second paper is a bit harder to comment on. They openly admit that
they 1) made no attempt at optimization, and 2) didn't profile the code
(tried to, but failed).

They make the case (correct, as a rule, I think) that deterministic
finalization allows one to simplify client code. The fact that their
code was slow in the absence of optimization, and that they don't (or at
that time didn't) know exactly why or how it was so slow, only points to
what we don't really know.

At least at first glance, their implementation sounds pretty slow,
though some points aren't entirely clear. They mention calling AddRef
and Release, but it's not clear whether they mean they're literally
calling member functions, or just using those names to make the idea
clear to people accustomed to COM.

If they were really using COM-style member functions, each increment or
decrement would involve loading the object's vtable pointer, then
looking up the function in the vtable, then calling that function to
increment or decrement. We're up to something like three memory
references, of which only one (the vtable pointer) is at all likely to
be in the cache already.

Then they talk about having to store the count for each object at the
end of the object, with a potentially different offset for each type of
object. That tends to indicate that they'd have to store the proper
offset and look it up for each object. If, heaven forbid, they followed
standard COM practice, they'd have to call _another_ function out of the
vtable to get that offset. If they access the data directly, we're
looking at a couple more memory references, and if they called a COM-
style member function we'd be looking at another three memory references
or so, of which (again) only one would be at all likely to be in the
cache already.

All in all, what would normally be a single reference to memory that
would almost always be in the cache would turn into around four to six
references to memory, only one or two of which would be in the cache on
more than rare ocassion.

Reference counting requires only simple operations (incrementing or
decrementing) but it does those quite frequently. You only ever get away
with it at all because those are normally quite fast. When you add
something like 4 references to memory that's not likely to be cached to
every one of those operations, dismal performance sounds like the best
you'd expect.

I should also point out that at no point have I attempted to advocate
reference counting as a general purpose garbage collection solution, or
anything on that order. Quite the contrary, I think if somebody were to
attempt to implement Java, Lisp, Smalltalk, Self, OCaml, etc., using
reference counting as it's only method of garbage collection, they'd be
making an extremely unwise decision at best.

My advocacy in this matter is purely for at least a reasonable degree of
accuracy. My assumption is that people considering various forms of
garbage collection should already know something about their memory
usage patterns. Matching that up with the characteristics of various
forms of garbage collection will allow them to make an intelligent
choice.

I have no problem at all if that choice happens to be something other
than reference counting -- in fact, I'd go so far as to say that
reference counting is only a good choice in relatively limited
circumstances. At the same time, within those limited circumstances,
reference counting will typically be a substantially better choice than
most alternatives.

If there was _never_ a situation in which reference counting was useful,
I wouldn't worry much if advice against it was based on inaccurate
reasoning -- even if inaccurate, it wouldn't be particularly misleading.
That's not the situation here. Jon Harrop's statements are both wildly
inaccurate, AND grossly misleading. IMO, that's unacceptable, and his
false statements _need_ to be corrected -- not in any particular hope
that anybody in particular will choose to use reference counting, but
only in the hope that they can make an intelligent, well-informed
decision based on facts, not blatant falsehoods.

Jerry Coffin

unread,
Mar 31, 2008, 2:02:37 AM3/31/08
to
In article <13v0kl8...@corp.supernews.com>, use...@jdh30.plus.com
says...

> Jerry Coffin wrote:
> > You're simply displaying still more of your ignorance. Yes, there is
> > hardware that has dedicated reference counting capability. I know this
> > with absolute certainty, because I've designed precisely such hardware.
>
> Citation?

I just gave you a citation, moron! If you want more details: it's used
in an MPEG decoder. MPEG includes I-frames (which are vaguely similar to
JPEG pictures) and P-frames and B-frames. In a P-frame, you use a block
of pixels from a previous frame as a prediction of a block in the
current frame, and then you encode only the differences between that
block and the current block. In a B-frame, you use bidirectional
prediction, meaning you use both a previous AND a succeeding frame for
your prediction.

In managing the memory, you've got a few choices: you can simply keep
the entire previous frame, just in case a succeeding frame might predict
parts of itself from this data. Alternatively, you can keep only the
parts that are really _used_ for prediction -- which, of course, means
that you count up the references, and dispose of each block when it's no
longer needed.

[ ... ]



> > As has already been pointed out, reference counting CAN be
> > made incremental (in fact, by its very nature it's much more incremental
> > than most other forms of GC)
>
> Mathematica and OCaml demonstrate the opposite.

They demonstrate nothing of the sort.

>
> >> No. Locality is worse with reference counting which is why it has gotten
> >> relatively slower since that ancient survey. You can easily test this
> >> yourself.
> >
> > I have. You're wrong. The fact that you'd claim this at all indicates
> > that you're utterly clueless about how GC works at all.
>
> Yet you still haven't ported the benchmark I cited.

Believing that benchmark means _anything_ about GC demonstrates still
more thoroughly that you haven't a clue of what you're talking about.

> > In reference counting, you have an object and the reference count is
> > part of that object. The reference count is never touched except when
> > there is an operation on the object,
>
> No: reference counts are updated whenever the value is referenced or
> dereferenced and those are not operations on the value.

If you're going to try to use a word like "dereferenced", you should
learn what it means first. As it stands right now, you come off a lot
like a six-year old trying to sound adult.

> > so locality of reference is usually
> > nearly perfect -- the only exception being when/if a cache line boundary
> > happens to fall precisely between the rest of the object and the
> > reference count. In many implementations, this isn't even possible; in
> > the relatively rare situation that it's possible, it's still quite rare.
> >
> > In other garbage collection, you start from a root set of objects and
> > walk through everything they point at to find all the active objects.
> > You then collect those that are no longer in use. When carried out as a
> > simple mark/sweep collector, the locality of reference of this is
> > absolutely _terrible_ -- every active object gets touched, and all its
> > pointers followed at _every_ collection cycle.
>
> That description is 48 years out of date.

Nonsense. I specifically said that "...when carried out as a simple
mark/sweep collector..." This is absolutely true today, just as it was
in the very first Lisp interpreter.

> > Modern collectors (e.g. generational scavengers) attempt to ameliorate
> > this massive problem. They do this by assuming that when an object has
> > survived enough cycles of collection that the object will probably
> > survive many more cycles as well, so the object (and everything to which
> > it refers) is moved to another area where it those objects are simply
> > treated as permanent.
>
> Older generations are not regarded as "permanent".

You're showing still more of your ignorance -- there are certainly
collectors that treat objects as permanent once they reach a sufficient
age.


> > The first work along this line simply created two areas, one for
> > transient objects and another for permanent objects. More recent work
> > creates more or less a gradient, in which objects that have survived the
> > longest are inspected the last often, and as objects survive collection
> > cycles, they are slowly promoted up the hierarchy until they are almost
> > never inspected for the possibility of being garbage.
> >
> > This improves locality of reference compared to old mark-sweep
> > collectors -- but it still has substantially worse locality of reference
> > than a typical reference counting system. In particular, in reference
> > counting, the reference count is NEVER touched unless an operation that
> > creates or destroys a reference to that object takes place.
>
> Which happens all the time and is the reason why reference counting has
> worse locality of reference and performance. The other reason is that
> reference counting causes fragmentation because it cannot move values.

Still more complete nonsense. "Locality of reference" is apparently
something else you shouldn't use because you obviously don't know what
it means. Reference counting and movement of objects are _completely_
orthogonal. Reference counting deals with figuring out _when_ an object
is dead, and has nothing whatsoever to do with _where_ the object is
stored.

[ ... ]

> > For applications that follow this pattern, most of the relatively recent
> > work in garbage collectors is not only useless, but actively
> > detrimental. The GC spends nearly all its time inspecting objects that
> > are not likely to die any time soon at all, while ignoring nearly all
> > the objects that are already dead or likely to die soon. The result is
> > lots of time wasted copying objects AND lots of memory wasted storing
> > objects that are actually dead.
>
> This is all speculation for which there is overwhelming evidence to the
> contrary. In short, if any of your points were correct then people would be
> building major GCs on reference counting but they are not.

Still more complete nonsense. Just for example, see David Ungar's
_Outwitting GC Devils: A Hybrid Incremental Garbage Collector_, where he
talks about problems they ran into when they first implemented
generational scavenging, and how they ameliorated them at that time. He
concluded that paper by saying:

Much work remains. We need to analyze the performance of
this system and see what devils plague it now. In the
meantime, this work may rekindle interest in pause-free
reclamation without read barriers, and we hope that those
who come after us in this field may profit from our
experience by watching out for unexpected devils.

Of course, if you want to argue with me about garbage collection, you're
going to end up reading a LOT of those old papers you disdain so much,
because as it stands right now, you don't even have a clue about the
basic background necessary to get started. Trying to discuss recent
developments with you would be a little like trying to teach string
theory to a three year old when he asks why things always fall down
instead of up -- you lack any of the background necessary to begin to
understand any of it.



> >> > You opinions on garbage collection reflect a tremendous enthusiasm, but
> >> > equally tremendous ignorance of the subject matter.
> >>
> >> Yet you cannot cite a single relevant reference or piece of objective
> >> evidence to support an argument that flies in the face of almost all
> >> modern software development (outside restricted memory environments).
> >
> > Quite the contrary: the reference provided was completely relevant and
> > remains as accurate today as it was when it was new.
>
> Even its prophecies that we now know to be wrong?

What would those be? So far, the only one you've disputed HAS come true.
Worse, despite your attempts to change the subject, the citation was
originally about the incremental nature of reference counting -- and
you've yet show a single shred of evidence that this is not the case.



> > The techniques to
> > make reference counting incremental and, if necessary, real-time work
> > today, just like they did 16 years ago.
>
> But they aren't used because everyone has moved on to real GCs because they
> are better in almost every respect, including the one's you're trying to
> contest.

You obviously don't know WHAT I'm contesting at all!



> > Some types of data become obsolete quickly. The specific details of a
> > particular model of CPU that was current 16 years ago would have little
> > relevance today.
>
> The memory gap is the name given to the phenomenon that has obsoleted your
> argument.

You wish.

[ ... ]



> Then you should be able to provide credible references and worked counter
> examples as I have.

Your inability to recognize when you've been proved wrong is hardly my
problem.

Chris Thomasson

unread,
Mar 31, 2008, 3:27:27 AM3/31/08
to

"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v06uu...@corp.supernews.com...
> Ian Collins wrote:
>> So it does require help from the compiler to generate appropriate code.
>
> Yes.
>
>> An after-the-fact GC library would not be able to do as you described.
>
> The language implementation is now split into the compiler and the GC
> which
> must be designed to cooperate.

Is a great fact that we can use C/C++ __and__ some assembly language to
create any garbage collector you can think of. C/C++ programmers have that
freedom. Does that prove anything? Na. It only shows how C/C++ is at a low
enough level to allow the creation of Java, C#, or basically any other
language and/or GC. Got it?

:^D

Chris Thomasson

unread,
Mar 31, 2008, 3:27:38 AM3/31/08
to

"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v0od7...@corp.supernews.com...

> Chris Thomasson wrote:
>> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
>> news:13v0nrt...@corp.supernews.com...
>>> Chris Thomasson wrote:
>>>> Then I say your totally wrong and give a simple counter example...
>>>
>>> To be a valid counter example your program would have to prove that
>>> reference counting is always more accurate. Your example does not prove
>>> that. It is not a counter example.
>>
>> You example did not prove that reference counting is not accurate.
>
> Irrelevant.

Nope.

Chris Thomasson

unread,
Mar 31, 2008, 3:33:59 AM3/31/08
to

"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v0jib...@corp.supernews.com...

> Chris Thomasson wrote:
>> Trolling? Anyway, have nothing against GC. I simply wanted to let Jon
>> know that most of his blanket claims about reference counting were false,
>> or misleading at best.
>
> Ironic result then. :-)

Your trolling AGAIN! You say I am living many years in the past simply
because I make use of several forms of distributed reference counting. You
are full of it, and need to understand that GC is a great tool. Its only a
TOOL! Not an all purpose answer! There are many different forms of reference
counting and GC; so be it. You make false claims on ALL of them with your
bullshi% blanket ignorant statements. Therefore, I point out problems with
your logic; so be it. GC is good, Ref-Counting is good: Get it???

Chris Thomasson

unread,
Mar 31, 2008, 3:46:18 AM3/31/08
to
"Jon Harrop" <use...@jdh30.plus.com> wrote in message
news:13v0nlf...@corp.supernews.com...
> Chris Thomasson wrote:
[...]

> Yes: the task rewrites expressions trees. There is a very similar C++
> program here:
>
> http://www.codecodex.com/wiki/index.php?title=Derivative
[...]

I do not have the time either today nor tomorrow to create a program that
can compete with a GC lang wrt specific problem arena at hand. Well, I am
going to give your benchmark a whirl anyway in C++, give me two or three
days please??? I think I can cut the number of new/delete calls by a fairly
wide margin. If I can use an inherently non garbage collected language like
C and/or C++ to even get a 1ms gain over one of the other langs that are GC
by nature... Well, if I can do it, will you go ahead and try an compete with
me in creating a process-wide distributed factory/obserbver pattern? I know
that GC does not collect over multi-process programs. Well, that's your
problem. I use C/C++ which gives be the freedom to implement such things.
Your GC native languages will have to compete in the multi-threaded __and__
multi-process world. If I can get a 1ms gain, good luck trying to HACK a GC
lang to go into uncharted waters!

:^D


What say you? I know that GC is not multi-process friendly... That your
problem, not mine. I use C/C++/x86-asm/SPARC-asm... GC lang? Okay. Lets rock
and roll!

:^|

Chris Thomasson

unread,
Mar 31, 2008, 3:55:47 AM3/31/08
to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:Ya-dnVvMo6EBC23a...@comcast.com...

> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13v0nlf...@corp.supernews.com...
>> Chris Thomasson wrote:
> [...]
>> Yes: the task rewrites expressions trees. There is a very similar C++
>> program here:
>>
>> http://www.codecodex.com/wiki/index.php?title=Derivative
> [...]
>
> I do not have the time either today nor tomorrow to create a program that
> can compete with a GC lang wrt specific problem arena at hand. Well, I am
> going to give your benchmark a whirl anyway in C++, give me two or three
> days please??? I think I can cut the number of new/delete calls by a
> fairly wide margin. If I can use an inherently non garbage collected
> language like C and/or C++ to even get a 1ms gain over one of the other
> langs that are GC by nature... Well, if I can do it, will you go ahead and
> try an compete with me in creating a process-wide distributed
> factory/obserbver pattern? I know that GC does not collect over
> multi-process programs. Well, that's your problem.
[...]

I am not so sure I can even get a 1ms gain, I have not tried yet but I think
this is going to be a challenge indeed! Thanks for the opportunity! Really,
I appreciate your time and patience with me Doctor!

:^)

BTW, the above was not meant to be sarcastic in any way, shape or form.

Alexander Terekhov

unread,
Mar 31, 2008, 3:55:55 AM3/31/08
to

Jon Harrop wrote:
[...]

> {
> Bar bar()
> f(bar);
> g()
> }
>
> Reference counting will keep "bar" alive until its reference count happens
> to be zeroed when it drops out of scope even though it is not reachable
> during the call to "g()". Real garbage collectors can collect "bar" during
> the call to "g" because it is no longer reachable.

The above is pretty much what you get from

using (Bar bar = new Bar()) {
f(bar);
g();
}

in GC environment. See

http://msdn2.microsoft.com/en-us/library/yh598w02(VS.80).aspx

>
> So GCs can clearly collect sooner than reference counters.

You probably mean

using (Bar bar = new Bar()) {
f(bar);
}
g();

which is pretty much what you get from

{
Bar bar();
f(bar);
}
g();

;-)

I'm not sure what does that have to do with shared_ptr<> vs GC topic...

regards,
alexander.

Ian Collins

unread,
Mar 31, 2008, 4:06:48 AM3/31/08
to
Chris Thomasson wrote:
>
> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
> news:13v06uu...@corp.supernews.com...
>> Ian Collins wrote:
>>> So it does require help from the compiler to generate appropriate code.
>>
>> Yes.
>>
>>> An after-the-fact GC library would not be able to do as you described.
>>
>> The language implementation is now split into the compiler and the GC
>> which
>> must be designed to cooperate.
>
> Is a great fact that we can use C/C++ __and__ some assembly language to
> create any garbage collector you can think of.

I thought C/C++ resulted in undefined behaviour :)

--
Ian Collins.

Dmitriy V'jukov

unread,
Mar 31, 2008, 8:05:13 AM3/31/08
to
On Mar 31, 12:35 am, Jon Harrop <use...@jdh30.plus.com> wrote:

> > Please show me a 100% accurate GC; good luck finding one...
>
> That is irrelevant. Your argument in favor of reference counting was
> completely fallacious.
>

> You claimed was that reference counting is "more accurate than a traditional
> GC could ever be". Consider:
>
> {

> Bar bar()
> f(bar);
> g()
> }
>
> Reference counting will keep "bar" alive until its reference count happens
> to be zeroed when it drops out of scope even though it is not reachable
> during the call to "g()". Real garbage collectors can collect "bar" during
> the call to "g" because it is no longer reachable.


Can Real garbage collectors do this w/o help of compiler? I'm not
sure.
With help of compiler reference-counting can easily collect bar
precisely and promptly at the end of f().


Dmitriy V'jukov

Dmitriy V'jukov

unread,
Mar 31, 2008, 8:12:24 AM3/31/08
to
On Mar 31, 12:35 am, Jon Harrop <use...@jdh30.plus.com> wrote:

> Reference counts consume a lot more space.
>

> > BTW, there are refcounting
> > algorithms that do not need any per-object meta-data. For instance, vZOOM
> > keeps this information on a per-thread basis. There is GC meta-data that
> > keeps track of objects. There has to be. Think about it.
>
> Yes. For example, OCaml hides two bits inside each pointer totalling zero
> overhead. In contrast, a reference counting system is likely to add a
> machine word, bloating each and every value by 8 bytes unnecessarily on
> modern hardware.


There is proxy-collector based on reference-counting which can be
implemented w/o any per-object overhead.

There is 'one-bit' reference counting which also uses only low bit of
pointer.

It seems that you limit your vision only to decades-old plain-old-
reference-counting algorithm.


Dmitriy V'jukov

Chris Thomasson

unread,
Mar 31, 2008, 8:39:15 AM3/31/08
to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:Ya6dnaEscqVIBW3a...@comcast.com...

>
> "Chris Thomasson" <cri...@comcast.net> wrote in message
> news:Ya-dnVvMo6EBC23a...@comcast.com...
>> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
>> news:13v0nlf...@corp.supernews.com...
>>> Chris Thomasson wrote:
>> [...]
>>> Yes: the task rewrites expressions trees. There is a very similar C++
>>> program here:
>>>
>>> http://www.codecodex.com/wiki/index.php?title=Derivative
>> [...]
[...]

Actually, the only way I can really think of competing with a GC lang wrt
the C++ code provided in the link above is to simply create a slab
per-concrete-object (e.g., Var, Int, Plus and Times). That will drastically
cut down on calls to new/delete. I will have code for you in a day or two.
Probably two because I am getting ready for a move from the Bay Area to the
South Tahoe Basin. Anyway, here is simple sketch for the Int class in the
example you linked to:

origin:
_______________________________________________________________________
class Int : public Base {
const int n;
public:
Int(int m) : n(m) {}
~Int() {}
const Base *clone() { return new Int(n); }
const Base *d(const string &v) const { return new Int(0); }
ostream &print(ostream &o) const { return o << n; }
};
_______________________________________________________________________

<sketch w/ typos>
_______________________________________________________________________
#define INT_DEPTH() 10000


struct Int : public Base {
int n;
Int* m_next;

Int(int m) : n(m) {}
~Int() {}

void Ctor(int m) {
n = m;
}

static Int* g_head;
static int g_depth;

static Int* CachePop(int const m) {
Int* _this = g_head;
if (! _this) {
_this = new Int(m);
} else {
g_head = _this->m_next;
--g_depth;
_this->Ctor(m);
}
return _this;
}

static void CachePush(Int* const _this) {
if (g_depth < INT_DEPTH()) {
_this->m_next = g_head;
g_head = _this;
++g_depth;
} else {
delete _this;
}
}


const Base *clone() { return CachePop(n); }
const Base *d(const std::string &v) const { return CachePop(0); }
std::ostream &print(std::ostream &o) const { return o << n; }
};


Int* Int::g_head = NULL;
int Int::g_depth = 0;

_______________________________________________________________________

Alls I have to do is replace 'Var, Int, Plus and Times' with the VERY simple
cache outlined above and I know it will cut the number of new/delete calls
by a large margin. That about all I can do in C++. Is that fair?

;^D

Chris Thomasson

unread,
Mar 31, 2008, 8:49:34 AM3/31/08
to

"Ian Collins" <ian-...@hotmail.com> wrote in message
news:65bkcoF...@mid.individual.net...

I really do like that C and C++ is low-level enough to be able to provide
the ability to actually create other great languages, and the GC's that go
along with them of course...

:^)

Chris Thomasson

unread,
Mar 31, 2008, 8:57:14 AM3/31/08
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:ruCdncUwnszeRm3a...@comcast.com...

>
> "Chris Thomasson" <cri...@comcast.net> wrote in message
> news:Ya6dnaEscqVIBW3a...@comcast.com...
>>
>> "Chris Thomasson" <cri...@comcast.net> wrote in message
>> news:Ya-dnVvMo6EBC23a...@comcast.com...
>>> "Jon Harrop" <use...@jdh30.plus.com> wrote in message
>>> news:13v0nlf...@corp.supernews.com...
>>>> Chris Thomasson wrote:
>>> [...]
>>>> Yes: the task rewrites expressions trees. There is a very similar C++
>>>> program here:
>>>>
>>>> http://www.codecodex.com/wiki/index.php?title=Derivative
>>> [...]
> [...]
>
[...]

> Alls I have to do is replace 'Var, Int, Plus and Times' with the VERY
> simple cache outlined above and I know it will cut the number of
> new/delete calls by a large margin. That about all I can do in C++. Is
> that fair?

Also, I would need to create a special virtual function or id in the base
class which allows be to replace delete calls. Sketch:
___________________________________________________________________________
class Base {
public:
virtual ~Base() {};
virtual void CacheDelete() = 0;
virtual const Base *clone() = 0;
virtual const Base *d(const std::string &v) const = 0;
virtual std::ostream &print(std::ostream &o) const = 0;
};


#define INT_DEPTH() 10000


struct Int : public Base {
int n;
Int* m_next;

Int(int m) : n(m) {}
~Int() {}

void Ctor(int m) {
n = m;
}

static Int* g_head;
static int g_depth;

void CacheDelete() {
CachePush(this);
}

static Int* CachePop(int const m) {
Int* _this = g_head;
if (! _this) {
_this = new Int(m);
} else {
g_head = _this->m_next;
--g_depth;
_this->Ctor(m);
}
return _this;
}

static void CachePush(Int* const _this) {
if (g_depth < INT_DEPTH()) {
_this->m_next = g_head;
g_head = _this;
++g_depth;
} else {
delete _this;
}
}


const Base *clone() { return CachePop(n); }
const Base *d(const std::string &v) const { return CachePop(0); }
std::ostream &print(std::ostream &o) const { return o << n; }
};


Int* Int::g_head = NULL;
int Int::g_depth = 0;

___________________________________________________________________________

This is only the Int class, but you get my drift... I would replace delete
calls on the Base with Base->DELETE(). This would be simpler than an id.

Dilip

unread,
Mar 31, 2008, 9:28:04 AM3/31/08
to

I am not so sure. In the former example all the 'using' block does is
to call bar.Dispose() as a syntactic shortcut on reaching the closing
scope. That is useful only if Bar is encapsulating some kind of non-
memory resource like a file handle or a socket and that needs to be
closed as soon as f(bar) returns. The 'bar' object itself (after
f(bar) returns) is only _eligible_ for collection. Its memory will
_not_ be reclaimed until the GC actually runs (which won't happen
until there is a memory pressure in Gen 0 regions).

In the latter case, bar is destructed right after f(bar) returns which
is what I thought C++ meant by _deterministic_ destruction.

This is what James Kanze has been lamenting through out this thread.

Am I wrong?

Alexander Terekhov

unread,
Mar 31, 2008, 11:15:26 AM3/31/08
to

IOW, you want some observable behavior (other than memory reclamation,
see below) to take place precisely at scope exit.

> f(bar) returns) is only _eligible_ for collection. Its memory will
> _not_ be reclaimed until the GC actually runs (which won't happen
> until there is a memory pressure in Gen 0 regions).
>
> In the latter case, bar is destructed right after f(bar) returns which
> is what I thought C++ meant by _deterministic_ destruction.
>

Think of the latter case with bar being a handle/wrapper for GC_MALLOC'd
memory (using Boehm collector or some such) containing a bunch of
something "like a file handle or a socket and that needs to be closed as
soon as f(bar) returns."

regards,
alexander.

Dilip

unread,
Mar 31, 2008, 12:03:28 PM3/31/08
to

I am still not getting it. In the former case, non-memory resources
are disposed of promptly at the end of scope. In the latter case, not
only are the non-memory resources disposed of, the destructor ensures
that the memory used by that object is also released. Isn't there a
difference?

Alexander Terekhov

unread,
Mar 31, 2008, 1:33:12 PM3/31/08
to

Dilip wrote:
[...]

> I am still not getting it. In the former case, non-memory resources
> are disposed of promptly at the end of scope. In the latter case, not
> only are the non-memory resources disposed of, the destructor ensures
> that the memory used by that object is also released. Isn't there a
> difference?

In the latter case, the destructor can basically do the same as
IDisposable.Dispose() in the former case and simply "offload" memory
release to some garbage collector like Boehm's one ("leaking" memory
garbage and relying on GC to collect it).

regards,
alexander.

Dmitriy V'jukov

unread,
Mar 31, 2008, 1:39:07 PM3/31/08
to
On Mar 31, 12:35 am, Jon Harrop <use...@jdh30.plus.com> wrote:

> > Please show me a 100% accurate GC; good luck finding one...
>
> That is irrelevant. Your argument in favor of reference counting was
> completely fallacious.
>
> You claimed was that reference counting is "more accurate than a traditional
> GC could ever be". Consider:
>
> {

> Bar bar()
> f(bar);
> g()
> }
>
> Reference counting will keep "bar" alive until its reference count happens
> to be zeroed when it drops out of scope even though it is not reachable
> during the call to "g()". Real garbage collectors can collect "bar" during
> the call to "g" because it is no longer reachable.
>

> So GCs can clearly collect sooner than reference counters.


Can Real garbage collectors do this w/o help of compiler? I don't
think so...


With help of compiler reference-counting can easily collect 'bar'

precisely and promptly at the end of f().

You are mixing up GC with language, compiler and runtime.


Dmitriy V'jukov

Dmitriy V'jukov

unread,
Mar 31, 2008, 1:55:46 PM3/31/08
to
On Mar 31, 12:35 am, Jon Harrop <use...@jdh30.plus.com> wrote:

> Reference counts consume a lot more space.
>
> > BTW, there are refcounting
> > algorithms that do not need any per-object meta-data. For instance, vZOOM
> > keeps this information on a per-thread basis. There is GC meta-data that
> > keeps track of objects. There has to be. Think about it.
>
> Yes. For example, OCaml hides two bits inside each pointer totalling zero
> overhead. In contrast, a reference counting system is likely to add a
> machine word, bloating each and every value by 8 bytes unnecessarily on
> modern hardware.


If you want as big overhead as possible then you can even add 1 MB per
each and every value. If you don't then you can use reference-counting
scheme w/o per object overhead.


Dmitriy V'jukov

It is loading more messages.
0 new messages