
Reference counting


Janos Blazi

Nov 2, 1999
What does the expression "reference counting" mean?
Can somebody tell me?

Janos Blazi

Barry Margolin

Nov 2, 1999
In article <abhorrent-7vnhei/INN-2.2.1/alc...@broadway.news.is-europe.net>,
Janos Blazi <jbl...@netsurf.de> wrote:
>What does the expression "reference counting" mean?
>Can somebody tell me?

It's a garbage collection technique that works by having each object
maintain a count of the number of pointers that refer to it. Every time a
pointer is created, the counter in the object it points to is increased.
Every time a pointer is deleted, this counter is decremented. And when a
pointer is modified, the counter in the old object is decreased, while the
counter in the new object is increased. Whenever a counter drops to 0, the
object has become garbage and may be collected; if the object contains any
pointers, they get deleted in the process, which will decrement the
counters in the objects they point to, and so on.

Reference counting is not usually used as the sole GC mechanism in a general
system, because it will never reclaim objects that are involved in circular reference
chains. However, it can be used in applications that are known not to have
such chains or take care to deal with them appropriately. Some programming
language implementations also use RC as the primary GC mechanism, but will
revert to a more effective GC mechanism when necessary (I think most
Smalltalk implementations are like this).
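
For illustration, a minimal C++ sketch of the scheme described above (an
added editorial example with invented names, not code from the thread); the
second half of main() builds the circular-reference case that pure counting
can never reclaim:

#include <cstdio>
#include <vector>

// Each object carries a count of the pointers that refer to it.
struct Node {
    int refcount = 0;
    std::vector<Node*> children;  // outgoing pointers to other objects
};

// Call whenever a new pointer to n is created.
void acquire(Node* n) { if (n) n->refcount++; }

// Call whenever a pointer to n is deleted. At zero the object is
// garbage: free it and release everything it pointed to, recursively.
void release(Node* n) {
    if (n && --n->refcount == 0) {
        for (Node* c : n->children) release(c);
        delete n;
    }
}

int main() {
    Node* a = new Node; acquire(a);        // one external reference to a
    Node* b = new Node;
    a->children.push_back(b); acquire(b);  // a -> b
    release(a);  // a drops to 0, is freed, and releases b in turn

    // The failure case: a two-object cycle.
    Node* x = new Node; acquire(x);
    Node* y = new Node; acquire(y);
    x->children.push_back(y); acquire(y);  // x -> y
    y->children.push_back(x); acquire(x);  // y -> x
    release(x); release(y);  // both counts drop to 1, never 0: leaked
    std::puts("cycle still alive; counting alone cannot reclaim it");
}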

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Paul Wallich

Nov 2, 1999
In article
<abhorrent-7vnhei/INN-2.2.1/alc...@broadway.news.is-europe.net>, "Janos
Blazi" <jbl...@netsurf.de> wrote:

>What does the expression "reference counting" mean?
>Can somebody tell me?

Reference counting means keeping track of how many things
refer to a particular object/chunk of memory. If nothing
refers/has a pointer to the thing, then for practical purposes
it no longer exists and the space it takes can be reclaimed
(see garbage collection) for other objects. (Analogy to a disk
directory: if no file descriptor contains a reference to a particular
block on disk, that block is free for use.) The reference count is
usually stored with the object in question, and incremented or
decremented automatically by routines that handle references to
objects.

Reference counting is one way of managing memory allocation,
but it is not generally used in lisp, at least not in its most straightforward form.

paul somewhat surprised by such a basic question

Christopher R. Barry

Nov 2, 1999
"Janos Blazi" <jbl...@netsurf.de> writes:

> What does the expression "reference counting" mean?
> Can somebody tell me?

One day a student came to Moon and said, "I understand how to
make a better garbage collector. We must keep a reference count
of the pointers to each cons." Moon patiently told the student
the following story:

"One day a student came to Moon and said, `I understand how
to make a better garbage collector . . . '"

Christopher

Janos Blazi

Nov 2, 1999
In my native tongue, Hungarian, there is a proverb: "There are no old jokes,
there are only old men. To a newborn, all jokes are new."

For ROBERT:
"Nincsenek régi viccek, csak öreg emberek vannak. Egy újszülöttnek minden
vicc új."

Janos Blazi

Paul Wallich <p...@panix.com> wrote in message
pw-021199...@166.84.250.180...


> In article
> <abhorrent-7vnhei/INN-2.2.1/alc...@broadway.news.is-europe.net>, "Janos
> Blazi" <jbl...@netsurf.de> wrote:
>

> >What does the expression "reference counting" mean?
> >Can somebody tell me?
>

David Hanley

Nov 2, 1999

Barry Margolin wrote:

> Some programming
> language implementations also use RC as the primary GC mechanism, but will
> revert to a more effective GC mechanism when necessary (I think most
> Smalltalk implementations are like this).

I find this a little weird. The recent research I've seen indicates that a
good GC is better than ref counting, particularly when there is a reasonable
number of objects changing hands, and absolutely in a multithreaded
environment.

dave


Joachim Achtzehnter

Nov 2, 1999
"Janos Blazi" <jbl...@netsurf.de> writes:
>
> Paul Wallich <p...@panix.com> wrote in message
> >
> > somewhat surprised by such a basic question
>
> In my native tongue, Hungarian, there is a proverb: "There are no old
> jokes, there are only old men. To a newborn, all jokes are new."

You are missing the point. The fact that the answers to these "basic
questions" may be new and interesting revelations to you is not
surprising. What is surprising is that you seem to have the expectation
that more experienced people will continue to volunteer their precious
time answering such questions, when the person asking them obviously
has made zero effort trying to find answers on his own.

Joachim

--
joa...@kraut.bc.ca (http://www.kraut.bc.ca)
joa...@mercury.bc.ca (http://www.mercury.bc.ca)

Frank A. Adrian

Nov 2, 1999

Barry Margolin <bar...@bbnplanet.com> wrote in message
news:U4IT3.10$Q6.936@burlma1-snr2...

> Some programming
> language implementations also use RC as the primary GC mechanism, but will
> revert to a more effective GC mechanism when necessary (I think most
> Smalltalk implementations are like this).

This is incorrect. There hasn't been a reference counting based Smalltalk
implementation since Ungar's generational GC papers came out. Even most
non-Xerox implementations of that era already had real GC as their memory
collection technique. And they did it for the same reasons Lispers have
known for years - systems that generate large numbers of small, short-lived
objects do better with GC than RC.

And don't try to mention this misconception on comp.lang.smalltalk, either.
You'll be lucky if your asbestos suit doesn't get burnt up. This statement
is as insulting to Smalltalkers as the "All Lisps are interpreted" statement
is to Lispers.

At this point, the only people still thinking RC is a good memory management
technique are the brain damage cases (usually from the C++ camps) who (a)
are so used to NO memory management that they think ANY kind of memory
management is a good idea and (b) can't understand that anything other than
heavyweight objects need to be allocated.

faa

Bijan Parsia

Nov 2, 1999
Barry Margolin <bar...@bbnplanet.com> wrote:

[snip]


> Some programming
> language implementations also use RC as the primary GC mechanism, but will
> revert to a more effective GC mechanism when necessary (I think most
> Smalltalk implementations are like this).

Actually, I don't believe any extant or historical implementation of
Smalltalk used this dual scheme, and while there were probably a few
experimental or prototype systems which used reference counting as the
primary mechanism, I don't *think* that any production system did. The
Blue Book describes reference counting as a possible gc technique, but
the state of the art in Smalltalk GC is, FWICT, at least as good as the
state of the art in Common Lisp and Scheme implementations. Squeak, for
example, has a very interesting generational gc where the strategy is to
gc a small young pool very frequently. The pool is small enough that
gcing even on modest hardware takes very little--and predictable--time.
Full gcs are exceedingly rare. This permitted Squeak to, for example, do
realtime FM synthesis with 4 voices (or something like that :)), again,
on fairly old macs. I don't think I've *ever* had a spontaneous
perceivable GC pause while working in Squeak. On the contrary, I find
that many large apps written in MCL flash the little GC cursor at me
quite a bit (my latest nemesis in this regard is StarLogo), but that
seems to be a problem with those apps rather than an inherent defect of
MCL, as I had no such troubles with SK8.

Interestingly, there was just a thread on the Squeak list about *adding*
reference counting for certain kinds of objects. (And it would be a good
thing, I think, for objects in need of finalization, filestreams,
sockets, etc.)

VisualWorks has extensive tuning capabilities for its gc, but I don't
*think* reference counting is one of them. IMHO, once you've got full
garbage collection, reference counting is silly *except* for those
objects for which a guaranteed freeing is a very good idea. And that's a
relatively small number of objects.

CPython and Perl, OTOH, do in fact use nothing but reference counting.

(I still bet that an often gced small pool is probably a better idea, in
the long run. I think Python holds onto reference counting to simplify
integration with C modules.)

Cheers,
Bijan.

Janos Blazi

Nov 3, 1999
You are right, I have not tried to find books on the subject. To tell you
the truth I would not even know where to start. And it would be a lot of work.
I am reading the news and every now and then I ask a question.

But anyway: Isn't this a free newsgroup? Can't I ask as many questions as I
want to? NOBODY MUST ANSWER MY QUESTIONS. IF YOU DO NOT WANT TO ANSWER, SIMPLY
SKIP THEM. ALSO YOU CAN MARK THEM AS READ.

But a lot of people answer. What is wrong with that? I learn a lot, other
newcomers may learn a lot and even the experts can clarify their positions.
Look at the threads I created.

So I do not understand your lack of patience. My questions, however naive or
stupid they are, cannot disturb you in any tangible way. Or do you mean
"somebody with your history should not have the right to ask any questions"?

Janos Blazi

Joachim Achtzehnter <joa...@kraut.bc.ca> wrote in message
x5yacgo...@soft.mercury.bc.ca...

Erik Naggum

Nov 3, 1999
* Janos Blazi -> Joachim Achtzehnter

| So I do not understand your lack of patience.

then you should make an effort to understand people's lack of patience
with you, rather than telling them there's something wrong with them
because you don't understand something. thinking very carefully about
this now may be the best spent time of the rest of your life, as it may
dramatically change the way you interact with people you don't know.

#:Erik

Joachim Achtzehnter

Nov 3, 1999
"Janos Blazi" <jbl...@netsurf.de> writes:
>
> You are right, I have not tried to find books on the subject. To
> tell you the truth I would not even know where to start.

Most people these days would probably do an online search to
start. General Web search engines can dig up a lot of useful links,
and there is always www.deja.com to search newsgroup archives.

> And it would be a lot of work.

That was my point. You expect OTHERS to do the work for you, while your
effort is limited to "asking a question now and then".

> But anyway: Isn't this a free newsgroup? Can't I ask as many questions
> as I want to? NOBODY MUST ANSWER MY QUESTIONS. IF YOU DO NOT WANT TO
> ANSWER, SIMPLY SKIP THEM. ALSO YOU CAN MARK THEM AS READ.

> [more angry bla bla omitted]

Yes, you can certainly continue behaving the way you do. You again
missed the point. I was not trying to stop you from doing anything;
that would have been hopeless anyway. I was merely trying to help you
understand what the other poster meant by his remark.

Barry Margolin

Nov 3, 1999
In article <1e0nzhx.1tm91xf1glv5faN%bpa...@email.unc.edu>,
Bijan Parsia <bpa...@email.unc.edu> wrote:
>Actually, I don't believe any extant or historical implementation of
>Smalltalk used this dual scheme, and while there were probably a few
>experimental or prototype systems which used reference counting as the
>primary mechanism, I don't *think* that any production system did. The
>Blue Book describes reference counting as a possible gc technique, but
>the state of the art in Smalltalk GC is, FWICT, at least as good as the
>state of the art in Common Lisp and Scheme implementations.

What I know about Smalltalk implementations I learned from the book
"Smalltalk-80: Bits of History, Words of Advice", which is a collection of
papers about Smalltalk implementations and analyses thereof. Reference
counting is mentioned frequently.

Admittedly, this book was published 15 years ago, and the papers were
written when modern GC techniques (e.g. generational GC) were still active
research projects, whereas the papers describe the actual, deployed ST-80
implementations.

Tim Bradshaw

Nov 3, 1999
* Bijan Parsia wrote:

> CPython and Perl, OTOH, do in fact use nothing but reference counting.

> (I still bet that an often gced small pool is probably a better idea, in
> the long run. I think Python holds onto reference counting to simplify
> integration with C modules.)

ref counting presumably has the advantage that you don't have to move
objects, which is death (or a lot of work) for C integration. I guess
you could use a noncompacting mark-sweep GC though?

It also seems to me that ref counting, unless you work very hard at
it, is really bad for performance on modern HW because it increases
memory traffic a lot (you have to keep incrementing/decrementing
references) and memory bandwidth is a huge bottleneck for any modern
machine. Someone once told me that one of the reasons for the bad
performance of large C++ systems is that they all really do ref
counting (because they have to do *some* mem management, and they are
unwilling to use, say, the Boehm collector or maybe they just don't
know about it), and they typically do it naively, and this basically
doubles memory traffic or something.

Someone (Duane!) will now point out that this is all wrong and you can
do ref counting with no overhead...

--tim


Rolf-Thomas Happe

Nov 3, 1999
ftp.cs.utexas.edu/pub/garbage/ holds papers on garbage collection
and dynamic memory allocation, including Paul Wilson's survey paper
"Uniprocessor garbage collection techniques":

ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps
ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps (extended draft)

rthappe

Paul Wallich

Nov 3, 1999
In article <x5u2n4n...@soft.mercury.bc.ca>, Joachim Achtzehnter
<joa...@kraut.bc.ca> wrote:

What the other poster meant was, "Gee, for someone who has been
expressing opinions on lisp vehemently and at length, you appear to
have no clue about one of the fundamental features of the language."
(Although it might be possible to learn about automatic storage
management aka garbage collection without being able to figure out
what reference counting was, it would take a lot of effort.)

paul

Barry Margolin

Nov 3, 1999
In article <ey3so2o...@lostwithiel.tfeb.org>,
Tim Bradshaw <t...@tfeb.org> wrote:
>* Bijan Parsia wrote:
>
>> CPython and Perl, OTOH, do in fact use nothing but reference counting.
>
>> (I still bet that an often gced small pool is probably a better idea, in
>> the long run. I think Python holds onto reference counting to simplify
>> integration with C modules.)
>
>ref counting presumably has the advantage that you don't have to move
>objects, which is death (or a lot of work) for C integration. I guess
>you could use a noncompacting mark-sweep GC though?

Right.

>It also seems to me that ref counting, unless you work very hard at
>it, is really bad for performance on modern HW because it increases
>memory traffic a lot (you have to keep incrementing/decrementing
>references) and memory bandwidth is a huge bottleneck for any modern
>machine.

The papers in the ancient Smalltalk book I referred to earlier mentioned
something called "deferred reference counting", which is supposed to
improve performance.

Duane Rettig

Nov 3, 1999
Tim Bradshaw <t...@tfeb.org> writes:

> * Bijan Parsia wrote:
>
> > CPython and Perl, OTOH, do in fact use nothing but reference counting.
>
> > (I still bet that an often gced small pool is probably a better idea, in
> > the long run. I think Python holds onto reference counting to simplify
> > integration with C modules.)
>
> ref counting presumably has the advantage that you don't have to move
> objects, which is death (or a lot of work) for C integration. I guess
> you could use a noncompacting mark-sweep GC though?

Any memory management that does not compact in some way has the
additional disadvantage of having to deal with memory fragmentation;
on larger apps locality of reference is eventually lost and the
working set (or "resident set") grows. Whenever you want fast
interactions, start-up times, etc. on modern paging machines, the
working or resident set size is the number you want to minimize.
This is why the generational gcs are so good; because they tend
to work on a small part of memory and thus limit the number of
pages actually in memory.

> It also seems to me that ref counting, unless you work very hard at
> it, is really bad for performance on modern HW because it increases
> memory traffic a lot (you have to keep incrementing/decrementing
> references) and memory bandwidth is a huge bottleneck for any modern
> machine. Someone once told me that one of the reasons for the bad
> performance of large C++ systems is that they all really do ref
> counting (because they have to do *some* mem management, and they are
> unwilling to use, say, the Boehm collector or maybe they just don't
> know about it), and they typically do it naively, and this basically
> doubles memory traffic or something.

Yet another cost to reference counting is the fact that each object
or group of objects being counted must actually have a counter
associated with it. This directly increases the memory cost of the
object, and coupled with the increased memory traffic you point out,
it increases the resident set size.

> Someone (Duane!) will now point out that this is all wrong and you can
> do ref counting with no overhead...

If anyone knows how to do ref counting with no overhead, please let
me know!

--
Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Ave Suite 275 Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)

Janos Blazi

Nov 3, 1999
Well, I have never pretended to be a LISP expert. And I did not express any
opinions about LISP except that I liked it.

But anyway I expressed my doubts. Not about LISP itself. I asked myself, and
then I asked this newsgroup, whether LISP was as popular as it deserved to be.

In return I was called a NAZI, and when I proved that I could not possibly be
one, things got even much worse.

The point is that none of you contradicted, and so you tacitly expressed
your approval. You, dear Paul, did not contradict either.
So what kind of people are you? Is nothing sacred for you?

Is LISP the only value in your life? And good is what is good for LISP and
bad is what is bad for LISP, whatever the price may be.

So please, dear Paul and dear others, stop lecturing me about what is good
and what is evil.

Janos Blazi


Paul Wallich <p...@panix.com> wrote in message
pw-031199...@166.84.250.180...

Erik Naggum

Nov 3, 1999
* Janos Blazi

| In return I was called a NAZI

really? can you please quote the article where you were?

| So please, dear Paul and dear others, stop lecturing me about what is
| good and what is evil.

no, when you are an evil person, it has to be brought to the attention of
the world in general so people can respond responsibly to your existence.

suddenly, I understand why you have to attack people and attribute so
many incredible things to them that they have never said or done: you're
an evil person, you know that you are, and the only hope you can have for
this not to be commonly known is to make _everybody_ equally evil so your
own evil nature drowns in the noise you create. that's why you had to
keep posting insane accusations about me for several days, too: your need
was to divert attention from my acute perception that you are destructive
to the Lisp world. please start using C++. you deserve each other.

you have been evaluated. you have a negative reference count. prepare
to be garbage collected. persistence is futile.

#:Erik

Eugene Zaikonnikov

Nov 3, 1999
Janos Blazi wrote:
>
[snip]

> The point is that none of you contradicted, and so you tacitly expressed
> your approval. You, dear Paul, did not contradict either.
> So what kind of people are you? Is nothing sacred for you?
>
Well Janos, usually it's a good rule not to intrude into mutual attacks
of others. Otherwise you'll be unavoidably flamed from at least one side
for that.
I think most readers suppose that you are an adult person and can stand up
for yourself. Do *you* always insert your words when someone is flamed?

> Is LISP the only value in your life? And good is what is good for LISP and
> bad is what is bad for LISP, whatever the price may be.
>

> So please, dear Paul and dear others, stop lecturing me about what is good
> and what is evil.
>

Nobody lectures you. Just note for the future that starting with posts
like "Is FOO dying" in a dedicated newsgroup is not a good idea. What
other response did you expect? That people would say "Yes, it is"?

--
Eugene.

Jim Driese

Nov 3, 1999
Erik Naggum stands accused of writing:

> * Janos Blazi
> | In return I was called a NAZI
>
> really? can you please quote the article where you were?
>

> | So please, dear Paul and dear others, stop lecturing me about what is
> | good and what is evil.
>

> no, when you are an evil person, it has to be brought to the attention of
> the world in general so people can respond responsibly to your existence.
>
> suddenly, I understand why you have to attack people and attribute so
> many incredible things to them that they have never said or done: you're
> an evil person, you know that you are, and the only hope you can have for
> this not to be commonly known is to make _everybody_ equally evil so your
> own evil nature drowns in the noise you create. that's why you had to
> keep posting insane accusations about me for several days, too: your need
> was to divert attention from my acute perception that you are destructive
> to the Lisp world. please start using C++. you deserve each other.
>
> you have been evaluated. you have a negative reference count. prepare
> to be garbage collected. persistence is futile.
>
> #:Erik

You're both completely off-topic. Please move this to a suitable forum and
think before you post.

Jim Driese


Frank A. Adrian

Nov 3, 1999

Barry Margolin <bar...@bbnplanet.com> wrote in message
news:yCYT3.44$Q6.2091@burlma1-snr2...

> The papers in the ancient Smalltalk book I referred to earlier mentioned
> something called "deferred reference counting", which is supposed to
> improve performance.

It does improve performance - but not enough :-).

faa

Tim Bradshaw

Nov 3, 1999
* Duane Rettig wrote:

> Any memory management that does not compact in some way has the
> additional disadvantage of having to deal with memory fragmentation;
> on larger apps locality of reference is eventually lost and the
> working set (or "resident set") grows. Whenever you want fast
> interactions, start-up times, etc. on modern paging machines, the
> working or resident set size is the number you want to minimize.
> This is why the generational gcs are so good; because they tend
> to work on a small part of memory and thus limit the number of
> pages actually in memory.

Or, perhaps more significantly, in cache. I suspect this may matter
more for most real machines now (certainly I almost never run anything
big enough to drive a machine into paging, and if I did I'd just buy
more memory, it's very cheap unless you have weirdo laptops or
something (hmm... I just bought one of those, dang)).

> Yet another cost to reference counting is the fact that each object
> or group of objects being counted must actually have a counter
> associated with it. This directly increases the memory cost of the
> object, and coupled with the increased memory traffic you point out,
> it increases the resident set size.

Can't you do a trick where you assume that almost all objects have
reference 1 or 0, and use a single bit in the pointer, together with
some pool of high-refcount objects. Of course those bits aren't free,
but it might mean that (at the cost of perhaps important type
information or something) you can do only one fetch rather than 2.

(I wonder if the C/C++ steam ref counting systems do this trick by
bit-twiddling the pointers, that's the sort of thing they'd really
delight in).
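
For illustration, a toy C++ sketch of the pointer-tagging half of this idea
(an added editorial example with invented names, not code from the thread);
it assumes objects are at least 2-byte aligned, so the low bit of every
valid pointer is free to carry the one-bit count:

#include <cassert>
#include <cstdint>
#include <cstdio>

// Toy pointer tagging: the low bit of an aligned pointer carries one
// bit of metadata, here "is this the only reference?".
using TaggedPtr = std::uintptr_t;

TaggedPtr tag_unique(void* p)    { return reinterpret_cast<std::uintptr_t>(p) | 1u; }
TaggedPtr tag_shared(void* p)    { return reinterpret_cast<std::uintptr_t>(p); }
bool      is_unique(TaggedPtr t) { return (t & 1u) != 0; }
void*     untag(TaggedPtr t)     { return reinterpret_cast<void*>(t & ~std::uintptr_t(1)); }

int main() {
    static int obj = 42;             // ints are at least 2-byte aligned
    TaggedPtr t = tag_unique(&obj);
    assert(is_unique(t));
    // Every dereference must mask the tag bit off first; that masking is
    // the per-access overhead such a scheme pays.
    std::printf("%d\n", *static_cast<int*>(untag(t)));
}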

--tim

Janos Blazi

Nov 4, 1999

Eugene Zaikonnikov <vik...@cit.org.by> wrote in message
38208290...@cit.org.by...

> Well Janos, usually it's a good rule not to intrude into mutual attacks
> of others. Otherwise you'll be unavoidably flamed from at least one side
> for that.
> I think most readers suppose that you are an adult person and can stand up
> for yourself. Do *you* always insert your words when someone is flamed?
>
Well, this is a good question. To tell you the truth, if something like this
happens, I do not always interfere as I am a coward. But I know I would have
to.

> > Is LISP the only value in your life? And good is what is good for LISP
> > and bad is what is bad for LISP, whatever the price may be.
> >
> > So please, dear Paul and dear others, stop lecturing me about what is
> > good and what is evil.
> >

> Nobody lectures you. Just note for the future that starting with posts
> like "Is FOO dying" in a dedicated newsgroup is not a good idea. What
> other response did you expect? That people would say "Yes, it is"?
>

With hindsight it was really naive. On the other hand I was VERY SERIOUSLY
considering using LISP (in my teaching) and wanted to be informed. And I
learned so much that maybe it was worth it.

But I really expected that you would say "No it isn't". Or "Yes it is, most
unfortunately: But there are still applications and new implementations and
some rather interesting new directions."

Or: "Well: it is not really dying, it is only being transformed into a more
dilute form". (That was a joke.)

Concerning books: It is possible that getting ANY book in some other
countries is a problem. If I lived in such a country (as I did for a long
time), I should be content with ANY translation of course.

But I wonder: Sometimes it seems to me that Russia has more mental power
than the whole Western World. Once a well-known mathematician told me: "If I
want to learn about a new topic, I simply wait until a Russian book on the
subject is published." Are there not many books from Russian authors about
computing?

Janos Blazi


> --
> Eugene.

Duane Rettig

Nov 4, 1999
Tim Bradshaw <t...@tfeb.org> writes:

> * Duane Rettig wrote:
>
> > Any memory management that does not compact in some way has the
> > additional disadvantage of having to deal with memory fragmentation;
> > on larger apps locality of reference is eventually lost and the
> > working set (or "resident set") grows. Whenever you want fast
> > interactions, start-up times, etc. on modern paging machines, the
> > working or resident set size is the number you want to minimize.
> > This is why the generational gcs are so good; because they tend
> > to work on a small part of memory and thus limit the number of
> > pages actually in memory.
>
> Or, perhaps more significantly, in cache.

No, for two reasons:

1. The ratio of cache speed to main memory speed might be 50:1
or even 100:1, but the ratio of memory speed to disk latency is more
like 500K:1 or a million to one. Thus the hit on data access when it
comes from or goes to disk is much larger than if it has to be in memory
rather than cache.

2. Cache provides a finer granularity than memory. Cache lines are
on the order of tens of bytes (typically, 8 to 256 bytes). Memory
pages are 4k or 8k bytes, and sometimes much larger. Thus, more objects
fit into a page than a line, which means locality of reference makes a
much bigger difference when reading and writing memory pages than it
does when reading and writing cache lines.
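
To make the granularity argument concrete, a tiny worked example (an added
editorial sketch; the object, line, and page sizes are assumptions chosen
for round numbers):

#include <cstdio>

int main() {
    const int obj  = 16;    // a small heap object, in bytes (assumed)
    const int line = 64;    // a typical cache line (assumed)
    const int page = 4096;  // a typical memory page (assumed)
    // A cache line holds only a few objects, while a page holds hundreds,
    // so poor locality wastes far more of a page than of a line.
    std::printf("objects per cache line: %d\n", line / obj);  // 4
    std::printf("objects per page:       %d\n", page / obj);  // 256
}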

> I suspect this may matter
> more for most real machines now (certainly I almost never run anything
> big enough to drive a machine into paging, and if I did I'd just buy
> more memory, it's very cheap unless you have weirdo laptops or
> something (hmm... I just bought one of those, dang)).

Well, yes, you can load up your machine until it ceases to page,
but for some large applications like CAD programs, that means a couple
of Gb of memory, if the machine will take it.

> > Yet another cost to reference counting is the fact that each object
> > or group of objects being counted must actually have a counter
> > associated with it. This directly increases the memory cost of the
> > object, and coupled with the increased memory traffic you point out,
> > it increases the resident set size.
>
> Can't you do a trick where you assume that almost all objects have
> reference 1 or 0, and use a single bit in the pointer, together with
> some pool of high-refcount objects. Of course those bits aren't free,
> but it might mean that (at the cost of perhaps important type
> information or something) you can do only one fetch rather than 2.

The count, however it's implemented, has to be associated with the
object itself, not pointers to it. Otherwise, how would you know
how many references to the object unless you find all of the pointers?
Sounds like a job for a standard gc, to me.

> (I wonder if the C/C++ steam ref counting systems do this trick by
> bit-twiddling the pointers, that's the sort of thing they'd really
> delight in).
>
> --tim

--

Eugene Zaikonnikov

Nov 4, 1999

Janos Blazi <jbl...@netsurf.de> wrote in message
news:archaism-7vrcss/INN-2.2.1/bab...@broadway.news.is-europe.net...

>
> Eugene Zaikonnikov <vik...@cit.org.by> wrote in message
> 38208290...@cit.org.by...
[snip]

> > Nobody lectures you. Just note for the future that starting with posts
> > like "Is FOO dying" in a dedicated newsgroup is not a good idea. What
> > other response did you expect? That people would say "Yes, it is"?
> >
>
> With hindsight it was really naive. On the other hand I was VERY SERIOUSLY
> considering using LISP (in my teaching) and wanted to be informed. And I
> learned so much that maybe it was worth it.
>
www.alu.org already has Lisp advocacy pages. All the questions that you raised
are listed there. BTW, it was the first link that popped up in a search engine
when I queried "lisp", though that was a few years ago.

> But I really expected that you would say "No it isn't". Or "Yes it is, most
> unfortunately: But there are still applications and new implementations and
> some rather interesting new directions."
>

If you restrict yourself to technical questions, then you'll get
reasonable answers. Try to write Lisp, post your questions, and people will
help you.

> Concerning books: It is possible that getting ANY book in some other
> countries is a problem. If I lived in such a country (as I did for a long
> time), I should be content with ANY translation of course.
>

Nevertheless, the book was great and easy to read. You see, it isn't
Shakespearean poetry, and could be translated without significant
distortion.

> But I wonder: Sometimes it seems to me that Russia has more mental power
> than the whole Western World. Once a well-known mathematician told me: "If I
> want to learn about a new topic, I simply wait until a Russian book on the
> subject is published." Are there not many books from Russian authors about
> computing?
>

You should not overestimate the mental power of Russia; we lost the Cold War
after all. Certainly it has advanced fields of science like mathematics,
theoretical physics, medicine and cosmonautics, but this is offset by a lag in
other areas and an overall low production culture. In particular, computer
science here bears only a pitiful resemblance to Western CS, for a number of
historical reasons. Very few programmers here know anything but C/C++ and
Delphi, and it looks like I am the only Lisper in a 500km range.

--
Eugene.

Rahul Jain

Nov 4, 1999
Duane Rettig wrote:

> > Can't you do a trick where you assume that almost all objects have
> > reference 1 or 0, and use a single bit in the pointer, together with
> > some pool of high-refcount objects. Of course those bits aren't free,
> > but it might mean that (at the cost of perhaps important type
> > information or something) you can do only one fetch rather than 2.
>
> The count, however it's implemented, has to be associated with the
> object itself, not pointers to it. Otherwise, how would you know
> how many references to the object unless you find all of the pointers?
> Sounds like a job for a standard gc, to me.

(I'm nowhere near an expert but I figure I might as well try sounding
smart...)
Why would you care if an object has 2 or 20 refs to it? The fact that it
has any is enough information in this situation. Once you've gone down to 0
refs, delete the object. It would probably make sense to have something
like 2, maybe 3 bits for the refcount, since you'd want to allow at least a
couple refs. If the refcount was in the LSBs, the object could be aligned
on an n-word boundary (n = # of refcount bits). For large enough objects,
the alignment wouldn't matter, and would probably even be preferable, since
most processors load aligned data faster.
There is the issue of the "users" of the pointer not knowing what the
others are doing, so maybe this system would only work for a 0/1 ref
situation anyway... Although in those cases you're basically just working
with a local variable, and that's a trivial case for the compiler to
optimize.
Hmm... now I think I see why Fortran is so weird... Although the compiler
can probably figure out all of that information anyway (IN/OUT).

--
-> -=-=-=-=-=-=-=-=-=- < Rahul -=- Jain > -=-=-=-=-=-=-=-=-=- <-
-> "I never could get the hang of Thursdays." -Douglas N. Adams <-
-> -=-=-=- URL: http://hoohoo.brrsd.k12.nj.us/~rjain/ -=-=-=- <-
-> -=-=-=-=-=- E-mail: mailto:rahul...@usa.net -=-=-=-=-=- <-
Version 9.105.999.1111111111111.23.042
(c)1996-1998, All rights reserved.
Disclaimer available upon request.


Erik Naggum

Nov 4, 1999
* Janos Blazi

| Sometimes it seems to me that Russia has more mental power than the whole
| Western World.

great! feel free to take your postings to fido7.ru.lisp. (there is no
relcom-hierarchy group that I recognize, but there may be one such, too.)

#:Erik

Lars Lundback

Nov 4, 1999
Duane Rettig wrote:
>
>
> 2. Cache provides a finer granularity than memory. Cache lines are
> on the order of tens of bytes (typically, 8 to 256 bytes). Memory
> pages are 4k or 8k bytes, and sometimes much larger. Thus, more objects
> fit into a page than a line, which means locality of reference makes a
> much bigger difference when reading and writing memory pages than it
> does when reading and writing cache lines.
>

What about trying to stay in the code cache (assuming different caches for code
and data, eg Pentiums)? Inlining of native code ought to crowd the cache earlier
than reusing code fragments. I suppose Franz has analyzed different
implementation solutions extensively? Is any material on this subject publicly
available?

Lars Lundback

Pekka P. Pirinen

Nov 4, 1999
Rahul Jain <ra...@owlnet.rice.edu> writes:
> Duane Rettig wrote:
> > > Can't you do a trick where you assume that almost all objects have
> > > reference 1 or 0, and use a single bit in the pointer, together with
> > > some pool of high-refcount objects. [...]
> >
> > The count, however it's implemented, has to be associated with the
> > object itself, not pointers to it. Otherwise, how would you know
> > how many references to the object unless you find all of the pointers?

But if you know there's only one reference, then you do know where all
of the pointers are! It's called a one-bit reference count; see
<URL:http://www.harlequin.com/mm/reference/glossary/o.html#one-bit.reference.count>
in the Memory Management Reference for details. It's said to be
quite efficient in suitable applications. Even so, you still want a
tracing GC to clean up all the multiply-referenced objects now and
then.

> (I'm nowhere near an expert but I figure I might as well try
> sounding smart...) [...] It would probably make sense to have
> something like 2, maybe 3 bits for the refcount, since you'd want to
> allow at least a couple refs.

Actually, as Tim Bradshaw suggests, in many applications, you can
assume most objects have only one reference. If there are too many
cases of transiently having two references, there's an interesting
additional technique for accounting for that, called an
ought-to-be-two cache.

> If the refcount was in the LSBs, the object could be aligned on an
> n-word boundary [...] There is the issue of the "users" of the
> pointer not knowing what the others are doing, so maybe this system
> would only work for a 0/1 ref situation anyway...

That's right, it doesn't really work beyond one bit for that reason.

> Although in those cases you're basically just working with a local
> variable, and that's a trivial case for the compiler to optimize.

No, there are all kinds of data structures (e.g., linked lists,
combinator trees) where most objects have only one reference. But
it's true that one-bit refcounting needs some support in the
compiler/interpreter, otherwise it'll always think it has two
references whenever a pointer is written or read.
--
Pekka P. Pirinen Adaptive Memory Management Group, Harlequin Limited
"A man never speaks of himself without losing something. What he
says in his disfavor is always believed but when he commends himself,
he arouses mistrust." - Michel Montaigne

Tim Bradshaw

Nov 4, 1999
* Duane Rettig wrote:

> Tim Bradshaw <t...@tfeb.org> writes:
> 2. Cache provides a finer granularity than memory. Cache lines are
> on the order of tens of bytes (typically, 8 to 256 bytes). Memory
> pages are 4k or 8k bytes, and sometimes much larger. Thus, more objects
> fit into a page than a line, which means locality of reference makes a
> much bigger difference when reading and writing memory pages than it
> does when reading and writing cache lines.

Good point.

> Well, yes, you can load up your machine until it ceases to page,
> but for some large applications like CAD programs, that means a couple
> of Gb of memory, if the machine will take it.

Not so good a point I think. Memory in the UK is now under a pound a
Mb, so 2Gb of memory is under £2000. That does not equate to very
much time needed to be saved every day to make it worth the
investment. I don't know what ordinary PCs can take nowadays so that
may be a limit, but I'd be kind of distressed (but not altogether
surprised) to find people spending 10s or 100s of K on a package +
time at whatever per day and then not making this relatively small
investment.

Although I don't run really big programs (I'm only a lisp person after
all, I use this small efficient language unlike those C++ people...),
I noticed a really significant difference when I upgraded my machine
from 128 to 640Mb. For me this wasn't paging as such, but filesystem
performance: Unix is quite happy to keep as much of the filesystem in core
as it has core, pretty much, so you get really astonishing performance for
even grotty tasks like large C/C++ compilations -- once you can keep
*all* the include files in memory (and /tmp is a memory filesystem), C
compilers run at almost civilised speeds.

One of the great lessons I learned while being a system person is that
memory and disk are very cheap compared to time, and have been for a
long time, so the right solution in many cases is just buy a *lot*.
Unfortunately management types (with refreshing exceptions) often have
limited rationality and can't equate the saving of time over 3 yrs
with the upfront cost now.

Of course, 64bit systems may change all this because it will again be
possible to deal with directly-accessible data structures much larger
than you can buy memory to hold.

> The count, however it's implemented, has to be associated with the
> object itself, not pointers to it. Otherwise, how would you know
> how many references to the object unless you find all of the pointers?

> Sounds like a job for a standard gc, to me.

Yes, sorry I wasn't thinking...

Back to GC techniques and locality. There's a whole world of people
who use big machines to do things like data mining (in case it's not
obvious I have various friends in this world which is why I tend to
harp on about multiprocessor support and claim that memory is really
cheap). These people are willing to buy enough memory not to page,
and typically deal with their (very large) amounts of data by
streaming it through the machine, so disk latency is not an issue (and
big machines can have more I/O bandwidth than memory bandwidth...).

So these people don't care about paging, and they can get enough disk
bandwidth to keep them happy. They *do* care deeply about cache
performance though. Someone made an impression on me by pointing out
that no one would use red-black trees because they are bad for the
cache; you want to use some btree variant with nodes carefully tuned
to cache-line size.

It does seem to me that GC-based systems ought to be able to win here
too, so I wonder if anyone has done work on that? Perhaps they don't
because (as you said at the top) the cache-line size is small enough
that the kind of fragmentation they can fix doesn't matter. (What
about NUMA machines? GC should be good for them, as they have
macroscopic locality, much as disks do.)

--tim

Robert Monfera

Nov 4, 1999
Rahul Jain wrote:
>
> Duane Rettig wrote:
>
> > > Can't you do a trick where you assume that almost all objects have
> > > reference 1 or 0, and use a single bit in the pointer, together with
> > > some pool of high-refcount objects. Of course those bits aren't free,
> > > but it might mean that (at the cost of perhaps important type
> > > information or something) you can do only one fetch rather than 2.
> >
> > The count, however it's implemented, has to be associated with the
> > object itself, not pointers to it. Otherwise, how would you know
> > how many references to the object unless you find all of the pointers?
> > Sounds like a job for a standard gc, to me.
>
> (I'm nowhere near an expert but I figure I might as well try sounding
> smart...)
> Why would you care if an object has 2 or 20 refs to it?

Let me quote Duane (the sentence you replied to):

> > The count, however it's implemented, has to be associated with the
> > object itself, not pointers to it. Otherwise, how would you know
> > how many references to the object unless you find all of the pointers?

It means that you need to know how many times an object is referred to,
so that, decreasing this number by 1 each time a reference dies, you get
to 0 _exactly_ when all the references have died, at which point you free
the object and decrease the counters on the objects this dying object
refers to.

If you don't have at least this amount of information, then you need to
do a proper GC, i.e., every once in a while you need to scan through
your graph to free memory.

An alternative idea is to do an experiment on paper: can you tell
whether your object is dead or alive (without a GC scan) if you don't
have the exact number of referring objects?

Regards
Robert

Pekka P. Pirinen

Nov 4, 1999
Barry Margolin <bar...@bbnplanet.com> writes:
> The papers in the ancient Smalltalk book I referred to earlier mentioned
> something called "deferred reference counting", which is supposed to
> improve performance.

Yep, although usually not as much as a modern tracing GC.

We just added a description of this technique to the Memory Management
Reference, when we revised all the refcounting material, but it's not
out on the website <URL:http://www.harlequin.com/mm/reference/> yet --
next version will be out some time this month. Here's a preview:

Deferred reference counting reduces the cost of maintaining reference
counts by avoiding adjustments when the reference is stored on the
stack.

On many systems, the majority of pointer stores are made into local
variables, which are kept on the stack. Deferred reference counting
only counts references from other heap objects. This requires compiler
support, but can lead to substantial performance improvements.

Objects cannot be reclaimed as soon as their reference count becomes
zero, because there might still be references to them from the
stack. Such objects are added to a zero count table (ZCT) instead. If
a reference to an object with a count of zero is stored into the heap,
then the object is removed from the ZCT. Periodically the stack is
scanned, and any objects in the ZCT which were not referenced from the
stack are reclaimed.
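
For illustration, a toy C++ model of the deferred scheme just described (an
added editorial example with invented names, not code from the thread); the
stack is simulated by an explicit root set where a real implementation would
scan the actual machine stack:

#include <cstdio>
#include <set>
#include <vector>

// Deferred reference counting, toy version: only heap-to-heap references
// are counted; stack references are ignored until the periodic scan.
struct Obj { int heap_refs = 0; };

std::set<Obj*> zct;          // the zero count table
std::set<Obj*> stack_roots;  // stands in for scanning the real stack

void heap_store(Obj* o)  { o->heap_refs++; zct.erase(o); }
void heap_delete(Obj* o) { if (--o->heap_refs == 0) zct.insert(o); }

// The periodic part: anything in the ZCT that the stack does not
// reference is reclaimed.
void scan_stack_and_reclaim() {
    std::vector<Obj*> dead;
    for (Obj* o : zct)
        if (!stack_roots.count(o)) dead.push_back(o);
    for (Obj* o : dead) { zct.erase(o); delete o; }
}

int main() {
    Obj* o = new Obj;
    stack_roots.insert(o);    // a local variable: no count adjustment
    heap_store(o);            // stored into the heap: counted
    heap_delete(o);           // heap reference dies: count 0, into the ZCT
    scan_stack_and_reclaim(); // still on the stack, so it survives
    std::printf("in ZCT after first scan: %zu\n", zct.size());
    stack_roots.erase(o);     // the local variable goes out of scope
    scan_stack_and_reclaim(); // now nothing references it: reclaimed
    std::printf("in ZCT after second scan: %zu\n", zct.size());
}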
--
Pekka P. Pirinen, editor, Memory Management Reference


Adaptive Memory Management Group, Harlequin Limited

Only fools learn by their experience; smart people use the experience
of others. - Bismarck

Robert Monfera

Nov 4, 1999
Tim Bradshaw wrote to Duane:

> > Well, yes, you can load up your machine until it ceases to page,
> > but for some large applications like CAD programs, that means a couple
> > of Gb of memory, if the machine will take it.
>

> Memory in the UK is now under a pound a
> Mb, so 2Gb of memory is under £2000. That does not equate to very

...
> Back to GC techniques and locality. There's a whole world of people
> who use big machines to do things like data mining (in case it's not
> obvious I have various friends in this world which is why I tend to
> harp on about multiprocessor support and claim that memory is really
> cheap). These people are willing to buy enough memory not to page,
> and typically deal with their (very large) amounts of data by
> streaming it through the machine, so disk latency is not an issue (and
> big machines can have more I/O bandwidth than memory bandwidth...).

Incidentally, SAP has purchased a share of ILOG for the exact reason of
bypassing databases for certain kinds of data manipulation, and they are
working on a server that uploads input from a database (another R/3
server), does logistics optimizations in multiple G's of memory and
throws back results into a database on another server.

There were analysts who suggested it's a threat to DB manufacturers
(after all, SAP products have traditionally relied on a relational
database), as it may not be necessary to focus on disk access.

By the way, has someone got comparative experiences or ideas about
performance, considering two scenarios (both assuming a large memory
capable of holding everything):

A. Lisp communicates with a database server to perform queries and
transactions (big DB memory slice)

B. Lisp sucks up all data into hash tables and arrays, runs queries and
transactions, and sometimes saves a snapshot (as a raw image or as
relational database records)

(Of course, I'm not debating the need for a compact image, which seems
to be an advantage of ACL, as it allows a wider (e.g., PC or laptop)
distribution of applications. Efforts to make the image compact are
paying off, because Lisp vendors work on it, not developers creating
custom applications, reducing the need to micro-optimize or infect the
design. It also allows Lisp to sneak into non-traditional environments,
without having to start with a hefty memory upgrade.)

Best regards
Robert

Jason Trenouth

Nov 4, 1999
On Thu, 04 Nov 1999 10:41:55 -0500, Robert Monfera <mon...@fisec.com> wrote:

> Incidentally, SAP has purchased a share of ILOG for the exact reason of
> bypassing databases for certain kind of data manipulations, and they are
> working on a server that uploads input from a database (another R/3
> server), does logistics optimizations in multiple G's of memory and
> throws back results into a database on another server.
>
> There were analysts who suggested it's a threat to DB manufacturers
> (e.g., all SAP products have traditionally relied on a relational
> database), as it may not be necessary to focus on disk access.
>
> By the way, has someone got comparative experiences or ideas about
> performance, considering two scenarios (both assuming a large memory
> capable of holding everything):

NB You don't have to bypass databases to get data in-memory. E.g. the
TimesTen folks have an in-memory RDBMS:

http://www.timesten.com/products/timesten/index.html

I presume the company/product name is a hint at the comparative performance.
:-j

__Jason

William Deakin

Nov 4, 1999
Robert Monfera wrote:

> By the way, has someone got comparative experiences or ideas about
> performance

No. Well, not really, not with lisp anyway. But when you're looking at stuff
like this I would suggest that you think about:

Is this a batch or on-line processing system you want to build? If an on-line
processing system, how fast a response time do you want? This is determined by
what you are going to do with the results. For example: dump them via a
postscript output module to film/CTP, or have a sexy HTTP gui that connects
across the internet.

How thick/thin a client/server do you want? All processing carried out on the
server? All processing carried out on the client uber-pc, with a very
lightweight process that passes sql/results to the database? Or something in
between.

What logs, tracing or workflow do you need? This affects how you back stuff up
when the system dies. Also, the ability to keep logs of what happened yesterday
(or whenever) is good, particularly if somebody types in a load of tosh that
causes some form of data corruption.

I'm sure you know about this already. I would be interested if you come up
with an answer to this.

Best Regards,

:) will


Malcy

Nov 4, 1999
"Eugene Zaikonnikov" <vik...@removeme.cit.org.by> writes:

> You should not overestimate the mental power of Russia; we lost the Cold War
> after all. Certainly it has advanced fields of science like mathematics,
> theoretical physics, medicine and cosmonautics, but this is offset by a lag in
> other areas and an overall low production culture. In particular, computer
> science here bears only a pitiful resemblance to Western CS, for a number of
> historical reasons. Very few programmers here know anything but C/C++ and
> Delphi, and it looks like I am the only Lisper in a 500km range.

Erm.. am I in 500km range (Moscow)? Ja ne lisper, ja tol'ko uchus' ["I am
not a Lisper, I am only learning"] >B)

--
mailto:mg...@chat.ru

Larry Hunter

Nov 4, 1999

Robert Monfera asks:

   Has someone got comparative experiences or ideas about performance,
   considering two scenarios (both assuming a large memory capable of
   holding everything):

   A. Lisp communicates with a database server to perform queries and
   transactions (big DB memory slice)

   B. Lisp sucks up all data into hash tables and arrays, runs queries and
   transactions, and sometimes saves a snapshot (as a raw image or as
   relational database records)

I have some experience with this sort of thing, running complex machine
learning code over about 100k examples, each described with a roughly
500-dimensional vector (of floats, usually doubles).

I tried both PLOB, a persistent object store for Lisp, and sucking
everything into lisp (hash tables of vectors) and dumping an
image. Performance was roughly an order of magnitude better for native lisp
structures than going back and forth to an even pretty well tuned database.
Not only was it faster, but I had the advantage of being able to trivially
tune the "retrievals" to be (nearly) optimal for my application.

Now much as I love PLOB, it hasn't had nearly the engineering hours of
Oracle or even ObjectStore, so it may not be a really fair comparison. But,
given that I was mostly just doing non-sequential data access, nothing very
database-y (like transactions), I personally got high performance very
easily by just reading everything in once and saving an image. Although the
initial read took hours, once I had an image, restarting that image and
getting access to the data was effectively instantaneous.

Larry

--
Lawrence Hunter, Ph.D. Chief, Molecular Statistics and Bioinformatics
National Cancer Institute email: lhu...@nih.gov
Federal Building, Room 3C06 phone: +1 (301) 402-0389
7550 Wisconsin Ave. fax: +1 (301) 480-0223
Bethesda, MD 20892 USA

Christopher R. Barry

Nov 4, 1999
Larry Hunter <lhu...@nih.gov> writes:

> Now much as I love PLOB, it hasn't had nearly the engineering hours of
> Oracle or even ObjectStore, so it may not be a really fair comparison.

How much better did ObjectStore do than PLOB for your application?

Christopher


Gareth McCaughan

Nov 5, 1999
Janos Blazi wrote:

[on asking "Is Lisp dead?"]


> With hindsight it was really naive. On the other hand I was VERY SERIOUSLY
> considering using LISP (in my teaching) and wanted to be informed. And I
> learned so much that maybe it was worth it.
>
> But I really expected that you would say "No it isn't". Or "Yes it is, most
> unfortunately: But there are still applications and new implementations and
> some rather interesting new directions."

Lots of people *did* give precisely that kind of answer, no?

--
Gareth McCaughan Gareth.M...@pobox.com
sig under construction

Gareth McCaughan

Nov 5, 1999
Bijan Parsia wrote:

> (I still bet that an often gced small pool is probably a better idea, in
> the long run. I think Python holds onto reference counting to simplify
> integration with C modules.)

Also for guaranteed finalisation, I believe.

Duane Rettig

Nov 5, 1999
pe...@harlequin.co.uk (Pekka P. Pirinen) writes:

> Rahul Jain <ra...@owlnet.rice.edu> writes:
> > Duane Rettig wrote:
> > > > Can't you do a trick where you assume that almost all objects have
> > > > reference count 1 or 0, and use a single bit in the pointer, together
> > > > with some pool of high-refcount objects. [...]


> > >
> > > The count, however it's implemented, has to be associated with the
> > > object itself, not pointers to it. Otherwise, how would you know how
> > > many references there are to the object unless you find all of the pointers?
>

> But if you know there's only one reference, then you do know where all
> of the pointers are! It's called a one-bit reference count, see
> <URL:http://www.harlequin.com/mm/reference/glossary/o.html#one-bit.reference.count>
> in the Memory Management Reference for details. It's said to be
> quite efficient in suitable applications. Even so, you still want a
> tracing GC to clean up all the multiply-referenced objects now and
> then.

Where is it that you are doing this? Was it in your ML product?
I assume it is not in your Lisp product, because I didn't see an
extra mask operation to mask out the MRB when open-coding
(cdr (cdr ...)) operations.

> > (I'm nowhere near an expert but I figure I might as well try
> > sounding smart...) [...] It would probably make sense to have
> > something like 2, maybe 3 bits for the refcount, since you'd want to
> > allow at least a couple refs.
>
> Actually, as Tim Bradshaw suggests, in many applications, you can
> assume most objects have only one reference. If there are too many
> cases of transiently having two references, there's an interesting
> additional technique for accounting for that, called an
> ought-to-be-two cache.

I can imagine that there are many single references to _cons_ cells
(in fact, it is this observation that made cdr-coding practical on
lisp machines), but most objects other than cons cells tend to
have more than one reference, at least in CL. Numbers may be an
exception, but we all tend to handle them with unboxing techniques.
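
(For readers following along, a rough sketch of the bookkeeping under
discussion, with invented names; this is nobody's product code. The
`multiple reference bit' lives in the pointer word itself, which is why
an open-coded CDR would need the extra mask mentioned above. Simulating
tagged pointer words as integers, with bit 0 as the MRB:)

(defconstant +mrb+ 1)

(defun mark-shared (word)
  (logior word +mrb+))      ; storing a second reference sets the bit

(defun shared-p (word)
  (logbitp 0 word))         ; bit set: count is "many", leave it to a tracing GC

(defun strip-mrb (word)
  (logandc2 word +mrb+))    ; the masking step an open-coded CDR would need

;; (shared-p (mark-shared #x1000))  => T
;; (strip-mrb (mark-shared #x1000)) => #x1000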

Tim Bradshaw

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to
* Robert Monfera wrote:

> By the way, has someone got comparative experiences or ideas about


> performance, considering two scenarios (both assuming a large memory
> capable of holding everything):

> A. Lisp communicates with a database server to perform queries and
> transactions (big DB memory slice)

> B. Lisp sucks up all data into hash tables and arrays, runs queries and
> transactions, and sometimes saves a snapshot (as a raw image or as
> relational database records)

I don't know about B, but people who do a lot of bashing on data often
suck it out of the database once into flat files, and then bash the
flat files. Databases tend to be optimised for things like
transaction performance and safety, not `I want every record, and I
want it at disk bandwidth'. A is likely bad if you are mashing a lot
of data unless it's a special database optimised for this.

--tim
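
(A small sketch of the scenario-B style, assuming an invented record
format: one colon-delimited record per line, keyed on the first field.
Nothing here comes from an actual application:)

(defun load-flat-file (path)
  "Slurp a flat file into an EQUAL hash table for in-memory queries."
  (let ((table (make-hash-table :test #'equal)))
    (with-open-file (in path)
      (loop for line = (read-line in nil)
            while line
            do (setf (gethash (subseq line 0 (position #\: line)) table)
                     line)))
    table))

;; Queries then run at memory speed: (gethash "some-key" table)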

William Deakin

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to
Tim Bradshaw wrote:

> ...Databases tend to be optimised for things like transaction performance
> and safety, not `I want every record, and I want it at disk bandwidth'...

This depends on whether you are talking about on-line `real-time'
operations or off-line batch work.

The answer to the disk-bandwidth problem is obvious (sic): keep the whole of
your database in memory ;) For example: the Hertz Rent-a-Car on-line
transactions DB is held in memory (about 4.5G) and is backed up
dynamically via fibre onto a specialised bit of Digital kit. I understand
the database is also mirrored using fibre, so if one box goes down the second
one can be up within milliseconds.

> A is likely bad if you are mashing a lot of data unless it's a special
> database optimised for this.

It's not trivial, but it is by no means difficult to get a major database
product (e.g. Oracle or Sybase) to do this. Even without `major'
optimisation.

Best Regards,

:) will


Tim Bradshaw

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to
* William Deakin wrote:

> The answer to the disk-bandwidth problem is obvious (sic), keep the whole of
> your database in memory ;)

Disk bandwidth (which is plentiful on big machines) may not be as much
of an issue as `database bandwidth' is. Even then, this is only viable
if your database will fit in main memory. This is not an option for
people with terabytes of data they want to mine. 4Gb of data is just
a microscopic amount for these people.

--tim


Pekka P. Pirinen

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to
Duane Rettig <du...@franz.com> writes:
> pe...@harlequin.co.uk (Pekka P. Pirinen) writes:
> > It's called a one-bit reference count, see
> > <URL:http://www.harlequin.com/mm/reference/glossary/o.html#one-bit.reference.count>
> > in the Memory Management Reference for details. It's said to be quite
> > efficient in suitable applications. [...]

>
> Where is it that you are doing this? Was it in your ML product?

No, MLWorks has a specialized copying GC (ML has extremely high
allocation rates, you see). While one-bit refcounting is the only
variant that I could imagine using for a language outside distributed
systems, we don't use it in anything at the moment; we (the memory
management group) just make it our business to know all the
techniques.

> > Actually, as Tim Bradshaw suggests, in many applications, you can
> > assume most objects have only one reference. [...]


>
> I can imagine that there are many single-references to _cons_ cells
> (in fact, it is this observation that made cdr-coding pracical on
> lisp machines), but most objects other than conse cells tend to
> have more than one reference, at least in CL. Numbers may be an
> exception, but we all tend to handle them with unboxing techniques.

Exactly. Even though that's usually _most_ objects, it's still not
enough to make it suitable for a generic CL solution. One might have
particular _applications_ where the proportions are different. On
the language side, I understand one-bit reference counting has been
used in combinator reduction engines with success. David Wise tried a
new improvement in Scheme just recently (reported at ISMM '98), but I
suspect that was just a convenient testbed.
--
Pekka P. Pirinen


Adaptive Memory Management Group, Harlequin Limited

Every time history repeats itself, the price goes up.

William Deakin

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to
Tim Bradshaw wrote:

> Disk bandwidth ... may not be as much an issue as `database bandwidth' is.

Yup. My answer is probably coloured by my lack of imagination. (Multiple processor
DB's are pretty scalable tho')

> Even then, this is only viable if your database will fit in main memory. This is
> not an option for people with terabatyes of data they want to mine.

That is true. I hadn't considered the data-mining thing at all. I was just
thinking about the large (1,000+ person) on-line databases.

> 4Gb of data is just a microscopic amount for these people.

Agreed. Just out of interest, how much of the terabyte of data is usable? If
you will indulge me: to be rather obvious (and stupid), 1 T = 1024 G = 1 048 576 M
= 1 073 741 824 K. Allowing a character a byte, 80 characters a line and 24 lines
a page, this corresponds to about 572 662 306 pages of text. I know places like
CERN churn this level of data out in a matter of minutes, but most of
that data is noise...
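
(The figure checks out at a Lisp prompt, by the way:)

(floor (expt 1024 4) (* 80 24))   ; bytes per terabyte / bytes per page
;; => 572662306, 256              ; i.e. ~572 million pages, 256 bytes over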

Where I work we have about 4T of graphics stored around and about the place, but
administering the whole shebang, plus a couple of hundred users, takes an Oracle
database that quite happily sits in 500M of memory.

However, this is a classic example of me misunderstanding what was going on.
Whereof one does not know, thereof one should not speak. One of these days I might
even take notice.

Best Regards,

:) will


Jan Vroonhof

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to
pe...@harlequin.co.uk (Pekka P. Pirinen) writes:

> then the object is removed from the ZCT. Periodically the stack is
> scanned, and any objects in the ZCT which were not referenced from the
> stack are reclaimed.

So this could be called "conservative reference counting"? Or did they
have an exact way to scan the stack?

Jan
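
(A sketch of the deferred scheme the quoted text describes, with invented
names. Stack references go uncounted, so a zero heap count only suggests
garbage until the periodic stack scan confirms it; if that scan treats
anything pointer-shaped as a root, the worst case is delayed reclamation,
which is one answer to the "conservative" question:)

(defstruct obj (refcount 0) name)

(defvar *zct* (make-hash-table :test #'eq))   ; the zero count table

(defun heap-retain (o)
  (incf (obj-refcount o))
  o)

(defun heap-release (o)
  ;; Heap count hit zero: the stack may still reference O, so defer.
  (when (<= (decf (obj-refcount o)) 0)
    (setf (gethash o *zct*) t)))

(defun scan-and-reclaim (stack-roots)
  ;; Periodic step: reclaim ZCT entries the stack does not reference.
  (maphash (lambda (o flag)
             (declare (ignore flag))
             (unless (member o stack-roots :test #'eq)
               (remhash o *zct*)
               (format t "~&reclaiming ~A~%" (obj-name o))))
           *zct*))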

Larry Hunter

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to

> How much better did ObjectStore do than PLOB for your application?

I haven't tested ObjectStore (I didn't want to pay for it), so I was just
speculating that, given all the extra engineering effort, it might be
faster.

Duane Rettig

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to
Lars Lundback <era...@eralslk.ericsson.se> writes:

> Duane Rettig wrote:
> >
> >
> > 2. Cache provides a finer granularity than memory. Cache lines are
> > on the order of tens of bytes (typically, 8 to 256 bytes). Memory
> > pages are 4k or 8k bytes, and sometimes much larger. Thus, more objects
> > fit into a page than a line, which means locality of reference makes a
> > much bigger difference when reading and writing memory pages than it
> > does when reading and writing cache lines.
> >
>

> What about trying to stay in the code cache (assuming different caches for code
> and data, eg Pentiums)? Inlining of native code ought to crowd the cache earlier
> than reusing code fragments. I suppose Franz has analyzed different
> implementation solutions extensively? Is any material on this subject publicly
> available?

The only material that is public is available indirectly, in the
resultant user interfaces to our product. Unfortunately you have
to dig a little and make some intelligent guesses to see what it is
you're getting. Some of the things you may or may not be able to
glean:

1. The symbol trampoline (a short couple-of-instructions code segment
that completes a lisp-to-lisp function call) is very fast, because it
is always in the instruction cache. Its presence also allows lisp code
to be smaller, thus helping all cache models.

2. Code vectors (which are separate from functions) can be manipulated in
lisp heap just like any other data, but due to the way we compile them,
there is a high incidence of identical bit patterns within any given
architecture [try disassembling (defun foo (x) (bar x)) and compare
the bits with disassembled (defun bas (y) (bam y)) ]. They can also be
placed into "pure-space" which we create and label a ".pll" file (for
"pure lisp library") for all architectures. This file is mapped
read-only/shared, so that multiple lisp processes can share it. But
besides this, there is always only one copy of a codevector in the .pll,
and so in the example above #'foo and #'bas might share the same
codevector, if it exists in the pll file. It is the extent of the sharing
that may be surprising; if you read up on how to make your own .pll file,
it involves a program called "cvdcvt" which will tell you how many code
vectors it has collected and how many duplicates it found.

3. At a lower level, we look at basic-block placement, branch tensioning,
etc., to minimize not only the amount of instruction cache used, but also
the amount of pipeline disturbances that might occur. If you use
disassemble on a non-trivial function, you'll find that the "return"
instruction is somewhere in the middle of the code, and you can usually
follow a straight line from the beginning of the function to that
return instruction. There are two conflicting ideas here; the need to
keep the line of code in the instruction cache, which might suggest
short jumps within small segments of code, and the need to keep the
instruction pipeline filled, which suggests never jumping if possible.
We thus try to order code so as to be as efficient as possible for
the best and hopefully most probable case, and jump out-of-line for
those cases where it will be expensive to perform the operation anyway.

As an example, consider the expression (+ x y). In the absence of
declarations, we optimize it for fixnums, doing a fixnum test and an add
fully inline, with no jumps; however, an out-of-line jump is made to code
which calls the 2-operand + function (excl::+_2op), and it doesn't matter
much that the jump was taken, because the fact that +_2op is a function
call will be causing a larger amount of cache line disturbance anyway.
In the code below, the code at instructions 9, 11, and 14 tests the
operands (in eax and edx) for fixnumness, and jumps out-of-line if the
test fails. The actual add is done at instruction 18, and instruction 20
jumps out of line if the fixnums overflowed into a bignum result. Quite
a few instructions for a fixnum add, true, but the pipeline is kept full
because there are no jumps, and it is thus a fairly fast operation.
(Of course, if you knew you always had fixnums, you could make some
declarations and turn 8 instructions into 1.)

USER(5): (defun foo (x y) (declare (optimize speed (safety 0))) (+ x y))
FOO
USER(6): (disassemble 'foo)
;; disassembly of #<Function (:ANONYMOUS-LAMBDA 5) @ #x204f55f2>
;; formals: X Y

;; code start: #x204f55c4:
0: 55 pushl ebp
1: 8b ec movl ebp,esp
3: 56 pushl esi
4: 83 ec 24 subl esp,$36
7: 8b d8 movl ebx,eax
9: 0b da orl ebx,edx
11: f6 c3 03 testb bl,$3
14: 75 0e jnz 30
16: 8b d8 movl ebx,eax
18: 03 da addl ebx,edx
20: 70 08 jo 30
22: 8b c3 movl eax,ebx
24: f8 clc
25: c9 leave
26: 8b 75 fc movl esi,[ebp-4]
29: c3 ret
30: 8b 5f 9f movl ebx,[edi-97] ; EXCL::+_2OP
33: ff 57 27 call *[edi+39] ; SYS::TRAMP-TWO
36: eb f3 jmp 25
USER(7):
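
(For comparison, the declared variant alluded to above might look like
this; a sketch with an invented function name, not an ACL-specific
recipe. Declaring both arguments fixnum and claiming a fixnum result lets
the compiler drop the type test and the overflow jump, leaving essentially
just the add:)

(defun foo-fixnum (x y)
  (declare (optimize speed (safety 0))
           (fixnum x y))
  ;; THE FIXNUM asserts that the sum cannot overflow into a bignum.
  (the fixnum (+ x y)))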

Shiv

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to
Duane Rettig <du...@franz.com> writes:

>
> 3. At a lower level, we look at basic-block placement, branch tensioning,
> etc, to minimize not only the amount of instruction cache used, but also
> the amount of pipeline disturbances that might occur. If you use
> ....


This is going off-topic now and is not just directed at Duane/ACL. I
believe some versions of the (ultra?)SPARC chip are 4-way superscalar
and some are not. Do any of the current crop of CL compilers exploit
this difference to generate faster code? Anyone planning to do so in
the near future?

--shiv--

Janos Blazi

unread,
Nov 5, 1999, 3:00:00 AM11/5/99
to
Hey Erik!

I'd love to but only if you participate too.

J.B., in love

Erik Naggum <er...@naggum.no> wrote in message:
31507086...@naggum.no...
> * Janos Blazi


> | Sometimes it seems to me that Russia has more mental power than the
> | whole Western World.
>

Rainer Joswig

unread,
Nov 6, 1999, 3:00:00 AM11/6/99
to
In article <ey3r9i5...@lostwithiel.tfeb.org>, Tim Bradshaw <t...@tfeb.org> wrote:

> * William Deakin wrote:
>
> > The answer to the disk-bandwidth problem is obvious (sic), keep the whole of
> > your database in memory ;)
>

> Disk bandwidth (which is plentiful on big machines) may not be as much
> of an issue as `database bandwidth' is. Even then, this is only viable
> if your database will fit in main memory. This is not an option for
> people with terabytes of data they want to mine. 4Gb of data is just
> a microscopic amount for these people.

If I remember right, it was one reason the Ivory processor had
a 36-bit address space - the 32-bit address space was just too
small. It seemed that certain applications filled up 4 GB of data
on a Lispm easily. I also think (I've never seen this) that
there were memory boards for the VME bus that allowed a
Lispm to have *lots* of RAM. Princeton Capital (once owner
of Symbolics) seemed to have one of those applications for the
financial market. Nowadays one could use a Lisp implementation
running on a 64-bit architecture with a 64-bit OS
(DEC Alpha and their Unix, Solaris 7 or 8 on UltraSPARC, ...).

Paolo Amoroso

unread,
Nov 6, 1999, 3:00:00 AM11/6/99
to
On Thu, 4 Nov 1999 08:33:42 +0100, "Janos Blazi" <jbl...@netsurf.de> wrote:

> Eugene Zaikonnikov <vik...@cit.org.by> wrote in message:
> 38208290...@cit.org.by...

[...]


> > Nobody lectures you. Just note for the future that starting with posts
> > like "Is FOO dying" in a dedicated newsgroup is not a good idea. What
> > other response did you expect? That people would say "Yes, it is"?
> >
>

> With hindsight it was really naive. On the other hand I was VERY SERIOUSLY

[...]


> But I really expected that you would say "No it isn't". Or "Yes it is, most
> unfortunately: But there are still applications and new implementations and
> some rather interesting new directions."

Just providing such a simplistic answer to a post about the supposed
imminent death of Lisp may not be enough. The reason is suggested by a
recent article by P.J. Plauger, a leading C/C++ authority ("Frequently
Answered Questions", "Standard C/C++" column, C/C++ Users Journal, vol. 17,
n. 11, November 1999, page 10).

Plauger starts by recognizing the importance of newsgroups as technical
forums. He states that the number of newsgroup participants is a tiny
minority of the programming community and then adds:

"But that tiny minority seems to have influence on all out of proportion
to its size. It is nonpareil at establishing factoids (or ``memes'' in
the current jargon) that survive and propagate, almost independently of
any evidence to support their validity."
"So like it or not, and I often don't, I track half a dozen newsgroups."

This shows the problem with such posts: factoids--e.g. the supposed
imminent death of Lisp--that propagate regardless of any supporting
evidence. Besides, experts such as Plauger are a resource for the
programming communities they belong to. Those factoids distract experts and
reduce their contributions to the community.


Paolo
--
EncyCMUCLopedia * Extensive collection of CMU Common Lisp documentation
http://cvs2.cons.org:8000/cmucl/doc/EncyCMUCLopedia/

Janos Blazi

unread,
Nov 6, 1999, 3:00:00 AM11/6/99
to
Hallo Paolo!

(1)
I do not understand the word "factoid". Is it an opinion that pretends to be
a fact though it is not?

(2)
I have not read that article, but the sentences you quoted say simply that
newsgroups are influential. The author may mean or hint that they are too
influential, but that is all, to judge by the few sentences you quoted.
Also, the author does not present any evidence for his thesis, again to judge
from the few lines you quoted. So he may be right and he may be wrong. It
probably matters a great deal which topics are dealt with in the newsgroup.

(3)
That this "distracts" the experts is certainly true in a way. Instead of
dealing with purely technical questions they have to deal with a political
question if you like this expression. But then again they do good work as
they prove that the "factoid" is not true. Nowadays the state of LISP is
unclear to many people, books on LISP are not well sold while books on
VISUAL BASIC and all flavours of C abound. So it must be very important from
the point of view of the LISP community to make clear that LISP is ok.

(4)
If another newcomer were watching our debate: do you believe he concluded
that Lisp WAS dying? No. He saw that some guy asked about it, or even put
forward the thesis, and that the thesis proved wrong.

(5)
And do you really believe that dealing with technical questions is the most
important thing? Lisp has been around for forty years and gave good answers
to programming challenges at a time when the languages en vogue were only
more or less comfortable assembly languages. So if Lisp has not become as
popular as you would like it, and as even I would like it, then there must be
political reasons for that and not technical reasons. You must admit that
Lisp has always been superior to C if we do not look at raw speed and C's
utmost simplicity (as it was basically an assembly language). This utmost
simplicity was lost when C++ entered the scene, but not much technical
excellence was won on the other hand (at least I am very skeptical). I think
even Knuth preferred C to C++. And raw speed is becoming less important at
the moment... So why shouldn't Lisp have a new chance?

J.B.

Paolo Amoroso <amo...@mclink.it> wrote in message:
3826225...@news.mclink.it...

Paolo Amoroso

unread,
Nov 7, 1999, 3:00:00 AM11/7/99
to
On Sat, 6 Nov 1999 20:10:19 +0100, "Janos Blazi" <jbl...@netsurf.de> wrote:

> I do not understand the word "factoid". Is it an opinion that pretends to be
> a fact though it is not?

I think so. Some modern dictionaries may include the word "meme".


> I have not read that article but the sentences you quoted say simply that
> newsgroups are influential. The author may mean or hint that they are too
> influential. But that is all to judge by the few sentences you quoted.
> Also the author does noe present any evidence for his thesis, again to judge
> from the few lines you quoted. So he may be right and he may be wrong. It is
> probably very important, which topics are dealt with in the newsgroup.

Plauger's view about newsgroups looks like an opinion. The rest of the
article deals, not surprisingly, with C++ issues. Maybe the author doesn't
present any evidence because this is not the main point of that column's
installment. So deciding whether he's right or wrong is up to the reader. I
mentioned those sentences because they come from a well-known expert and
suggest why it may not be enough to provide simplistic responses to certain
posts.

The rest of the article doesn't add any useful information on Plauger's
view of newsgroups. Of course I couldn't include all of it because of
copyright issues. If you are interested you may check your department's
library. I have also seen CUJ sold at several newsstands in Milan, Italy.
So maybe it's rather common.


> And do you really believe that dealing with technical questions is the most
> important thing? Lisp has been there for forty years and gave good answers

Issues other than technical ones--e.g. political, historical, business,
etc.--are often discussed in comp.lang.lisp, and I am personally interested
in them.

Frode Vatvedt Fjeld

unread,
Nov 7, 1999, 3:00:00 AM11/7/99
to
amo...@mclink.it (Paolo Amoroso) writes:

> Paolo
> --
> EncyCMUCLopedia * Extensive collection of CMU Common Lisp documentation
> http://cvs2.cons.org:8000/cmucl/doc/EncyCMUCLopedia/

I haven't ever been able to contact this server, and I've been trying
over a period of about a month now...?

--
Frode Vatvedt Fjeld

Paolo Amoroso

unread,
Nov 8, 1999, 3:00:00 AM11/8/99
to
On 07 Nov 1999 17:48:00 +0100, Frode Vatvedt Fjeld <fro...@acm.org> wrote:

> amo...@mclink.it (Paolo Amoroso) writes:
[...]


> > http://cvs2.cons.org:8000/cmucl/doc/EncyCMUCLopedia/
>
> I haven't ever been able to contact this server, and I've been trying
> over a period of about a month now..?

Unfortunately, cvs2.cons.org is currently down because of a hardware
problem that also prevents access to the CMU CL, CLISP and ILISP main
sites. I don't know when the machine will be available again.

Lars Lundback

unread,
Nov 10, 1999, 3:00:00 AM11/10/99
to
Duane Rettig wrote:
>
>
> 1. The symbol trampoline (a short couple-of-instructions code segment
> that completes a lisp-to-lisp function call) is very fast, because it
> is always in the instruction cache. Its presence also allows lisp code
> to be smaller, thus helping all cache models.
>
> 2. Code vectors .... It is the extent of the sharing
> that may be surprising; ...

Duane,

Thanks, you always respond accurately and to the point. I had been pondering
a small scheme (pardon) for running code fragments when your previous post
touched on memory caching. And insights into the internals of a Lisp system add
flavor (hrrm, pardon again) to the language discussions.

Regards, Lars
