Question from a non-user: Garbage Collection

Bennu Strelitzia

unread,

Dec 16, 2009, 9:31:42 AM12/16/09

to

I would appreciate any pointers you may have on a few issues which I
will make separate postings abput. Sorry if they are obvious or
otherwise-unsuitable questions, but I have spent some time reading
trying to answer them and have not been able to do so.

What I would wish for in garbage collection in Lisp:

Is there a flavor of lisp that due to immutability of structures makes
referential cycles impossible, eliminating a requirement for full-blown GC?

i.e. reference counting could do the trick for an efficient minimal
implementation if there were no referential cycles, I hate the real
weight of every full-blown GC I have ever seen in a language in spite of
all the "wisdom" that claims that it is not a problem. I love nothing
better than a simple C program that can run very fast start to finish in
a few K of space, and dislike how impossible most GC schemes seem to
make this, in my experience.

I also happen to really like the model that every mutation to a
structure creates a new structure sharing applicable read-only parts of
the old structure, making many things inherently more concurrency-safe,
so it seems like the two things would go hand in hand, because if each
mutation of a structure creates a new structure, then you have no
opportunity for a reference loop because any structure can only refer to
objects older than itself.

I have read that Clojure likes this sort of immutability in collections,
but unfortunately it is stuck with the Java garbage collector. Is this
something another Lisp might address without the fatness of a JVM or
other full-blown GC?

Bennu

Rahul Jain

unread,

Dec 16, 2009, 11:58:33 AM12/16/09

to

Bennu Strelitzia <bennu.st...@gmail.com> writes:

> Is there a flavor of lisp that due to immutability of structures makes
> referential cycles impossible, eliminating a requirement for full-blown GC?

Look at haskell. It doesn't have lisp syntax, tho, and the way it deals
with the fact that the real world has state makes it a bit arcane for
writing complex applications.

> i.e. reference counting could do the trick for an efficient minimal
> implementation if there were no referential cycles, I hate the real
> weight of every full-blown GC I have ever seen in a language in spite of
> all the "wisdom" that claims that it is not a problem. I love nothing
> better than a simple C program that can run very fast start to finish in
> a few K of space, and dislike how impossible most GC schemes seem to
> make this, in my experience.

Refcounting uses more space and takes more CPU time to do in a purely
functional language. Garbage collection does not take time, it is
collection of non-garbage. In a language without side-effects, you will
generate a lot of garbage, so you want to optimize for the garbage case.
This means not using refcounting.

--
Rahul Jain
rj...@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist

Duane Rettig

unread,

Dec 16, 2009, 12:36:53 PM12/16/09

to

On Dec 16, 6:31 am, Bennu Strelitzia <bennu.strelit...@gmail.com>
wrote:

> I would appreciate any pointers you may have on a few issues which I
> will make separate postings abput. Sorry if they are obvious or
> otherwise-unsuitable questions, but I have spent some time reading
> trying to answer them and have not been able to do so.

Likely because those who use modern Lisps simply don't find a need to
discuss it, because it is seldom a problem, and when it is it is
something to discuss with their Lisp vendor rather than publicly.
However, it is a perfectly fine question, so I'll attempt to answer
it.

> What I would wish for in garbage collection in Lisp:
>
> Is there a flavor of lisp that due to immutability of structures makes
> referential cycles impossible, eliminating a requirement for full-blown GC?

When you say "full-blown GC", it makes me think that you are likely
unaware of recent (within the last 30 years) developments in garbage-
collection. Depending on what you mean by "full-blown", generational
garbage-collectors have been the norm for many years, now, and almost
never have to work on the whole heap. The theory is that most data
are either short-lived or long-lived, and so if the gc works mostly on
the long-lived data, it can be most efficient.

> i.e. reference counting could do the trick for an efficient minimal
> implementation if there were no referential cycles, I hate the real
> weight of every full-blown GC I have ever seen in a language in spite of
> all the "wisdom" that claims that it is not a problem. I love nothing
> better than a simple C program that can run very fast start to finish in
> a few K of space, and dislike how impossible most GC schemes seem to
> make this, in my experience.

A C program can indeed run very fast with very little space, and it
will do very little (this is not a pejorative statement - it may be
that the task only requires a little bit of work). Lisps can do this
as well, and most lisps provide the ability to move aside the gc
activity in favor of all-out speed for the task. In practice, this
does not occur so often, because Lisp users like to have their lisps
available for long periods of time for many different tasks, and
because the GCs on these Lisps really aren't a problem (because they
tend to do only the work that's needed) they tend to keep their Lisps
running for longer periods of time, and put up with the percent or two
of gc time.

But back to your C analogy: once you start getting into larger tasks,
data driven and using malloc/free, you are suddenly up against the
same wall as every other language: "how do I manage my heap space?"
and C doesn't do it very well. First, since malloc/free are manually
managed, there is a lot of potential for user error, and many C
program bugs are seen due to use of already-deallocated memory (there
are even whole systems like Purify geared to precisely this problem).
Secondly, your version of malloc/free, or its current
parameterization, will determine whether it runs slowly or wastes
memory. If it set to use a "best fit" algorithm, it will have to take
malloc time to search for the best fit for the request, and it will
also take time, either at malloc time or at free time, to coalesce
multiple free blocks. At the other end of the scale, a bucket-style
malloc might have powers-of-two sized blocks given to the user, even
if the request was just larger than the next power-of-two smaller.
There is therefore a huge potential for space waste.

> I also happen to really like the model that every mutation to a
> structure creates a new structure sharing applicable read-only parts of
> the old structure, making many things inherently more concurrency-safe,
> so it seems like the two things would go hand in hand, because if each
> mutation of a structure creates a new structure, then you have no
> opportunity for a reference loop because any structure can only refer to
> objects older than itself.

STM is indeed a powerful tool, but it is a hammer that forces you to
shape your nails like STM nails. If you can do that, then great; your
job is done if you find an STM system (Clojure is just such a system,
as you mention below). But if your problem does not fit as easily
into the STM model, then you have to deal with that fact and use tools
that solve your problem. (note that Clojure may also have these
tools, but since I am not a Clojure advocate, I'll let someone who is
put in any plugs for it).

> I have read that Clojure likes this sort of immutability in collections,
> but unfortunately it is stuck with the Java garbage collector. Is this
> something another Lisp might address without the fatness of a JVM or
> other full-blown GC?

Many other Lisps take the following philosophy:

1. JVM is here to stay, and thus we must be able to work with it. We
do this via inter-language communication techniques, including in some
cases generating Java byte codes, but in other cases simply using RPC
style communication with JVM.

2. JVM is not everything, and we can generally get much more
efficient code generation and heap management by taking on the task
ourselves.

As for concurrency, most of the Lisp vendors are working on or have
concurrency models, with or without STM, based on the needs of their
user base. You should do some shopping, and find the one that suits
your needs. You haven't stated any needs in your post here, so it
will be up to you to decide whether a particular Lisp suits your needs
or not. As for the fear you've expressed that the Lisp GCs would be
"full-blown", implying that they are inefficient and/or will cause
noticeable pauses in your program's operation, I say try some for
yourself, and become pleasantly surprised at how little most Lispers
even think about their Lisp's garbage-collector.

Duane

Kaz Kylheku

unread,

Dec 16, 2009, 1:04:14 PM12/16/09

to

On 2009-12-16, Bennu Strelitzia <bennu.st...@gmail.com> wrote:
> i.e. reference counting could do the trick for an efficient minimal
> implementation if there were no referential cycles, I hate the real

Reference counting is not efficient. Except in some highly contrived
benchmarks that don't reflect real-world use, it is slower than garbage
collection. Even simple mark-and-sweep can beat reference counting in
realistic situations.

Refcounting does not eliminate ``pauses'' from the program execution.
If you drop the last reference on an object which is itself the last
reference a large number of other objects, this triggers a cascade of
refcounts dropping to zero, which takes time to complete.

Refcounting can't be deferred, so even a short-lived program whose heap
is cleaned up by the operating system has to waste cycles on
refcounting, whereas the program with GC wins simply by never invoking
GC over its lifetime.

Every time an object's refcount is modified, an entire cache line
becomes dirty and has to be flushed to main memory. When another
processor in the system wants to access anything which lands in that
cache line, having noticed that it's dirty, it has to fetch the latest
version of that cache line from main memory. This means that two or
more processors can't work with the same refcounted object in a way that
is scalable, if at all they touch the refcounts---even if they are
treating the objects as read-only! As the refcounts are bumped, you have
performance-wasting cache ping-pong, and this can happen even with
separate objects (false sharing) if they land wholly or partially into
the same cache line. Refcounts have to be updated in a thread-safe way,
using expensive atomic operations that can cause stalls of thousands of
cycles. The consequences is that a refcounted program simply will not
scale well to a multiprocessor system.

Cycles can't be avoided in Lisp as easily as you might think because of
lexical closures. Closure environments are mutable, because variables
are assignable, and can easily generate cycles.

;; cycle created: lexical varible fun points to
;; the closure, and is captured by it at the same time:
(setf fun (lambda () ...))

So what you have to do is banish all variable assignment.

Without cycles, you can't express graph structures, so an entire branch
of computer science goes out the window. Many useful algorithms
call for graphs. You can't compile a regular expression to an NFA using
standard algorithms without generating a graph. So with refcounting
you need hacks, like the use of some uncounted backreferences, or
maintaining the structure completely outside of the managed memory.

Tamas K Papp

unread,

Dec 16, 2009, 1:21:55 PM12/16/09

to

On Wed, 16 Dec 2009 18:04:14 +0000, Kaz Kylheku wrote:

> Without cycles, you can't express graph structures, so an entire branch
> of computer science goes out the window. Many useful algorithms

I disagree. Graphs can be represented, for example, in a matrix
(a[i,j] is nonzero iff i->j, etc). You can use this to reconstruct a
lookup table, basically emulating pointers.

Which would, of course, would be a pretty pointless thing to do -- but
the OP's questions have been pretty pointless so far, so maybe this
would be a harmonious outcome :-)

Cheers,

Tamas

William D Clinger

unread,

Dec 16, 2009, 1:36:21 PM12/16/09

to

Duane Rettig provided an excellent answer to the original
post.

Kaz Kylheku wrote:
> Refcounting can't be deferred,

A Google search on "deferred reference counting" would
provide more reliable information.

Will

Helmut Eller

unread,

Dec 16, 2009, 1:40:53 PM12/16/09

to

* Bennu Strelitzia [2009-12-16 15:31+0100] writes:

> Is there a flavor of lisp that due to immutability of structures makes
> referential cycles impossible, eliminating a requirement for full-blown GC?

Not really a Lisp, but Erlang seems to fit the bill. All data
structures in Erlang are immutable and "updates" are done by copying.
Not sure what the difference between full-blown vs. non-full-blown GC
could be, but Erlang has a generational GC. The absence of mutable
objects allows some simplifications (no write barriers, no old-to-new
pointers) and unsurprisingly Erlang's GC exploits those features.

Helmut

Pillsy

unread,

Dec 16, 2009, 2:29:01 PM12/16/09

to

On Dec 16, 1:04 pm, Kaz Kylheku <kkylh...@gmail.com> wrote:

> On 2009-12-16, Bennu Strelitzia <bennu.strelit...@gmail.com> wrote:

> > i.e. reference counting could do the trick for an efficient minimal
> > implementation if there were no referential cycles, I hate the real

> Reference counting is not efficient. Except in some highly contrived
> benchmarks that don't reflect real-world use, it is slower than garbage
> collection. Even simple mark-and-sweep can beat reference counting in
> realistic situations.

The following is a very naive observation.

The one thing it seems like reference counting could allow you to do
is immediate recycling of space. For example, if I want to sum two
vectors, using

(map 'vector #'+ a-big-vector another-big-vector)

I'll get a third big vector that has to be consed. If I know that A-
BIG-VECTOR isn't being used by anything else, I can recycle it using
MAP-INTO instead, but in general I have no idea. However, if there
were a reference count on those big vectors, the implementation could
make the decision for me at run-time. I'm primarily thinking of using
using reference counting as a way of augmenting garbage collection,
not as a replacement for it; indeed, you could limit it to only work
with some data types or objects, like big simple arrays of floats or
characters.

There's no way this is a new idea, and it's entirely possible I'm
overlooking some trade-offs that keep it from being worthwhile.

Cheers,
Pillsy

George Neuner

unread,

Dec 16, 2009, 3:03:45 PM12/16/09

to

On Wed, 16 Dec 2009 18:04:14 +0000 (UTC), Kaz Kylheku
<kkyl...@gmail.com> wrote:

>Refcounting can't be deferred,

There are lazy refcounters which defer recycling until the memory is
reallocated, but in terms of complexity all refcounters do more work
than tracing forms of GC.

George