How garbage collection can be worse than reference counting

10 views
Skip to first unread message

Nils Bruin

unread,
Sep 9, 2009, 4:01:17 PM9/9/09
to sage-flame
In a little experiment concerning big integer operations using GMP:

http://groups.google.ca/group/sage-devel/t/3c6159c238a7d557

I found that ECL (compiled), Magma and Python (OK, so that was
Python's native bigints) all had similar performance. No surprise,
because the example consisted of adding a whole bunch of integers, so
any bigint implementation is going to do essentially the same thing.
Probably just storing the words and allocating the memory are the
dominant factors here.

Sage was to my surprise quite a bit better. My hypothesis is that this
is because of its "integer pool": Rather than deallocating unused GMP
integers, sage keeps a few around for the next time, Especially in the
example I tried, this quickly leads to a situation where NO memory
allocation is needed anymore.

It is the eager part of reference counting that makes this easy to
implement efficiently: As soon as the ref count hits 0, the integer
can be relegated to the pool, ready for reuse (cache-friendly too!)

With a garbage collector that can only reclaim memory when it has
established the memory block is not reachable anymore, it would be a
lot harder to automatically recycle unused integers quickly.

I imagine there are many other scenarios where mark-and-sweep works
well and the burden of having to keep reference counts up to date all
the time is expensive (circular references?), but this seems to be an
example that shows that an accurate reference count can sometimes have
remarkable advantages too.

[note: I know that the code is an inefficient approach for computing
Fibonacci numbers in all cases. The point is, it's inefficient in
exactly the same way for all languages and I think it generates a
usage pattern that's not unusual in general mathematical number
crunchy experiments, where people would like decent performance
without having to invest in optimization of the code]

rjf

unread,
Sep 9, 2009, 5:29:23 PM9/9/09
to sage-flame
There's a lot of literature on this topic,

http://www.cs.kent.ac.uk/people/staff/rej/gc.html has 1900 entries...


but probably the classic from the perspective of your comment, is the
AED Storage package (circa 1967)

http://portal.acm.org/citation.cfm?doid=363534.363546

Basically, there is nothing to prevent you from keeping your own list
of storage and recycling it yourself
if you have a situation in which the generality of garbage collection
is not needed. This was supported
extensively in Lisp machine systems, and there are examples of how you
can go about allocating pools of storage in Common Lisp in several
texts. Sometimes called "resources" I think. This is good for I/O,
but also other tasks where memory speed is a keyl

In my own recent programming using big integers from GMP, I "compiled"
code from ordinary lisp into a register-style simulated machine, where
I allocated a few registers of pre-allocated GMP numbers, and
then furiously re-used them because I knew when they were no longer
needed. Of course in those circumstances, reference counting was
entirely unnecessary as well. Oh, and if I ran out of registers, I
just allocated another one.

When I was done, and if the system needed more storage, then the GC
would collect the structures around
the GMP numbers, and the deallocation of the GMP stuff would be called
by the GC in a finalization step.

There are other probably more important memory issues for really large
problems, but it is not clear what you should optimize for. Do you
expect to do some arithmetic on 2-megabyte numbers? If so, cache
considerations are far more importance than if you are mostly doing
stuff that fits in some cache.

Copying GCs can help maintain cache coherence. Conservative GCs do
not. Reference counts -- I suppose this relies on good luck in
patterns of allocation and deallocation, which may be likely.

RJF
Reply all
Reply to author
Forward
0 new messages