Could CDR-coding be on the way back?

Frank A. Adrian

Dec 6, 2000, 3:00:00 AM

I just finished reading an article in the Dec. 2000 issue of Computer
Magazine called "Making Pointer-based Data Structures Cache Conscious", by
Chilimbi et al. In it, the authors suggest that because of increasing
memory-CPU speed mismatches, modifying data structures to use less space and
be more cache-conscious, even at the cost of raw instruction count, might
well be a win for today's machine architectures.

Although the article was based on structures in the current language of the
day (i.e., Java), the suggestions that the authors make would probably be
even more appropriate for a language like Lisp, due to the smaller size of
objects and greater frequency of pointer traversals. One of their main
points was that data structures should be shrunk by using offsets instead
of full pointers and by compacting their data, so that more of them would
stay in cache. In addition, they believe that objects likely to be traversed
together should be allocated together, so that fetching one brings its
neighbors into cache as well.
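
A minimal sketch of the "offsets instead of full pointers" idea, in Python for readability. It is not taken from the article: the 16-bit link width, the END sentinel, and the names link_nodes and traverse are all assumptions made for illustration. Nodes live in parallel arrays and each link is a small signed offset relative to the current node.

```python
from array import array

END = 0  # relative offset 0 means "no next node"; a node never links to itself

def link_nodes(count, order):
    """Return 16-bit relative links that visit node indices in `order`."""
    links = array("h", [END] * count)
    for here, there in zip(order, order[1:]):
        links[here] = there - here
    return links

def traverse(data, links, start):
    """Follow relative links from `start`, collecting the payloads."""
    out, i = [], start
    while True:
        out.append(data[i])
        if links[i] == END:
            return out
        i += links[i]
```

If nodes are allocated in traversal order, every link is simply +1, which is the co-location the paragraph above describes.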

Oddly enough, this seems suspiciously close to some of the optimizations
that were made in Lisp machines. CDR-coding reduced the size of cons cells
and the garbage collectors of the day tried to place items thought to be
used together on the same memory page. Back then, the reason was to try to
reduce disk to memory paging, but a mismatch is a mismatch, eh?

If the data in this article (as well as the corresponding articles from
SIGPLAN '99) is correct, spending a few cycles on decompressing a CDR might
make sense again today. The last analysis of CDR-coding I saw was from
about 10 years ago, when memory-CPU speeds were much more even. The
analysis then was that CDR-coding took more cycles to execute and caused
branch stalls, so it wasn't as good as directly coded addresses. Thinking
about (a) superscalar execution, (b) 6-to-10 cycle cache memory latencies,
and (c) larger instruction caches and branch tables, maybe CDR-coding might
make sense again.

Even more importantly, perhaps one CDR-codes only the nursery of a
generational scavenger. This allows more objects to be created before GC is
necessary and, as aging objects are pushed out to older generations, those
that are going to be around for a while can be translated into full-offset
objects. If the cache is large enough, the entire nursery might stay
resident in it, leading to very substantial speed increases.
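
A rough sketch of that promotion step, in Python for readability. Everything here is hypothetical: the nursery is modelled as a list of (cdr_code, car) words, the old space as a flat list of alternating CAR and CDR words, and promote is an illustrative name. Relocating CAR values that are themselves nursery pointers is deliberately left out.

```python
CDR_NIL, CDR_NEXT = 0, 1  # one-word nursery cells: last element / more follow

def promote(nursery, addr, old_space):
    """Copy the compact run at nursery[addr] into old_space as two-word cells.

    Returns the old-space address of the first cell, or None for the empty list.
    """
    if addr is None:
        return None
    head = len(old_space)
    while True:
        code, item = nursery[addr]
        here = len(old_space)
        last = code == CDR_NIL
        old_space.append(item)                         # CAR word
        old_space.append(None if last else here + 2)   # explicit CDR word
        if last:
            return head
        addr += 1
```

In this model a three-element list occupies three words in the nursery and six after promotion, so the space saving applies only while objects are young.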

So, again we see that the world simply re-invents Lisp but, more
importantly, one of the neatest hacks from the olden days might be
reasonable again. Opinions?

faa

Kai Harrekilde-Petersen

Dec 7, 2000, 2:22:58 AM

"Frank A. Adrian" <fad...@uswest.net> writes:

> I just finished reading an article in the Dec. 2000 issue of Computer
> Magazine called "Making Pointer-based Data Structures Cache Conscious", by
> Chilimbi et al.

[snip]

> Oddly enough, this seems suspiciously close to some of the optimizations
> that were made in Lisp machines. CDR-coding [...]

What does CDR-coding mean in this context? To me, CDR is Clock Data
Recovery, and that's surely wrong here.

Kai

Stefan Monnier <foo@acm.com>

Dec 7, 2000, 3:00:00 AM

>>>>> "Bruce" == Bruce Hoult <br...@hoult.org> writes:
> It's perhaps only slightly less common to want to store characters,
> booleans, the empty list, empty array, zero length string etc directly
> as well. This requires stealing another bit from the pointer
> representation, forcing all pointers to be 4-byte aligned (on a byte
> addressed machine).

You can also encode those as small values and make sure that pointers
are always larger than those values, so you don't need extra bits.
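
One way this could look, sketched in Python. The layout is an assumption for illustration only: HEAP_BASE, the IMMEDIATES table, and the helper names are hypothetical, and only character codes small enough to fit below HEAP_BASE are encodable in this toy version.

```python
# Hypothetical layout: words below HEAP_BASE are immediates,
# everything at or above it is a heap address.
IMMEDIATES = ["nil", "false", "true", "empty-array", "empty-string"]
CHAR_BASE = len(IMMEDIATES)
HEAP_BASE = 0x1000

def encode_char(c):
    """Encode a character as a small word just above the named immediates."""
    word = CHAR_BASE + ord(c)
    if word >= HEAP_BASE:
        raise ValueError("character does not fit below HEAP_BASE")
    return word

def classify(word):
    """Decide what a word is using range comparisons only, no tag bits."""
    if word >= HEAP_BASE:
        return ("pointer", word)
    if word >= CHAR_BASE:
        return ("char", chr(word - CHAR_BASE))
    return ("immediate", IMMEDIATES[word])
```

The type test becomes a comparison against a constant rather than a mask, at the price of reserving the low part of the address space.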

> "CDR-coding" consists of stealing *another* two bits from the pointer
> representation. There are three possible values:
> - the next word of memory is the CDR, as usual
> - the next word of memory would be a pointer to NULL (the
>   empty list), if it was there. i.e. the current element is
>   the last in the list.
> - the next word of memory is the CAR of the next cons cell in
>   the current list

The "next word is NULL" case is not strictly necessary so you
can reduce the number of required bits to just one.

> memory for contiguously-allocated lists. The cost in extra instructions
> to check for special-cases is quite high, especially when such a list is
> modified by inserting or deleting elements in the middle.

But such `setcdr' operations are not very frequent, or might
not even be allowed (in languages like ML).

Stefan

Ian Kemmish

Dec 7, 2000, 3:00:00 AM

In article <CzFX5.6952$i32.4...@news.uswest.net>, fad...@uswest.net says...

>
>I just finished reading an article in the Dec. 2000 issue of Computer
>Magazine called "Making Pointer-based Data Structures Cache Conscious", by
>Chilimbi et al. In it, the authors suggest that because of increasing
>memory-CPU speed mismatches, modifying data structures to use less space and
>be more cache-conscious, even at the cost of raw instruction count, might
>well be a win for today's machine architectures.

A less labour-intensive rewrite is to make your code smaller (if you end up
with fewer cache misses, do you really care whether they're due to data or
instructions?). The latest major rewrite of Jaws (well, a few years old now)
was undertaken with just this aim in mind, and did show a noticeable
performance increase, despite having fewer unrolled loops etc.

Of course, writing compact code might mean having to eschew fashionable
techniques like ubiquitous object-orientation:-)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Ian Kemmish 18 Durham Close, Biggleswade, Beds SG18 8HZ, UK
i...@jawssytems.com Tel: +44 1767 601 361
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Behind every successful organisation stands one person who knows the secret
of how to keep the managers away from anything truly important.

Barry Margolin

Dec 7, 2000, 3:00:00 AM

In article <BOQX5.399$3V2....@news.dircon.co.uk>,
Ian Kemmish <i...@jawssystems.com> wrote:
>A less labour intensive rewrite is to make your code smaller (if you end up
>with fewer cache misses, do you really care whether they're due to data or
>instructions?).

But since instructions are mostly sequential, the hardware can prefetch
the next block into the cache, so even larger code may have few cache
misses. With data, even if the hardware does prefetching, it will only be
helpful if the data exhibits high locality. This is true for array
structures, but when there are lots of pointers and indirection, as there
are in Lisp, it's only true if the GC keeps pointers together with the
things they point to, or if the hardware has a pointer-aware prefetcher (as
someone mentioned existed on LMI Lisp Machines).

--
Barry Margolin, bar...@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Bruce Hoult

Dec 7, 2000, 6:49:15 AM

In article <80bsuoa...@orthanc.exbit-technology.com>, Kai
Harrekilde-Petersen <k...@exbit.dk> wrote:

A Lisp "cons cell" is a data record that contains two pointers called
(for weird historical reasons) CAR and CDR. The most common use of them
is to build linked lists in which the CDR of each cell points to the
next element in the list and the CAR points to the data item at that
position in the list.

In fact for efficiency the "pointers" aren't always actually pointers.
In particular it's useful to be able to store an integer directly in the
CAR and/or CDR positions, rather than storing a pointer to some record
containing an integer. Therefore most Lisps arrange to reserve e.g. the
even values for actual pointers and the odd values for integers, with
integers represented as e.g. 2N+1. On machines with compulsory
word-alignment for pointers this doesn't affect pointers at all, and
gives you half the number of integer values you'd otherwise have (i.e. 31
bits on a 32-bit machine).
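
A minimal sketch of the 2N+1 scheme, in Python for readability and assuming a hypothetical 32-bit word. The function names are illustrative and not taken from any particular Lisp implementation.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1

def box_fixnum(n):
    """Encode a signed integer n as the odd word 2n+1."""
    if not -(1 << 30) <= n < (1 << 30):
        raise OverflowError("does not fit in a 31-bit fixnum")
    return ((n << 1) | 1) & WORD_MASK

def is_fixnum(word):
    """Odd words are integers."""
    return word & 1 == 1

def is_pointer(word):
    """Even words are (word-aligned) pointers."""
    return word & 1 == 0

def unbox_fixnum(word):
    """Recover n: reinterpret the word as signed, then shift right by one."""
    if word & (1 << (WORD_BITS - 1)):  # sign bit set, so the value is negative
        word -= 1 << WORD_BITS
    return word >> 1
```

Here box_fixnum(5) yields the word 11, and a word-aligned address such as 0x1000 passes is_pointer unchanged, which is why aligned pointers cost nothing under this scheme.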

It's perhaps only slightly less common to want to store characters,
booleans, the empty list, empty array, zero length string etc directly
as well. This requires stealing another bit from the pointer
representation, forcing all pointers to be 4-byte aligned (on a byte
addressed machine).

Cons cells can be used to build any data structure, but simple
single-linked lists are very common, and it's fairly common to allocate
them all at once, which means that each cons cell ends up pointing at a
cell right next to it in memory.

"CDR-coding" consists of stealing *another* two bits from the pointer
representation. There are three possible values:

- the next word of memory is the CDR, as usual
- the next word of memory would be a pointer to NULL (the
  empty list), if it was there. i.e. the current element is
  the last in the list.
- the next word of memory is the CAR of the next cons cell in
  the current list
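
The three cases can be sketched as follows, in Python for readability. Memory is modelled as a list of (cdr_code, value) words rather than real tagged machine words, and the fourth code value, CDR_UNUSED, for the second word of an ordinary cell is an assumption of this sketch.

```python
CDR_NORMAL, CDR_NIL, CDR_NEXT, CDR_UNUSED = range(4)
NIL = None

def cons(memory, car_value, cdr_value):
    """Allocate an ordinary two-word cell; return its address."""
    addr = len(memory)
    memory.append((CDR_NORMAL, car_value))
    memory.append((CDR_UNUSED, cdr_value))
    return addr

def compact_list(memory, items):
    """Allocate items as a contiguous CDR-coded run, one word per element."""
    if not items:
        return NIL
    addr = len(memory)
    for i, item in enumerate(items):
        last = i == len(items) - 1
        memory.append((CDR_NIL if last else CDR_NEXT, item))
    return addr

def car(memory, addr):
    return memory[addr][1]

def cdr(memory, addr):
    code = memory[addr][0]
    if code == CDR_NORMAL:
        return memory[addr + 1][1]   # the next word is the CDR
    if code == CDR_NIL:
        return NIL                   # the list ends here
    if code == CDR_NEXT:
        return addr + 1              # the next word is the next cell's CAR
    raise ValueError("not the first word of a cons cell")

def to_python_list(memory, addr):
    out = []
    while addr is not NIL:
        out.append(car(memory, addr))
        addr = cdr(memory, addr)
    return out
```

A three-element compact list takes three words here instead of six, and an ordinary cell can still point at it, so both representations coexist in one list.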

You now have a total of four bits stolen from the pointer
representation, but as a result you might be able to save quite a lot of
memory for contiguously-allocated lists. The cost in extra instructions
to check for special-cases is quite high, especially when such a list is
modified by inserting or deleting elements in the middle.
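
To illustrate where the modification cost comes from, here is a sketch in Python of one possible way to change the CDR of a compact cell: copy it out to a fresh two-word cell and leave a forwarding word behind. The forwarding approach and every name below are assumptions made for illustration, not a description of any particular machine.

```python
CDR_NORMAL, CDR_NIL, CDR_NEXT, FORWARD = range(4)
NIL = None

def compact(memory, items):
    """Allocate a non-empty list as a CDR-coded run; return its address."""
    base = len(memory)
    for i, item in enumerate(items):
        memory.append([CDR_NIL if i == len(items) - 1 else CDR_NEXT, item])
    return base

def resolve(memory, addr):
    """Follow forwarding words to the cell's current location."""
    while memory[addr][0] == FORWARD:
        addr = memory[addr][1]
    return addr

def car(memory, addr):
    return memory[resolve(memory, addr)][1]

def cdr(memory, addr):
    addr = resolve(memory, addr)
    code = memory[addr][0]
    if code == CDR_NORMAL:
        return memory[addr + 1][1]
    return NIL if code == CDR_NIL else addr + 1

def set_cdr(memory, addr, new_cdr):
    addr = resolve(memory, addr)
    if memory[addr][0] == CDR_NORMAL:
        memory[addr + 1][1] = new_cdr   # ordinary cell: a plain store
        return
    # Compact cell: there is no CDR word to overwrite, so move the cell.
    fresh = len(memory)
    memory.append([CDR_NORMAL, memory[addr][1]])
    memory.append([CDR_NORMAL, new_cdr])  # code field of a CDR word is ignored here
    memory[addr] = [FORWARD, fresh]

def to_list(memory, addr):
    out = []
    while addr is not NIL:
        out.append(car(memory, addr))
        addr = cdr(memory, addr)
    return out
```

Every car and cdr now pays for the forwarding check, and one modification in the middle of a compact list costs two extra words, which is the overhead the paragraph above refers to.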

-- Bruce

Duane Rettig

Dec 7, 2000, 10:55:03 AM

Bruce Hoult <br...@hoult.org> writes:

> You now have a total of four bits stolen from the pointer
> representation, but as a result you might be able to save quite a lot of
> memory for contiguously-allocated lists. The cost in extra instructions
> to check for special-cases is quite high, especially when such a list is
> modified by inserting or deleting elements in the middle.

Note that one such test that becomes nontrivial on General Purpose
hardware (i.e. not lisp machines) is EQ. Instead of one instruction,
those extra bits must be masked off before the comparison. I don't
know if anyone has ever considered placing those extra bits "elsewhere",
e.g. in a parallel location to the actual pointer/tag word.
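
A small sketch of the EQ problem, in Python for readability. It assumes, purely for illustration, that the two CDR-code bits occupy the top of a 32-bit word; the names are hypothetical.

```python
WORD_BITS = 32
CDR_CODE_BITS = 2
CDR_CODE_SHIFT = WORD_BITS - CDR_CODE_BITS
OBJECT_MASK = (1 << CDR_CODE_SHIFT) - 1

def make_word(cdr_code, obj):
    """Pack a CDR code and an object reference into one word."""
    return (cdr_code << CDR_CODE_SHIFT) | (obj & OBJECT_MASK)

def eq_raw(a, b):
    """A single compare: wrong when the two words carry different CDR codes."""
    return a == b

def eq_masked(a, b):
    """XOR, mask, then test: correct, but three operations instead of one."""
    return ((a ^ b) & OBJECT_MASK) == 0
```

Two words that refer to the same object from differently coded cells compare unequal under eq_raw and equal under eq_masked.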

--
Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Ave Suite 275 Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)

Joe Marshall

Dec 7, 2000, 12:43:28 PM

At LMI, we abandoned CDR-coding for our last machine for several
reasons:

1. It took bits out of the pointers, which, being word pointers,
   meant a reduction in address space.

2. It would complicate the hardware. We had hardware that
   `understood' lists and could do `prefetches' on them.

3. Only about 10% of the allocated storage in a running Lisp
   machine was compressed cdr-coded lists.


Chuck Fry

Dec 8, 2000, 10:41:58 PM

Like Frank, I've been wondering about this for a couple of years now. I
am starting to think current superscalar and/or VLIW technology might
encourage clever programmers to reintroduce the concept of
'microcoding', but under a New Millennium-style buzzword (e.g. VLIW!).
Shades of the B5000....

And with 64 bits to play with, who's to say we can't spare a couple for
CDR-coding?

In article <4k89cn...@beta.franz.com>,
Duane Rettig <du...@franz.com> wrote:
>Bruce Hoult <br...@hoult.org> writes:
>> You now have a total of four bits stolen from the pointer
>> representation, but as a result you might be able to save quite a lot of
>> memory for contiguously-allocated lists. The cost in extra instructions
>> to check for special-cases is quite high, especially when such a list is
>> modified by inserting or deleting elements in the middle.
>
>Note that one such test that becomes nontrivial on General Purpose
>hardware (i.e. not lisp machines) is EQ. Instead of one instruction,
>those extra bits must be masked off before the comparison. I don't
>know if anyone has ever considered placing those extra bits "elsewhere",
>e.g. in a parallel location to the actual pointer/tag word.

Getting back to the original point of this thread, given superscalar
CPUs running in the GHz region, register-to-register or
immediate-to-register instructions are cheap (as long as they're in the
I-cache), and dereferencing pointers is *extremely* expensive (if the
targets are not in the D-cache).

Duane, I figure you would know this as well as anyone, so I have to ask:
What is the true cost of doing this masking compared to that of chasing
pointers into uncached RAM on a couple of well-known architectures,
e.g. x86, SPARC, or PowerPC? If it's done all the time I imagine the
instruction fetches would be damned near free.

-- Chuck, used-to-wannabe computer architect
--
Chuck Fry -- Jack of all trades, master of none
chu...@chucko.com (text only please) chu...@tsoft.com (MIME enabled)
Lisp bigot, car nut, photographer, sometime guitarist and mountain biker
The addresses above are real. All spammers will be reported to their ISPs.

Per Bothner

Dec 9, 2000, 2:53:01 PM

"Frank A. Adrian" <fad...@uswest.net> writes:

> So, again we see that the world simply re-invents Lisp but, more
> importantly, one of the neatest hacks from the olden days might be
> reasonable again. Opinions?

I have an even more efficient and radical idea: Don't use lists.
They are almost always the wrong data structure. Use arrays.
Arrays are available and efficient in Lisp, Scheme, Java, C++,
Fortran, etc etc.

If you need variable-sized data structures, you should still use
arrays. Just double the size of the array when it becomes too small.
You double the array to make sure that the amortized (average) cost of
insertions is constant. Even just after you've doubled the array and
so almost half the space is unused, you are not using any more space
than a linked list would take.
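
A sketch of the doubling strategy in Python, with a counter added to make the amortized cost visible. Python lists already grow this way internally, so the explicit slot array is only a model, and the class name and copies counter are illustrative.

```python
class GrowableArray:
    """Append-only array that doubles its capacity when full."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity
        self.copies = 0  # total elements moved by all resizes so far

    def append(self, item):
        if self.size == self.capacity:
            self.capacity *= 2
            new_slots = [None] * self.capacity
            for i in range(self.size):
                new_slots[i] = self.slots[i]
            self.copies += self.size
            self.slots = new_slots
        self.slots[self.size] = item
        self.size += 1
```

After 1000 appends the resizes have copied 1023 elements in total, so roughly one copy per insertion. Ignoring any array header, the capacity never exceeds twice the element count, which matches the two words per element of a cons-cell list.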

If it is important to be able to insert/delete in the middle of
a sequence, you might consider using a list. But consider a
buffer-gap array first. Instead of leaving the unused space at
the end of the array, allow it to be in the middle, wherever
insertions/deletions are done. You need to "move the gap" (i.e.
copy things around in the data structure) when the gap is not where
you need to insert/delete, but most of the time insertions/deletions
will tend to be clustered (near each other).
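
A minimal buffer-gap sketch in Python. The class and method names, and the choice to double the buffer when the gap fills up, are assumptions made for illustration.

```python
class GapBuffer:
    """Sequence stored in one array with the free space kept at the edit point."""

    def __init__(self, capacity=8):
        self.buf = [None] * capacity
        self.gap_start = 0
        self.gap_end = capacity  # the gap is buf[gap_start:gap_end]

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos):
        while self.gap_start > pos:   # shift the gap left, one element at a time
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:   # shift the gap right
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self):
        # Called only when the gap is empty: open a new gap as large as the buffer.
        old = len(self.buf)
        self.buf = self.buf[:self.gap_start] + [None] * old + self.buf[self.gap_end:]
        self.gap_end = self.gap_start + old

    def insert(self, pos, item):
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        if self.gap_start == self.gap_end:
            self._grow()
        self._move_gap(pos)
        self.buf[self.gap_start] = item
        self.gap_start += 1

    def delete(self, pos):
        if not 0 <= pos < len(self):
            raise IndexError(pos)
        self._move_gap(pos)
        self.gap_end += 1  # widen the gap over the element just after it

    def to_list(self):
        return self.buf[:self.gap_start] + self.buf[self.gap_end:]
```

Consecutive edits at the same position never move the gap, so clustered insertions and deletions cost constant time each, apart from the occasional resize.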

An argument could be made that lists are easy to use in functional
(side-effect-free) programs - just use recursion. But I don't think
that is a strong argument. You can write clean, efficient side-effect-free
code using arrays: Just look at APL and its successors. Modern
functional languages have "list comprehensions" - these can just as
easily be used for "array comprehensions".

I am talking about standard Lisp-style lists implemented using pairs.
There is a better case to be made for chaining objects together using
a link slot in a more complex object. At least in that case you're not
allocating a separate cons cell for each link.
--
--Per Bothner
p...@bothner.com http://www.bothner.com/~per/

Erik Naggum

Dec 9, 2000, 5:00:21 PM

* Per Bothner <p...@bothner.com>

| I have an even more efficient and radical idea: Don't use lists.

Oh, Christ! We've been there, done that, about a million times.

| They are almost always the wrong data structure. Use arrays. Arrays
| are available and efficient in Lisp, Scheme, Java, C++, Fortran, etc
| etc.

Yes, so they are. Available. Arrays are also extremely inefficient
where lists are not the wrong data structure. Hence, use lists _and_
arrays, as appropriate. Lisp programmers are not the idiots you think
they are, Per. They figure out when it makes sense to use arrays even
if you don't and need to reduce the problem to silliness.

| If it is important to be able to insert/delete in the middle of
| a sequence, you might consider using a list.

If your data structures nest, use lists.

If your data structures have unpredictable structure, use lists.

If you need to maintain uniformity of type for the "rest" after
processing an initial portion of the data structure, use lists. (If
you think this only applies to recursion, which it seems you do, you
really must think more carefully about algorithm design with arrays
and why displaced arrays or passing _pairs_ (sorry, data structures in
small arrays, of course) of the array and the index won't cut it.)

If you need to manipulate the structure of the data structure in any
way, not just insert/delete, use lists. (Flattening, pairing, etc.)

If after much headache and painfully idiotic programming mistakes, you
discover that your array implementation really must use two elements
per "array", one data and one "next", you have reinvented the cons.
Congratulations! Welcome to the wisdom of 40 years of language design!

| I am talking about standard Lisp-style lists implemented using pairs.
| There is a better case to be made for chaining objects together using
| a link slot in a more complex object. At least in that case you're not
| allocating a separate cons cell for each link.

It is just insane to add "link" slots to more complex objects when you
can separate and abstract out the container concept from the containee.

Who cares about allocating separate cons cells? Simple arrays have to
be more than four elements long to take less space than the equivalent
list using cons cells. Specialized arrays may have to be much longer.
The allocation for cons cells is extremely efficiently implemented in
all Lisp implementations. Using your stupid arrays-for-lists would of
course force some super-clever implementation of exponent-of-2-sized
arrays that double in size as they are used _and_ move around all the
time (destroying object identity unless you waste even _more_ space on
their already inefficient implementation), but people have been there,
many times over, and they have _returned_ to lists, implemented via
cons cells, when that has been the most intelligent solution.

What we have left after so many years of using lists _and_ arrays
_and_ structures _and_ clos objects _and_ databases, etc, is that
there is not a single argument left for using anything _but_ lists
where lists are used. If you want to teach novices to use arrays, at
least don't make the misguided Scheme-style argument that alternatives
to the obvious and the most natural must be chosen because it is somehow
more divine to use recursion, oops, I meant arrays.

You've done so much good work. How come you don't know simple things?

#:Erik
--
"When you are having a bad day and it seems like everybody is trying
to piss you off, remember that it takes 42 muscles to produce a
frown, but only 4 muscles to work the trigger of a good sniper rifle."
-- Unknown

Kai Henningsen

Dec 9, 2000, 10:14:00 AM

i...@jawssystems.com (Ian Kemmish) wrote on 07.12.00 in <BOQX5.399$3V2....@news.dircon.co.uk>:

> now) was undertaken with just this aim in mind, and did show a noticeable
> performance increase, despite having fewer unrolled loops etc.

Maybe not despite, but because of fewer unrolled loops. For exactly the
cache footprint reasons mentioned, I've heard people argue that these
days, unrolled loops are actually counterproductive.

Kai
--
http://www.westfalen.de/private/khms/
"... by God I *KNOW* what this network is for, and you can't have it."
- Russ Allbery (r...@stanford.edu)