* Robert Swindells
| PDP-10 Maclisp and Franz Lisp used bibop rather than tag bits to type
| pointers. They shifted down the address to get a page number which was used
| as an index into an array of type values.
When I implemented these things myself, I used fairly large pages (64K) and
held the type information in the first few words of the page. To access the
type, just zero the lowest 16 bits and read the type information. The cache
line would be fairly frequently accessed and would probably remain in L1.
When allocating values from such a type-homogenous page, the first few words
of the page would also contain the next unused unit. Since this word is in
the same cache line as the type information, it would also ensure that that
cache line would remain in L1. Which page to allocate from would be found
from a table of "open pages" for each type. Garbage collection knew the
layout of the types on a page from other administrative information on the
page and copying out those values that were still referenced was fairly
simple. Mark bits was implemented efficiently with one bit per allocated
type and stored in dedicated words. I used this scheme mainly to take care
of C programs that allocated a huge number of small pieces of memory of a
small set of sizes and noticed that the standard allocator used up a lot of
memory for administrative purposes and got really wasteful. Since it was
both expensive to allocate 64K-pages and wasteful to move things around too
much, I could exclude individual pages from the garbage collection with a
flag in the page that would cause it not to be copied. I implemented this
on the first Unix machine I had access to, an Altos running on M68K that had
4M or memory. Because of this new allocation scheme, the program was able
to solve problems three times as large as it could do with normal malloc and
free. We actually found that some types were used only for slowly acreting
long-term storage that had previously caused much fragmentation of memory
and that there was more harm than good in freeing these objects and this
caused the whole new allocator design.
| On a modern CPU with huge penalties for cache misses, tags make more sense
| as they will always be in the cache along with the pointer that they type.
| Needing to load a cacheline for the pointer as well as one for part of the
| type table would really reduce cache efficiency.
Not necessarily. The above scheme, now at least 20 years old, would scale
well in modern processors. I have not tried to reimplement it. Perhaps I
should... I "invented" it on the spot, but I have no idea whether this
scheme has been "independently invented" in the interim. I have seen
various scheme to allocate memory in particular sizes, but a lot of them
have just used the (IMHO stupid) word before the object to make a linked
list of freed objects, wasting much memory with alignment requirements.
| The 68000 was an ideal CPU for this as it had three "function code" pins
| that would indicate whether a bus cycle was for instruction fetch or data
| read.
Never did hardware stuff on the M68K, but I really liked its instruction
set. Such a pity that Intel "won" with its braindamaged design.
| In a modern CPU there are usually several spare bits in a page table entry.
| I would like to see some way of using them to store the type of objects in
| that page.
In my experience, you may need quite a bit of administrative overhead at the
beginning of the page, and the usual page size of 4K is way small to pay for
the overhead. Or so I thought when I used 64K pages. It was also sexy in
that the &~0xffff operation turned into smarter instructions in the compiler
back then. Ah, the calm breeze on these walks down memory lane (several
puns intended).
--
Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.